WorldWideScience

Sample records for cluster analysis results

  1. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results.

    Science.gov (United States)

    Joshi, Vineet K; Freudenberg, Johannes M; Hu, Zhen; Medvedovic, Mario

    2011-01-17

    Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web-service, and an open source general purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both, clustering structure and functional annotations provides a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state of the art clustering procedures. For Bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/.

  2. Clusters of galaxies as tools in observational cosmology : results from x-ray analysis

    International Nuclear Information System (INIS)

    Weratschnig, J.M.

    2009-01-01

    Clusters of galaxies are the largest gravitationally bound structures in the universe. They can be used as ideal tools to study large scale structure formation (e.g. when studying merger clusters) and provide highly interesting environments to analyse several characteristic interaction processes (like ram pressure stripping of galaxies, magnetic fields). In this dissertation thesis, we have studied several clusters of galaxies using X-ray observations. To obtain scientific results, we have applied different data reduction and analysis methods. With a combination of morphological and spectral analysis, the merger cluster Abell 514 was studied in much detail. It has a highly interesting morphology and shows signs for an ongoing merger as well as a shock. using a new method to detect substructure, we have analysed several clusters to determine whether any substructure is present in the X-ray image. This hints towards a real structure in the distribution of the intra-cluster medium (ICM) and is evidence for ongoing mergers. The results from this analysis are extensively used with the cluster of galaxies Abell S1136. Here, we study the ICM distribution and compare its structure with the spatial distribution of star forming galaxies. Cluster magnetic fields are another important topic of my thesis. They can be studied in Radio observations, which can be put into relation with results from X-ray observations. using observational data from several clusters, we could support the theory that cluster magnetic fields are frozen into the ICM. (author)

  3. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    Science.gov (United States)

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. Ecosystem health pattern analysis of urban clusters based on emergy synthesis: Results and implication for management

    International Nuclear Information System (INIS)

    Su, Meirong; Fath, Brian D.; Yang, Zhifeng; Chen, Bin; Liu, Gengyuan

    2013-01-01

    The evaluation of ecosystem health in urban clusters will help establish effective management that promotes sustainable regional development. To standardize the application of emergy synthesis and set pair analysis (EM–SPA) in ecosystem health assessment, a procedure for using EM–SPA models was established in this paper by combining the ability of emergy synthesis to reflect health status from a biophysical perspective with the ability of set pair analysis to describe extensive relationships among different variables. Based on the EM–SPA model, the relative health levels of selected urban clusters and their related ecosystem health patterns were characterized. The health states of three typical Chinese urban clusters – Jing-Jin-Tang, Yangtze River Delta, and Pearl River Delta – were investigated using the model. The results showed that the health status of the Pearl River Delta was relatively good; the health for the Yangtze River Delta was poor. As for the specific health characteristics, the Pearl River Delta and Yangtze River Delta urban clusters were relatively strong in Vigor, Resilience, and Urban ecosystem service function maintenance, while the Jing-Jin-Tang was relatively strong in organizational structure and environmental impact. Guidelines for managing these different urban clusters were put forward based on the analysis of the results of this study. - Highlights: • The use of integrated emergy synthesis and set pair analysis model was standardized. • The integrated model was applied on the scale of an urban cluster. • Health patterns of different urban clusters were compared. • Policy suggestions were provided based on the health pattern analysis

  5. Cardiometabolic risk clustering in spinal cord injury: results of exploratory factor analysis.

    Science.gov (United States)

    Libin, Alexander; Tinsley, Emily A; Nash, Mark S; Mendez, Armando J; Burns, Patricia; Elrod, Matt; Hamm, Larry F; Groah, Suzanne L

    2013-01-01

    Evidence suggests an elevated prevalence of cardiometabolic risks among persons with spinal cord injury (SCI); however, the unique clustering of risk factors in this population has not been fully explored. The purpose of this study was to describe unique clustering of cardiometabolic risk factors differentiated by level of injury. One hundred twenty-one subjects (mean 37 ± 12 years; range, 18-73) with chronic C5 to T12 motor complete SCI were studied. Assessments included medical histories, anthropometrics and blood pressure, and fasting serum lipids, glucose, insulin, and hemoglobin A1c (HbA1c). The most common cardiometabolic risk factors were overweight/obesity, high levels of low-density lipoprotein (LDL-C), and low levels of high-density lipoprotein (HDL-C). Risk clustering was found in 76.9% of the population. Exploratory principal component factor analysis using varimax rotation revealed a 3-factor model in persons with paraplegia (65.4% variance) and a 4-factor solution in persons with tetraplegia (73.3% variance). The differences between groups were emphasized by the varied composition of the extracted factors: Lipid Profile A (total cholesterol [TC] and LDL-C), Body Mass-Hypertension Profile (body mass index [BMI], systolic blood pressure [SBP], and fasting insulin [FI]); Glycemic Profile (fasting glucose and HbA1c), and Lipid Profile B (TG and HDL-C). BMI and SBP formed a separate factor only in persons with tetraplegia. Although the majority of the population with SCI has risk clustering, the composition of the risk clusters may be dependent on level of injury, based on a factor analysis group comparison. This is clinically plausible and relevant as tetraplegics tend to be hypo- to normotensive and more sedentary, resulting in lower HDL-C and a greater propensity toward impaired carbohydrate metabolism.

  6. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R

    1973-01-01

    Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis.Comprised of 10 chapters, this book begins with an introduction to the subject o

  7. Clustering analysis

    International Nuclear Information System (INIS)

    Romli

    1997-01-01

    Cluster analysis is the name of group of multivariate techniques whose principal purpose is to distinguish similar entities from the characteristics they process.To study this analysis, there are several algorithms that can be used. Therefore, this topic focuses to discuss the algorithms, such as, similarity measures, and hierarchical clustering which includes single linkage, complete linkage and average linkage method. also, non-hierarchical clustering method, which is popular name K -mean method ' will be discussed. Finally, this paper will be described the advantages and disadvantages of every methods

  8. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics.This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data.Real life examples are used throughout to demons

  9. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    Science.gov (United States)

    2014-01-01

    Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations

  10. Experimental results of some cluster tests in NSRR

    International Nuclear Information System (INIS)

    Kobayashi, Shinsho; Ohnishi, Nobuaki; Yoshimura, Tomio; Lussie, W.G.

    1978-01-01

    The NSRR programme is in progress in JAERI using a pulsed reactor to evaluate the behavior of reactor fuels under reactivity accident conditions. This report describes briefly the experimental results and preliminary analysis of two cluster tests. In the cluster configuration of five fuel rods, the power distribution in outer fuel rods are not symmetric due to neutron absorption in central fuel rod. The cladding temperature on the exterior boundaries of the cluster is higher than that in interior. Good agreement was obtained between the calculated and measured cladding temperature histories. In the 3.8$ excess reactivity test, cluster averaged energy deposition of 237 cal/g.UO 2 , cladding melting and deformation were limited to the portions of the fuel rods that were on the exterior boundaries of the cluster. (auth.)

  11. Marketing research cluster analysis

    OpenAIRE

    Marić Nebojša

    2002-01-01

    One area of applications of cluster analysis in marketing is identification of groups of cities and towns with similar demographic profiles. This paper considers main aspects of cluster analysis by an example of clustering 12 cities with the use of Minitab software.

  12. Marketing research cluster analysis

    Directory of Open Access Journals (Sweden)

    Marić Nebojša

    2002-01-01

    Full Text Available One area of applications of cluster analysis in marketing is identification of groups of cities and towns with similar demographic profiles. This paper considers main aspects of cluster analysis by an example of clustering 12 cities with the use of Minitab software.

  13. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    Full Text Available The aim of the present article is to show the possibility of using the methods of cluster analysis in classification of stocks of finished products. Cluster analysis creates groups (clusters of finished products according to similarity in demand i.e. customer requirements for each product. Manner stocks sorting of finished products by clusters is described a practical example. The resultants clusters are incorporated into the draft layout of the distribution warehouse.

  14. Grouping and clustering of maize Lancaster germplasm inbreds according to the results of SNP-analysis

    Directory of Open Access Journals (Sweden)

    K. V. Derkach

    2017-08-01

    Full Text Available The objective of this article is the grouping and clustering of maize inbred lines based on the results of SNP-genotyping for the verification of a separate cluster of Lancaster germplasm inbred lines. As material for the study, we used 91 maize (Zea mays L. inbred lines, including 31 Lancaster germplasm lines and 60 inbred lines of other germplasms (23 Iodent inbreds, 15 Reid inbreds, 7 Lacon inbreds, 12 Mix inbreds and 3 exotic inbreds. The majority of the given inbred lines are included in the Dnipro breeding programme. The SNP-genotyping of these inbred lines was conducted using BDI-III panel of 384 SNP-markers developed by BioDiagnostics, Inc. (USA on the base of Illumina VeraCode Bead Plate. The SNP-markers of this panel are biallelic and are located on all 10 maize chromosomes. Their range of conductivity was >0.6. The SNP-analysis was made in completely automated regime on Illumina BeadStation equipment at BioDiagnostics, Inc. (USA. A principal component analysis was applied to group a general set of 91 inbreds according to allelic states of SNP-markers and to identify a cluster of Lancaster inbreds. The clustering and determining hierarchy in 31 Lancaster germplasm inbreds used quantitative cluster analysis. The share of monomorphic markers in the studied set of 91 inbred lines equaled 0.7%, and the share of dimorphic markers equaled 99.3%. Minor allele frequency (MAF > 0.2 was observed for 80.6% of dimorphic markers, the average index of shift of gene diversity equaled 0.2984, PIC on average reached 0.3144. The index of gene diversity of markers varied from 0.1701 to 0.1901, pairwise genetic distances between inbred lines ranged from 0.0316–0.8000, the frequencies of major alleles of SNP-markers were within 0.5085–0.9821, and the frequencies of minor alleles were within 0.0179–0.4915. The average homozygosity of inbred lines was 98.8%. The principal component analysis of SNP-distances confirmed the isolation of the Lancaster

  15. Comprehensive cluster analysis with Transitivity Clustering.

    Science.gov (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes, for instance. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  16. [Cluster analysis in biomedical researches].

    Science.gov (United States)

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research.

  17. Robust cluster analysis and variable selection

    CERN Document Server

    Ritter, Gunter

    2014-01-01

    Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot

  18. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

    Science.gov (United States)

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

    2017-12-01

    Allergens tend to sensitize simultaneously. Etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. To investigate the allergen sensitization characteristics according to gender. Multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, 39items were grouped into 8 clusters. Each cluster had characteristic features. When compared with female, the male group tended to be sensitized more frequently to all tested allergens, except for fungus allergens cluster. The cluster and comparative analysis results demonstrate that the allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize female group more frequently than male group.

  19. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    Science.gov (United States)

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  20. Bulgarian clusters under development: Political framework and results

    Directory of Open Access Journals (Sweden)

    Bankova Yovka

    2011-01-01

    Full Text Available The idea of clusters is not new but nowadays clusters are in a highlight again. Through cluster policies the countries aim at raising their national competitiveness. The paper deals with two objectives - discussion and evaluation of the strategic framework for clusters in Bulgaria and an analysis of the state of Bulgarian clusters. The paper presents briefly general issues concerning the national competitiveness and clusters as being one of the possible instruments to achieve a sustainable competitiveness. The practice of the policy in the EU in the field of clusters is the basis for conclusions about the role of the governments. The second part deals with the strategic framework for the cluster initiatives in Bulgaria and with a selection of indicators about the SMEs and clusters in the country. On this basis a conclusion about the development stage of Bulgarian clusters is derived.

  1. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  2. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...

  3. Assessment of surface water quality using hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Dheeraj Kumar Dabgerwal

    2016-02-01

    Full Text Available This study was carried out to assess the physicochemical quality river Varuna inVaranasi,India. Water samples were collected from 10 sites during January-June 2015. Pearson correlation analysis was used to assess the direction and strength of relationship between physicochemical parameters. Hierarchical Cluster analysis was also performed to determine the sources of pollution in the river Varuna. The result showed quite high value of DO, Nitrate, BOD, COD and Total Alkalinity, above the BIS permissible limit. The results of correlation analysis identified key water parameters as pH, electrical conductivity, total alkalinity and nitrate, which influence the concentration of other water parameters. Cluster analysis identified three major clusters of sampling sites out of total 10 sites, according to the similarity in water quality. This study illustrated the usefulness of correlation and cluster analysis for getting better information about the river water quality.International Journal of Environment Vol. 5 (1 2016,  pp: 32-44

  4. Taxonomical analysis of the Cancer cluster of galaxies

    International Nuclear Information System (INIS)

    Perea, J.; Olmo, A. del; Moles, M.

    1986-01-01

    A description is presented of the Cancer cluster of galaxies, based on a taxonomical analysis in (α,delta, Vsub(r)) space. Earlier results by previous authors on the lack of dynamical entity of the cluster are confirmed. The present analysis points out the existence of a binary structure in the most populated region of the complex. (author)

  5. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithm such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K- means and the Ant clustering algorithm for real document clustering.

  6. Cluster analysis for determining distribution center location

    Science.gov (United States)

    Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

    2017-12-01

    Determination of distribution facilities is highly important to survive in the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. Thus, new problems arise, namely how many and where the facilities should be provided. This study examines a fast-food restaurant brand, which located in the Greater Jakarta. This brand is included in the category of top 5 fast food restaurant chain based on retail sales. There were three stages in this study, compiling spatial data, cluster analysis, and network analysis. Cluster analysis results are used to consider the location of the additional distribution center. Network analysis results show a more efficient process referring to a shorter distance to the distribution process.

  7. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel......-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show...

  8. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    Science.gov (United States)

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms-Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.

  9. Dynamical processes in space: Cluster results

    Directory of Open Access Journals (Sweden)

    C. P. Escoubet

    2013-06-01

    Full Text Available After 12 years of operations, the Cluster mission continues to successfully fulfil its scientific objectives. The main goal of the Cluster mission, comprised of four identical spacecraft, is to study in three dimensions small-scale plasma structures in key plasma regions of the Earth's environment: solar wind and bow shock, magnetopause, polar cusps, magnetotail, plasmasphere and auroral zone. During the course of the mission, the relative distance between the four spacecraft has been varied from 20 km to 36 000 km to study the scientific regions of interest at different scales. Since summer 2005, new multi-scale constellations have been implemented, wherein three spacecraft (C1, C2, C3 are separated by 10 000 km, while the fourth one (C4 is at a variable distance ranging between 20 km and 10 000 km from C3. Recent observations were conducted in the auroral acceleration region with the spacecraft separated by 1000s km. We present highlights of the results obtained during the last 12 years on collisionless shocks, magnetopause waves, magnetotail dynamics, plasmaspheric structures, and the auroral acceleration region. In addition, we highlight Cluster results on understanding the impact of Coronal Mass Ejections (CME on the Earth environment. We will also present Cluster data accessibility through the Cluster Science Data System (CSDS, and the Cluster Active Archive (CAA, which was implemented to provide a permanent and public archive of high resolution Cluster data from all instruments.

  10. Physicochemical properties of different corn varieties by principal components analysis and cluster analysis

    International Nuclear Information System (INIS)

    Zeng, J.; Li, G.; Sun, J.

    2013-01-01

    Principal components analysis and cluster analysis were used to investigate the properties of different corn varieties. The chemical compositions and some properties of corn flour which processed by drying milling were determined. The results showed that the chemical compositions and physicochemical properties were significantly different among twenty six corn varieties. The quality of corn flour was concerned with five principal components from principal component analysis and the contribution rate of starch pasting properties was important, which could account for 48.90%. Twenty six corn varieties could be classified into four groups by cluster analysis. The consistency between principal components analysis and cluster analysis indicated that multivariate analyses were feasible in the study of corn variety properties. (author)

  11. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  12. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    Full Text Available INTRODUCTION: Fibromyalgia (FM is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: VARIABLES CLUSTERED INTO THREE INDEPENDENT DIMENSIONS: "symptomatology", "comorbidities" and "clinical scales". Only the two first dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1, high symptomatology and comorbidities (Cluster 2, and high symptomatology but low comorbidities (Cluster 3, showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  13. Clustering Trajectories by Relevant Parts for Air Traffic Analysis.

    Science.gov (United States)

    Andrienko, Gennady; Andrienko, Natalia; Fuchs, Georg; Garcia, Jose Manuel Cordero

    2018-01-01

    Clustering of trajectories of moving objects by similarity is an important technique in movement analysis. Existing distance functions assess the similarity between trajectories based on properties of the trajectory points or segments. The properties may include the spatial positions, times, and thematic attributes. There may be a need to focus the analysis on certain parts of trajectories, i.e., points and segments that have particular properties. According to the analysis focus, the analyst may need to cluster trajectories by similarity of their relevant parts only. Throughout the analysis process, the focus may change, and different parts of trajectories may become relevant. We propose an analytical workflow in which interactive filtering tools are used to attach relevance flags to elements of trajectories, clustering is done using a distance function that ignores irrelevant elements, and the resulting clusters are summarized for further analysis. We demonstrate how this workflow can be useful for different analysis tasks in three case studies with real data from the domain of air traffic. We propose a suite of generic techniques and visualization guidelines to support movement data analysis by means of relevance-aware trajectory clustering.

  14. application of single-linkage clustering method in the analysis of ...

    African Journals Online (AJOL)

    Admin

    ANALYSIS OF GROWTH RATE OF GROSS DOMESTIC PRODUCT. (GDP) AT ... The end result of the algorithm is a tree of clusters called a dendrogram, which shows how the clusters are ..... Number of cluster sum from from observations of ...

  15. Cluster analysis in phenotyping a Portuguese population.

    Science.gov (United States)

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

    2015-09-03

    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided a better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. To identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Distribution was set in 5 clusters described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters were mainly coincident with other larger-scale cluster analysis. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.

  16. Methodology сomparative statistical analysis of Russian industry based on cluster analysis

    Directory of Open Access Journals (Sweden)

    Sergey S. Shishulin

    2017-01-01

    Full Text Available The article is devoted to researching of the possibilities of applying multidimensional statistical analysis in the study of industrial production on the basis of comparing its growth rates and structure with other developed and developing countries of the world. The purpose of this article is to determine the optimal set of statistical methods and the results of their application to industrial production data, which would give the best access to the analysis of the result.Data includes such indicators as output, output, gross value added, the number of employed and other indicators of the system of national accounts and operational business statistics. The objects of observation are the industry of the countrys of the Customs Union, the United States, Japan and Erope in 2005-2015. As the research tool used as the simplest methods of transformation, graphical and tabular visualization of data, and methods of statistical analysis. In particular, based on a specialized software package (SPSS, the main components method, discriminant analysis, hierarchical methods of cluster analysis, Ward’s method and k-means were applied.The application of the method of principal components to the initial data makes it possible to substantially and effectively reduce the initial space of industrial production data. Thus, for example, in analyzing the structure of industrial production, the reduction was from fifteen industries to three basic, well-interpreted factors: the relatively extractive industries (with a low degree of processing, high-tech industries and consumer goods (medium-technology sectors. At the same time, as a result of comparison of the results of application of cluster analysis to the initial data and data obtained on the basis of the principal components method, it was established that clustering industrial production data on the basis of new factors significantly improves the results of clustering.As a result of analyzing the parameters of

  17. Analysis of Aspects of Innovation in a Brazilian Cluster

    Directory of Open Access Journals (Sweden)

    Adriana Valélia Saraceni

    2012-09-01

    Full Text Available Innovation through clustering has become very important on the increased significance that interaction represents on innovation and learning process concept. This study aims to identify whereas a case analysis on innovation process in a cluster represents on the learning process. Therefore, this study is developed in two stages. First, we used a preliminary case study verifying a cluster innovation analysis and it Innovation Index, for further, exploring a combined body of theory and practice. Further, the second stage is developed by exploring the learning process concept. Both stages allowed us building a theory model for the learning process development in clusters. The main results of the model development come up with a mechanism of improvement implementation on clusters when case studies are applied.

  18. XCluSim: a visual analytics tool for interactively comparing multiple clustering results of bioinformatics data

    Science.gov (United States)

    2015-01-01

    Background Though cluster analysis has become a routine analytic task for bioinformatics research, it is still arduous for researchers to assess the quality of a clustering result. To select the best clustering method and its parameters for a dataset, researchers have to run multiple clustering algorithms and compare them. However, such a comparison task with multiple clustering results is cognitively demanding and laborious. Results In this paper, we present XCluSim, a visual analytics tool that enables users to interactively compare multiple clustering results based on the Visual Information Seeking Mantra. We build a taxonomy for categorizing existing techniques of clustering results visualization in terms of the Gestalt principles of grouping. Using the taxonomy, we choose the most appropriate interactive visualizations for presenting individual clustering results from different types of clustering algorithms. The efficacy of XCluSim is shown through case studies with a bioinformatician. Conclusions Compared to other relevant tools, XCluSim enables users to compare multiple clustering results in a more scalable manner. Moreover, XCluSim supports diverse clustering algorithms and dedicated visualizations and interactions for different types of clustering results, allowing more effective exploration of details on demand. Through case studies with a bioinformatics researcher, we received positive feedback on the functionalities of XCluSim, including its ability to help identify stably clustered items across multiple clustering results. PMID:26328893

  19. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    Science.gov (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  20. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    Science.gov (United States)

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  1. Multiscale visual quality assessment for cluster analysis with self-organizing maps

    Science.gov (United States)

    Bernard, Jürgen; von Landesberger, Tatiana; Bremm, Sebastian; Schreck, Tobias

    2011-01-01

    Cluster analysis is an important data mining technique for analyzing large amounts of data, reducing many objects to a limited number of clusters. Cluster visualization techniques aim at supporting the user in better understanding the characteristics and relationships among the found clusters. While promising approaches to visual cluster analysis already exist, these usually fall short of incorporating the quality of the obtained clustering results. However, due to the nature of the clustering process, quality plays an important aspect, as for most practical data sets, typically many different clusterings are possible. Being aware of clustering quality is important to judge the expressiveness of a given cluster visualization, or to adjust the clustering process with refined parameters, among others. In this work, we present an encompassing suite of visual tools for quality assessment of an important visual cluster algorithm, namely, the Self-Organizing Map (SOM) technique. We define, measure, and visualize the notion of SOM cluster quality along a hierarchy of cluster abstractions. The quality abstractions range from simple scalar-valued quality scores up to the structural comparison of a given SOM clustering with output of additional supportive clustering methods. The suite of methods allows the user to assess the SOM quality on the appropriate abstraction level, and arrive at improved clustering results. We implement our tools in an integrated system, apply it on experimental data sets, and show its applicability.

  2. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Full Text Available Abstract Background Genes are not randomly distributed on a chromosome as they were thought even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and recently reported in several eukaryotic organisms such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. In order to further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we have performed a genome-scale analysis of positional clustering of the mouse testis-specific genes. Results Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance even after removal of tandem repeats. Conclusion Our result suggests that testis-specific genes tend to cluster on the mouse chromosomes. This provides another piece of evidence for the hypothesis that clusters of tissue-specific genes do exist.

  3. A Distributed Flocking Approach for Information Stream Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Intelligence analysts are currently overwhelmed with the amount of information streams generated everyday. There is a lack of comprehensive tool that can real-time analyze the information streams. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied for analyzing the static document collection because they normally require a large amount of computation resource and long time to get accurate result. It is very difficult to cluster a dynamic changed text information streams on an individual computer. Our early research has resulted in a dynamic reactive flock clustering algorithm which can continually refine the clustering result and quickly react to the change of document contents. This character makes the algorithm suitable for cluster analyzing dynamic changed document information, such as text information stream. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase the clustering speed of the algorithm. In this paper, we present a distributed multi-agent flocking approach for the text information stream clustering and discuss the decentralized architectures and communication schemes for load balance and status information synchronization in this approach.

  4. Network Analysis Tools: from biological networks to clusters and pathways.

    Science.gov (United States)

    Brohée, Sylvain; Faust, Karoline; Lima-Mendez, Gipsi; Vanderstocken, Gilles; van Helden, Jacques

    2008-01-01

    Network Analysis Tools (NeAT) is a suite of computer tools that integrate various algorithms for the analysis of biological networks: comparison between graphs, between clusters, or between graphs and clusters; network randomization; analysis of degree distribution; network-based clustering and path finding. The tools are interconnected to enable a stepwise analysis of the network through a complete analytical workflow. In this protocol, we present a typical case of utilization, where the tasks above are combined to decipher a protein-protein interaction network retrieved from the STRING database. The results returned by NeAT are typically subnetworks, networks enriched with additional information (i.e., clusters or paths) or tables displaying statistics. Typical networks comprising several thousands of nodes and arcs can be analyzed within a few minutes. The complete protocol can be read and executed in approximately 1 h.

  5. Pattern recognition in menstrual bleeding diaries by statistical cluster analysis

    Directory of Open Access Journals (Sweden)

    Wessel Jens

    2009-07-01

    Full Text Available Abstract Background The aim of this paper is to empirically identify a treatment-independent statistical method to describe clinically relevant bleeding patterns by using bleeding diaries of clinical studies on various sex hormone containing drugs. Methods We used the four cluster analysis methods single, average and complete linkage as well as the method of Ward for the pattern recognition in menstrual bleeding diaries. The optimal number of clusters was determined using the semi-partial R2, the cubic cluster criterion, the pseudo-F- and the pseudo-t2-statistic. Finally, the interpretability of the results from a gynecological point of view was assessed. Results The method of Ward yielded distinct clusters of the bleeding diaries. The other methods successively chained the observations into one cluster. The optimal number of distinctive bleeding patterns was six. We found two desirable and four undesirable bleeding patterns. Cyclic and non cyclic bleeding patterns were well separated. Conclusion Using this cluster analysis with the method of Ward medications and devices having an impact on bleeding can be easily compared and categorized.

  6. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    Science.gov (United States)

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement fraction of exhaled nitric oxide. Blood eosinophilic count, serum total IgE and periostin levels were determined. Two-step cluster approach, hierarchical clustering method and k-mean analysis were used for identification of the clusters. We have identified four clusters. Cluster 1 (n=14) - late-onset, non-atopic asthma with impaired lung function, Cluster 2 (n=13) - late-onset, atopic asthma, Cluster 3 (n=6) - late-onset, aspirin sensitivity, eosinophilic asthma, and Cluster 4 (n=7) - early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. We identified four clusters. The variables with greatest force for differentiation in our study were: age of asthma onset, duration of diseases, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drugs hypersensitivity, baseline FEV1/FVC and symptoms severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be an useful tool for phenotyping of disease and personalized approach to the treatment of patients.

  7. Advanced analysis of forest fire clustering

    Science.gov (United States)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index

  8. Cluster analysis of track structure

    International Nuclear Information System (INIS)

    Michalik, V.

    1991-01-01

    One of the possibilities of classifying track structures is application of conventional partition techniques of analysis of multidimensional data to the track structure. Using these cluster algorithms this paper attempts to find characteristics of radiation reflecting the spatial distribution of ionizations in the primary particle track. An absolute frequency distribution of clusters of ionizations giving the mean number of clusters produced by radiation per unit of deposited energy can serve as this characteristic. General computation techniques used as well as methods of calculations of distributions of clusters for different radiations are discussed. 8 refs.; 5 figs

  9. Interactive K-Means Clustering Method Based on User Behavior for Different Analysis Target in Medicine.

    Science.gov (United States)

    Lei, Yang; Yu, Dai; Bin, Zhang; Yang, Yang

    2017-01-01

    Clustering algorithm as a basis of data analysis is widely used in analysis systems. However, as for the high dimensions of the data, the clustering algorithm may overlook the business relation between these dimensions especially in the medical fields. As a result, usually the clustering result may not meet the business goals of the users. Then, in the clustering process, if it can combine the knowledge of the users, that is, the doctor's knowledge or the analysis intent, the clustering result can be more satisfied. In this paper, we propose an interactive K -means clustering method to improve the user's satisfactions towards the result. The core of this method is to get the user's feedback of the clustering result, to optimize the clustering result. Then, a particle swarm optimization algorithm is used in the method to optimize the parameters, especially the weight settings in the clustering algorithm to make it reflect the user's business preference as possible. After that, based on the parameter optimization and adjustment, the clustering result can be closer to the user's requirement. Finally, we take an example in the breast cancer, to testify our method. The experiments show the better performance of our algorithm.

  10. Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.

    Science.gov (United States)

    Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si

    2017-07-01

    Water quality assessment is crucial for assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data driven approaches for water quality assessment. The cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected form coastal water of Bohai Sea and North Yellow Sea of China, and apply clustering results to evaluate its water quality. To evaluate the validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which are widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, it is the first attempt to apply Mahalanobis distance for coastal water quality assessment.

  11. Result Diversification Based on Query-Specific Cluster Ranking

    NARCIS (Netherlands)

    J. He (Jiyin); E. Meij; M. de Rijke (Maarten)

    2011-01-01

    htmlabstractResult diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking,

  12. Exact WKB analysis and cluster algebras

    International Nuclear Information System (INIS)

    Iwaki, Kohei; Nakanishi, Tomoki

    2014-01-01

    We develop the mutation theory in the exact WKB analysis using the framework of cluster algebras. Under a continuous deformation of the potential of the Schrödinger equation on a compact Riemann surface, the Stokes graph may change the topology. We call this phenomenon the mutation of Stokes graphs. Along the mutation of Stokes graphs, the Voros symbols, which are monodromy data of the equation, also mutate due to the Stokes phenomenon. We show that the Voros symbols mutate as variables of a cluster algebra with surface realization. As an application, we obtain the identities of Stokes automorphisms associated with periods of cluster algebras. The paper also includes an extensive introduction of the exact WKB analysis and the surface realization of cluster algebras for nonexperts. This article is part of a special issue of Journal of Physics A: Mathematical and Theoretical devoted to ‘Cluster algebras in mathematical physics’. (paper)

  13. Result diversification based on query-specific cluster ranking

    NARCIS (Netherlands)

    He, J.; Meij, E.; de Rijke, M.

    2011-01-01

    Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification

  14. Cluster Analysis as an Analytical Tool of Population Policy

    Directory of Open Access Journals (Sweden)

    Oksana Mikhaylovna Shubat

    2017-12-01

    Full Text Available The predicted negative trends in Russian demography (falling birth rates, population decline actualize the need to strengthen measures of family and population policy. Our research purpose is to identify groups of Russian regions with similar characteristics in the family sphere using cluster analysis. The findings should make an important contribution to the field of family policy. We used hierarchical cluster analysis based on the Ward method and the Euclidean distance for segmentation of Russian regions. Clustering is based on four variables, which allowed assessing the family institution in the region. The authors used the data of Federal State Statistics Service from 2010 to 2015. Clustering and profiling of each segment has allowed forming a model of Russian regions depending on the features of the family institution in these regions. The authors revealed four clusters grouping regions with similar problems in the family sphere. This segmentation makes it possible to develop the most relevant family policy measures in each group of regions. Thus, the analysis has shown a high degree of differentiation of the family institution in the regions. This suggests that a unified approach to population problems’ solving is far from being effective. To achieve greater results in the implementation of family policy, a differentiated approach is needed. Methods of multidimensional data classification can be successfully applied as a relevant analytical toolkit. Further research could develop the adaptation of multidimensional classification methods to the analysis of the population problems in Russian regions. In particular, the algorithms of nonparametric cluster analysis may be of relevance in future studies.

  15. Comparative analysis of clustering methods for gene expression time course data

    Directory of Open Access Journals (Sweden)

    Ivan G. Costa

    2004-01-01

    Full Text Available This work performs a data driven comparative study of clustering methods used in the analysis of gene expression time courses (or time series. Five clustering methods found in the literature of gene expression analysis are compared: agglomerative hierarchical clustering, CLICK, dynamical clustering, k-means and self-organizing maps. In order to evaluate the methods, a k-fold cross-validation procedure adapted to unsupervised methods is applied. The accuracy of the results is assessed by the comparison of the partitions obtained in these experiments with gene annotation, such as protein function and series classification.

  16. GLOBULAR CLUSTER ABUNDANCES FROM HIGH-RESOLUTION, INTEGRATED-LIGHT SPECTROSCOPY. II. EXPANDING THE METALLICITY RANGE FOR OLD CLUSTERS AND UPDATED ANALYSIS TECHNIQUES

    Energy Technology Data Exchange (ETDEWEB)

    Colucci, Janet E.; Bernstein, Rebecca A.; McWilliam, Andrew [The Observatories of the Carnegie Institution for Science, 813 Santa Barbara St., Pasadena, CA 91101 (United States)

    2017-01-10

    We present abundances of globular clusters (GCs) in the Milky Way and Fornax from integrated-light (IL) spectra. Our goal is to evaluate the consistency of the IL analysis relative to standard abundance analysis for individual stars in those same clusters. This sample includes an updated analysis of seven clusters from our previous publications and results for five new clusters that expand the metallicity range over which our technique has been tested. We find that the [Fe/H] measured from IL spectra agrees to ∼0.1 dex for GCs with metallicities as high as [Fe/H] = −0.3, but the abundances measured for more metal-rich clusters may be underestimated. In addition we systematically evaluate the accuracy of abundance ratios, [X/Fe], for Na i, Mg i, Al i, Si i, Ca i, Ti i, Ti ii, Sc ii, V i, Cr i, Mn i, Co i, Ni i, Cu i, Y ii, Zr i, Ba ii, La ii, Nd ii, and Eu ii. The elements for which the IL analysis gives results that are most similar to analysis of individual stellar spectra are Fe i, Ca i, Si i, Ni i, and Ba ii. The elements that show the greatest differences include Mg i and Zr i. Some elements show good agreement only over a limited range in metallicity. More stellar abundance data in these clusters would enable more complete evaluation of the IL results for other important elements.

  17. From virtual clustering analysis to self-consistent clustering analysis: a mathematical study

    Science.gov (United States)

    Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam

    2018-03-01

    In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), as well as provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of the Saint-Venant's principle. In contrast, they were not obtained in the original SCA paper, and we discover these terms may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduce the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected even higher than the recently proposed SCA algorithm.

  18. Genome cluster database. A sequence family analysis platform for Arabidopsis and rice.

    Science.gov (United States)

    Horan, Kevin; Lauricha, Josh; Bailey-Serres, Julia; Raikhel, Natasha; Girke, Thomas

    2005-05-01

    The genome-wide protein sequences from Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) spp. japonica were clustered into families using sequence similarity and domain-based clustering. The two fundamentally different methods resulted in separate cluster sets with complementary properties to compensate the limitations for accurate family analysis. Functional names for the identified families were assigned with an efficient computational approach that uses the description of the most common molecular function gene ontology node within each cluster. Subsequently, multiple alignments and phylogenetic trees were calculated for the assembled families. All clustering results and their underlying sequences were organized in the Web-accessible Genome Cluster Database (http://bioinfo.ucr.edu/projects/GCD) with rich interactive and user-friendly sequence family mining tools to facilitate the analysis of any given family of interest for the plant science community. An automated clustering pipeline ensures current information for future updates in the annotations of the two genomes and clustering improvements. The analysis allowed the first systematic identification of family and singlet proteins present in both organisms as well as those restricted to one of them. In addition, the established Web resources for mining these data provide a road map for future studies of the composition and structure of protein families between the two species.

  19. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    Energy Technology Data Exchange (ETDEWEB)

    Lalonde, Michel, E-mail: mlalonde15@rogers.com; Wassenaar, Richard [Department of Physics, Carleton University, Ottawa, Ontario K1S 5B6 (Canada); Wells, R. Glenn; Birnie, David; Ruddy, Terrence D. [Division of Cardiology, University of Ottawa Heart Institute, Ottawa, Ontario K1Y 4W7 (Canada)

    2014-07-15

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster

  20. Development and optimization of SPECT gated blood pool cluster analysis for the prediction of CRT outcome

    International Nuclear Information System (INIS)

    Lalonde, Michel; Wassenaar, Richard; Wells, R. Glenn; Birnie, David; Ruddy, Terrence D.

    2014-01-01

    Purpose: Phase analysis of single photon emission computed tomography (SPECT) radionuclide angiography (RNA) has been investigated for its potential to predict the outcome of cardiac resynchronization therapy (CRT). However, phase analysis may be limited in its potential at predicting CRT outcome as valuable information may be lost by assuming that time-activity curves (TAC) follow a simple sinusoidal shape. A new method, cluster analysis, is proposed which directly evaluates the TACs and may lead to a better understanding of dyssynchrony patterns and CRT outcome. Cluster analysis algorithms were developed and optimized to maximize their ability to predict CRT response. Methods: About 49 patients (N = 27 ischemic etiology) received a SPECT RNA scan as well as positron emission tomography (PET) perfusion and viability scans prior to undergoing CRT. A semiautomated algorithm sampled the left ventricle wall to produce 568 TACs from SPECT RNA data. The TACs were then subjected to two different cluster analysis techniques, K-means, and normal average, where several input metrics were also varied to determine the optimal settings for the prediction of CRT outcome. Each TAC was assigned to a cluster group based on the comparison criteria and global and segmental cluster size and scores were used as measures of dyssynchrony and used to predict response to CRT. A repeated random twofold cross-validation technique was used to train and validate the cluster algorithm. Receiver operating characteristic (ROC) analysis was used to calculate the area under the curve (AUC) and compare results to those obtained for SPECT RNA phase analysis and PET scar size analysis methods. Results: Using the normal average cluster analysis approach, the septal wall produced statistically significant results for predicting CRT results in the ischemic population (ROC AUC = 0.73;p < 0.05 vs. equal chance ROC AUC = 0.50) with an optimal operating point of 71% sensitivity and 60% specificity. Cluster

  1. Tweets clustering using latent semantic analysis

    Science.gov (United States)

    Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul

    2017-04-01

    Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike other social media, Twitter users are allowed to broadcast a short message called as `tweet". In this study, we extract tweets related to MH370 for certain of time. In this paper, we present overview of our approach for tweets clustering to analyze the users' responses toward tragedy of MH370. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for the text classification is Latent Semantic Analysis. As a result, there are two types of tweets that response to MH370 tragedy which is emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.

  2. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Full Text Available Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.

  3. Full text clustering and relationship network analysis of biomedical publications.

    Science.gov (United States)

    Guan, Renchu; Yang, Chen; Marchese, Maurizio; Liang, Yanchun; Shi, Xiaohu

    2014-01-01

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm is introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.

  4. Mining the National Career Assessment Examination Result Using Clustering Algorithm

    Science.gov (United States)

    Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.

    2018-03-01

    Education is an essential process today which elicits authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) result in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering algorithms were analyzed. The silhouette indexes of the said clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE result to better understand the abilities of students which in turn is a good basis for adopting teaching strategies.

  5. [Principal component analysis and cluster analysis of inorganic elements in sea cucumber Apostichopus japonicus].

    Science.gov (United States)

    Liu, Xiao-Fang; Xue, Chang-Hu; Wang, Yu-Ming; Li, Zhao-Jie; Xue, Yong; Xu, Jie

    2011-11-01

    The present study is to investigate the feasibility of multi-elements analysis in determination of the geographical origin of sea cucumber Apostichopus japonicus, and to make choice of the effective tracers in sea cucumber Apostichopus japonicus geographical origin assessment. The content of the elements such as Al, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Mo, Cd, Hg and Pb in sea cucumber Apostichopus japonicus samples from seven places of geographical origin were determined by means of ICP-MS. The results were used for the development of elements database. Cluster analysis(CA) and principal component analysis (PCA) were applied to differentiate the sea cucumber Apostichopus japonicus geographical origin. Three principal components which accounted for over 89% of the total variance were extracted from the standardized data. The results of Q-type cluster analysis showed that the 26 samples could be clustered reasonably into five groups, the classification results were significantly associated with the marine distribution of the sea cucumber Apostichopus japonicus samples. The CA and PCA were the effective methods for elements analysis of sea cucumber Apostichopus japonicus samples. The content of the mineral elements in sea cucumber Apostichopus japonicus samples was good chemical descriptors for differentiating their geographical origins.

  6. Application of cluster analysis to geochemical compositional data for identifying ore-related geochemical anomalies

    Science.gov (United States)

    Zhou, Shuguang; Zhou, Kefa; Wang, Jinlin; Yang, Genfang; Wang, Shanshan

    2017-12-01

    Cluster analysis is a well-known technique that is used to analyze various types of data. In this study, cluster analysis is applied to geochemical data that describe 1444 stream sediment samples collected in northwestern Xinjiang with a sample spacing of approximately 2 km. Three algorithms (the hierarchical, k-means, and fuzzy c-means algorithms) and six data transformation methods (the z-score standardization, ZST; the logarithmic transformation, LT; the additive log-ratio transformation, ALT; the centered log-ratio transformation, CLT; the isometric log-ratio transformation, ILT; and no transformation, NT) are compared in terms of their effects on the cluster analysis of the geochemical compositional data. The study shows that, on the one hand, the ZST does not affect the results of column- or variable-based (R-type) cluster analysis, whereas the other methods, including the LT, the ALT, and the CLT, have substantial effects on the results. On the other hand, the results of the row- or observation-based (Q-type) cluster analysis obtained from the geochemical data after applying NT and the ZST are relatively poor. However, we derive some improved results from the geochemical data after applying the CLT, the ILT, the LT, and the ALT. Moreover, the k-means and fuzzy c-means clustering algorithms are more reliable than the hierarchical algorithm when they are used to cluster the geochemical data. We apply cluster analysis to the geochemical data to explore for Au deposits within the study area, and we obtain a good correlation between the results retrieved by combining the CLT or the ILT with the k-means or fuzzy c-means algorithms and the potential zones of Au mineralization. Therefore, we suggest that the combination of the CLT or the ILT with the k-means or fuzzy c-means algorithms is an effective tool to identify potential zones of mineralization from geochemical data.

  7. Application of microarray analysis on computer cluster and cloud platforms.

    Science.gov (United States)

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.

  8. The Auroral Field-aligned Acceleration - Cluster Results

    Science.gov (United States)

    Vaivads, A.; Cluster Auroral Team

    The four Cluster satellites cross the auroral field lines at altitudes well above most of acceleration region. Thus, the orbit is appropriate for studies of the generator side of this region. We consider the energy transport towards the acceleration region and different mechanisms for generating the potential drop. Using data from Cluster we can also for the first time study the dynamics of the generator on a minute scale. We present data from a few auroral field crossings where Cluster are in conjunction with DMSP satellites. We use electric and magnetic field data to estimate electrostatic po- tential along the satellite orbit, Poynting flux as well as the presence of plasma waves. These we can compare with data from particle and wave instruments on Cluster and on low latitude satellites to try to make a consistent picture of the acceleration region formation in these cases. Preliminary results show close agreement both between in- tegrated potential values at Cluster and electron peak energies at DMSP as well as close agreement between the integrated Poynting flux values at Cluster and the elec- tron energy flux at DMSP. At the end we draw a parallels between auroral electron acceleration and electron acceleration at the magnetopause.

  9. Topic modeling for cluster analysis of large biological and medical datasets.

    Science.gov (United States)

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three various methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting

  10. A formal concept analysis approach to consensus clustering of multi-experiment expression data

    Science.gov (United States)

    2014-01-01

    Background Presently, with the increasing number and complexity of available gene expression datasets, the combination of data from multiple microarray studies addressing a similar biological question is gaining importance. The analysis and integration of multiple datasets are expected to yield more reliable and robust results since they are based on a larger number of samples and the effects of the individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. An approach to combining data from different experiments is the aggregation of their clusterings into a consensus or representative clustering solution which increases the confidence in the common features of all the datasets and reveals the important differences among them. Results We propose a novel generic consensus clustering technique that applies Formal Concept Analysis (FCA) approach for the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group resulting in a clustering solution per group. These solutions are pooled together and further analysed by employing FCA which allows extracting valuable insights from the data and generating a gene partition over all the experiments. In order to validate the FCA-enhanced approach two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although both algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological

  11. Development of small scale cluster computer for numerical analysis

    Science.gov (United States)

    Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

    2017-09-01

    In this study, two units of personal computer were successfully networked together to form a small scale cluster. Each of the processor involved are multicore processor which has four cores in it, thus made this cluster to have eight processors. Here, the cluster incorporate Ubuntu 14.04 LINUX environment with MPI implementation (MPICH2). Two main tests were conducted in order to test the cluster, which is communication test and performance test. The communication test was done to make sure that the computers are able to pass the required information without any problem and were done by using simple MPI Hello Program where the program written in C language. Additional, performance test was also done to prove that this cluster calculation performance is much better than single CPU computer. In this performance test, four tests were done by running the same code by using single node, 2 processors, 4 processors, and 8 processors. The result shows that with additional processors, the time required to solve the problem decrease. Time required for the calculation shorten to half when we double the processors. To conclude, we successfully develop a small scale cluster computer using common hardware which capable of higher computing power when compare to single CPU processor, and this can be beneficial for research that require high computing power especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.

  12. Cluster analysis

    OpenAIRE

    Mucha, Hans-Joachim; Sofyan, Hizir

    2000-01-01

    As an explorative technique, duster analysis provides a description or a reduction in the dimension of the data. It classifies a set of observations into two or more mutually exclusive unknown groups based on combinations of many variables. Its aim is to construct groups in such a way that the profiles of objects in the same groups are relatively homogenous whereas the profiles of objects in different groups are relatively heterogeneous. Clustering is distinct from classification techniques, ...

  13. [Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].

    Science.gov (United States)

    Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés

    2018-03-06

    To establish typologies within Madrid's citizens (Spain) with regard to end-of-life by cluster analysis. The SPAD 8 programme was implemented in a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis technique was used, followed by a cluster analysis to create a dendrogram. A cross-sectional study was made beforehand with the results of the questionnaire. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The following four clusters stood out. Clusters 2 and 4 would like to receive palliative care, euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practiced their faith and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and assisted suicide in particular. Clusters 2, 4 and 5 had not completed an advance directive document (2, 4 and 5). Clusters 2 and 3 seldom practiced their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.

  14. Clustering of users of digital libraries through log file analysis

    Directory of Open Access Journals (Sweden)

    Juan Antonio Martínez-Comeche

    2017-09-01

    Full Text Available This study analyzes how users perform information retrieval tasks when introducing queries to the Hispanic Digital Library. Clusters of users are differentiated based on their distinct information behavior. The study used the log files collected by the server over a year and different possible clustering algorithms are compared. The k-means algorithm is found to be a suitable clustering method for the analysis of large log files from digital libraries. In the case of the Hispanic Digital Library the results show three clusters of users and the characteristic information behavior of each group is described.

  15. Global classification of human facial healthy skin using PLS discriminant analysis and clustering analysis.

    Science.gov (United States)

    Guinot, C; Latreille, J; Tenenhaus, M; Malvy, D J

    2001-04-01

    Today's classifications of healthy skin are predominantly based on a very limited number of skin characteristics, such as skin oiliness or susceptibility to sun exposure. The aim of the present analysis was to set up a global classification of healthy facial skin, using mathematical models. This classification is based on clinical, biophysical skin characteristics and self-reported information related to the skin, as well as the results of a theoretical skin classification assessed separately for the frontal and the malar zones of the face. In order to maximize the predictive power of the models with a minimum of variables, the Partial Least Square (PLS) discriminant analysis method was used. The resulting PLS components were subjected to clustering analyses to identify the plausible number of clusters and to group the individuals according to their proximities. Using this approach, four PLS components could be constructed and six clusters were found relevant. So, from the 36 hypothetical combinations of the theoretical skin types classification, we tended to a strengthened six classes proposal. Our data suggest that the association of the PLS discriminant analysis and the clustering methods leads to a valid and simple way to classify healthy human skin and represents a potentially useful tool for cosmetic and dermatological research.

  16. Clustering results - Gclust Server | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available List Contact us Gclust Server Clustering results Data detail Data name Clustering results DOI 10.18908/lsdba...se Update History of This Database Site Policy | Contact Us Clustering results - Gclust Server | LSDB Archive ...

  17. Multisource Images Analysis Using Collaborative Clustering

    Directory of Open Access Journals (Sweden)

    Pierre Gançarski

    2008-04-01

    Full Text Available The development of very high-resolution (VHR satellite imagery has produced a huge amount of data. The multiplication of satellites which embed different types of sensors provides a lot of heterogeneous images. Consequently, the image analyst has often many different images available, representing the same area of the Earth surface. These images can be from different dates, produced by different sensors, or even at different resolutions. The lack of machine learning tools using all these representations in an overall process constraints to a sequential analysis of these various images. In order to use all the information available simultaneously, we propose a framework where different algorithms can use different views of the scene. Each one works on a different remotely sensed image and, thus, produces different and useful information. These algorithms work together in a collaborative way through an automatic and mutual refinement of their results, so that all the results have almost the same number of clusters, which are statistically similar. Finally, a unique result is produced, representing a consensus among the information obtained by each clustering method on its own image. The unified result and the complementarity of the single results (i.e., the agreement between the clustering methods as well as the disagreement lead to a better understanding of the scene. The experiments carried out on multispectral remote sensing images have shown that this method is efficient to extract relevant information and to improve the scene understanding.

  18. SOMFlow: Guided Exploratory Cluster Analysis with Self-Organizing Maps and Analytic Provenance.

    Science.gov (United States)

    Sacha, Dominik; Kraus, Matthias; Bernard, Jurgen; Behrisch, Michael; Schreck, Tobias; Asano, Yuki; Keim, Daniel A

    2018-01-01

    Clustering is a core building block for data analysis, aiming to extract otherwise hidden structures and relations from raw datasets, such as particular groups that can be effectively related, compared, and interpreted. A plethora of visual-interactive cluster analysis techniques has been proposed to date, however, arriving at useful clusterings often requires several rounds of user interactions to fine-tune the data preprocessing and algorithms. We present a multi-stage Visual Analytics (VA) approach for iterative cluster refinement together with an implementation (SOMFlow) that uses Self-Organizing Maps (SOM) to analyze time series data. It supports exploration by offering the analyst a visual platform to analyze intermediate results, adapt the underlying computations, iteratively partition the data, and to reflect previous analytical activities. The history of previous decisions is explicitly visualized within a flow graph, allowing to compare earlier cluster refinements and to explore relations. We further leverage quality and interestingness measures to guide the analyst in the discovery of useful patterns, relations, and data partitions. We conducted two pair analytics experiments together with a subject matter expert in speech intonation research to demonstrate that the approach is effective for interactive data analysis, supporting enhanced understanding of clustering results as well as the interactive process itself.

  19. Performance Evaluation of Hadoop-based Large-scale Network Traffic Analysis Cluster

    Directory of Open Access Journals (Sweden)

    Tao Ran

    2016-01-01

    Full Text Available As Hadoop has gained popularity in big data era, it is widely used in various fields. The self-design and self-developed large-scale network traffic analysis cluster works well based on Hadoop, with off-line applications running on it to analyze the massive network traffic data. On purpose of scientifically and reasonably evaluating the performance of analysis cluster, we propose a performance evaluation system. Firstly, we set the execution times of three benchmark applications as the benchmark of the performance, and pick 40 metrics of customized statistical resource data. Then we identify the relationship between the resource data and the execution times by a statistic modeling analysis approach, which is composed of principal component analysis and multiple linear regression. After training models by historical data, we can predict the execution times by current resource data. Finally, we evaluate the performance of analysis cluster by the validated predicting of execution times. Experimental results show that the predicted execution times by trained models are within acceptable error range, and the evaluation results of performance are accurate and reliable.

  20. Influence of birth cohort on age of onset cluster analysis in bipolar I disorder

    DEFF Research Database (Denmark)

    Bauer, M; Glenn, T; Alda, M

    2015-01-01

    Purpose: Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset...... cohort. Model-based clustering (mixture analysis) was then performed on the age of onset data using the residuals. Clinical variables in subgroups were compared. Results: There was a strong birth cohort effect. Without adjusting for the birth cohort, three subgroups were found by clustering. After...... on the age of onset, and that there is a birth cohort effect. Including the birth cohort adjustment altered the number and characteristics of subgroups detected when clustering by age of onset. Further investigation is needed to determine if combining both approaches will identify subgroups that are more...

  1. ANALYSIS OF DEVELOPING BATIK INDUSTRY CLUSTER IN BAKARAN VILLAGE CENTRAL JAVA PROVINCE

    Directory of Open Access Journals (Sweden)

    Hermanto Hermanto

    2017-06-01

    Full Text Available SMEs grow in a cluster in a certain geographical area. The entrepreneurs grow and thrive through the business cluster. Central Java Province has a lot of business clusters in improving the regional economy, one of which is batik industry cluster. Pati Regency is one of regencies / city in Central Java that has the lowest turnover. Batik industy cluster in Pati develops quite well, which can be seen from the increasing number of batik industry incorporated in the cluster. This research examines the strategy of developing the batik industry cluster in Pati Regency. The purpose of this research is to determine the proper strategy for developing the batik industry clusters in Pati. The method of research is quantitative. The analysis tool of this research is the Strengths, Weakness, Opportunity, Threats (SWOT analysis. The result of SWOT analysis in this research shows that the proper strategy for developing the batik industry cluster in Pati is optimizing the management of batik business cluster in Bakaran Village; the local government provides information of the facility of business capital loans; the utilization of labors from Bakaran Village while improving the quality of labors by training, and marketing the Bakaran batik to the broader markets while maintaining the quality of batik. Advice that can be given from this research is that the parties who have a role in batik industry cluster development in Bakaran Village, Pati Regency, such as the Local Government.

  2. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation

    Directory of Open Access Journals (Sweden)

    Martinez Fernando J

    2010-03-01

    Full Text Available Abstract Background Numerous studies have demonstrated associations between genetic markers and COPD, but results have been inconsistent. One reason may be heterogeneity in disease definition. Unsupervised learning approaches may assist in understanding disease heterogeneity. Methods We selected 31 phenotypic variables and 12 SNPs from five candidate genes in 308 subjects in the National Emphysema Treatment Trial (NETT Genetics Ancillary Study cohort. We used factor analysis to select a subset of phenotypic variables, and then used cluster analysis to identify subtypes of severe emphysema. We examined the phenotypic and genotypic characteristics of each cluster. Results We identified six factors accounting for 75% of the shared variability among our initial phenotypic variables. We selected four phenotypic variables from these factors for cluster analysis: 1 post-bronchodilator FEV1 percent predicted, 2 percent bronchodilator responsiveness, and quantitative CT measurements of 3 apical emphysema and 4 airway wall thickness. K-means cluster analysis revealed four clusters, though separation between clusters was modest: 1 emphysema predominant, 2 bronchodilator responsive, with higher FEV1; 3 discordant, with a lower FEV1 despite less severe emphysema and lower airway wall thickness, and 4 airway predominant. Of the genotypes examined, membership in cluster 1 (emphysema-predominant was associated with TGFB1 SNP rs1800470. Conclusions Cluster analysis may identify meaningful disease subtypes and/or groups of related phenotypic variables even in a highly selected group of severe emphysema subjects, and may be useful for genetic association studies.

  3. Clustering of health-related behaviors among early and mid-adolescents in Tuscany: results from a representative cross-sectional study.

    Science.gov (United States)

    Lazzeri, Giacomo; Panatto, Donatella; Domnich, Alexander; Arata, Lucia; Pammolli, Andrea; Simi, Rita; Giacchi, Mariano Vincenzo; Amicizia, Daniela; Gasparini, Roberto

    2018-03-01

    A huge amount of literature suggests that adolescents' health-related behaviors tend to occur in clusters, and the understanding of such behavioral clustering may have direct implications for the effective tailoring of health-promotion interventions. Despite the usefulness of analyzing clustering, Italian data on this topic are scant. This study aimed to evaluate the clustering patterns of health-related behaviors. The present study is based on data from the Health Behaviors in School-aged Children (HBSC) study conducted in Tuscany in 2010, which involved 3291 11-, 13- and 15-year olds. To aggregate students' data on 22 health-related behaviors, factor analysis and subsequent cluster analysis were performed. Factor analysis revealed eight factors, which were dubbed in accordance with their main traits: 'Alcohol drinking', 'Smoking', 'Physical activity', 'Screen time', 'Signs & symptoms', 'Healthy eating', 'Violence' and 'Sweet tooth'. These factors explained 67% of variance and underwent cluster analysis. A six-cluster κ-means solution was established with a 93.8% level of classification validity. The between-cluster differences in both mean age and gender distribution were highly statistically significant. Health-compromising behaviors are common among Tuscan teens and occur in distinct clusters. These results may be used by schools, health-promotion authorities and other stakeholders to design and implement tailored preventive interventions in Tuscany.

  4. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability

    Science.gov (United States)

    Miller, Christopher B.; Bartlett, Delwyn J.; Mullins, Anna E.; Dodds, Kirsty L.; Gordon, Christopher J.; Kyle, Simon D.; Kim, Jong Won; D'Rozario, Angela L.; Lee, Rico S.C.; Comas, Maria; Marshall, Nathaniel S.; Yee, Brendon J.; Espie, Colin A.; Grunstein, Ronald R.

    2016-01-01

    Study Objectives: To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative (q)-EEG and heart rate variability (HRV). Methods: Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. Results: From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q-EEG. Clinical Trial Registration: Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. Citation: Miller CB, Bartlett DJ, Mullins AE, Dodds KL, Gordon CJ, Kyle SD, Kim JW, D'Rozario AL, Lee RS, Comas M, Marshall NS, Yee BJ, Espie CA, Grunstein RR. Clusters of Insomnia Disorder: an exploratory cluster analysis of objective sleep parameters reveals differences in neurocognitive functioning, quantitative EEG, and heart rate variability. SLEEP 2016;39(11):1993–2004. PMID:27568796

  5. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M m i n = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics lead to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  6. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

    Science.gov (United States)

    Hwang, Heungsun; Dillon, William R.

    2010-01-01

    A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…

  7. Cluster analysis as a prediction tool for pregnancy outcomes.

    Science.gov (United States)

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

    2015-03-01

    Considering specific physiology changes during gestation and thinking of pregnancy as a "critical window", classification of pregnant women at early pregnancy can be considered as crucial. The paper demonstrates the use of a method based on an approach from intelligent data mining, cluster analysis. Cluster analysis method is a statistical method which makes possible to group individuals based on sets of identifying variables. The method was chosen in order to determine possibility for classification of pregnant women at early pregnancy to analyze unknown correlations between different variables so that the certain outcomes could be predicted. 222 pregnant women from two general obstetric offices' were recruited. The main orient was set on characteristics of these pregnant women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis gained a 94.1% classification accuracy rate with three branch- es or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results are showing that pregnant women both of older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering baby of higher birth weight but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have significantly higher probability for complications during pregnancy (gestosis) and higher probability of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women at early pregnancy to predict certain outcomes.

  8. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li

    2017-01-01

    Full Text Available In the fields of geographic information systems (GIS and remote sensing (RS, the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited due to computational complexity, noise resistant ability and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC, is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters” if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the terminate condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.

  9. The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions

    Science.gov (United States)

    Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.

    2018-04-01

    The gas cluster ion beam technique was used for the silicon carbide crystal surface smoothing. The effect of processing by two inert cluster ions, argon and xenon, was quantitatively compared. While argon is a standard element for GCIB, results for xenon clusters were not reported yet. Scanning probe microscopy and high resolution transmission electron microscopy techniques were used for the analysis of the surface roughness and surface crystal layer quality. The gas cluster ion beam processing results in surface relief smoothing down to average roughness about 1 nm for both elements. It was shown that xenon as the working gas is more effective: sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High resolution transmission electron microscopy analysis of the surface defect layer gives values of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters.

  10. Phenotypes of asthma in low-income children and adolescents: cluster analysis

    Directory of Open Access Journals (Sweden)

    Anna Lucia Barros Cabral

    Full Text Available ABSTRACT Objective: Studies characterizing asthma phenotypes have predominantly included adults or have involved children and adolescents in developed countries. Therefore, their applicability in other populations, such as those of developing countries, remains indeterminate. Our objective was to determine how low-income children and adolescents with asthma in Brazil are distributed across a cluster analysis. Methods: We included 306 children and adolescents (6-18 years of age with a clinical diagnosis of asthma and under medical treatment for at least one year of follow-up. At enrollment, all the patients were clinically stable. For the cluster analysis, we selected 20 variables commonly measured in clinical practice and considered important in defining asthma phenotypes. Variables with high multicollinearity were excluded. A cluster analysis was applied using a twostep agglomerative test and log-likelihood distance measure. Results: Three clusters were defined for our population. Cluster 1 (n = 94 included subjects with normal pulmonary function, mild eosinophil inflammation, few exacerbations, later age at asthma onset, and mild atopy. Cluster 2 (n = 87 included those with normal pulmonary function, a moderate number of exacerbations, early age at asthma onset, more severe eosinophil inflammation, and moderate atopy. Cluster 3 (n = 108 included those with poor pulmonary function, frequent exacerbations, severe eosinophil inflammation, and severe atopy. Conclusions: Asthma was characterized by the presence of atopy, number of exacerbations, and lung function in low-income children and adolescents in Brazil. The many similarities with previous cluster analyses of phenotypes indicate that this approach shows good generalizability.

  11. Mobility in Europe: Recent Trends from a Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ioana Manafi

    2017-08-01

    Full Text Available During the past decade, Europe was confronted with major changes and events offering large opportunities for mobility. The EU enlargement process, the EU policies regarding youth, the economic crisis affecting national economies on different levels, political instabilities in some European countries, high rates of unemployment or the increasing number of refugees are only a few of the factors influencing net migration in Europe. Based on a set of socio-economic indicators for EU/EFTA countries and cluster analysis, the paper provides an overview of regional differences across European countries, related to migration magnitude in the identified clusters. The obtained clusters are in accordance with previous studies in migration, and appear stable during the period of 2005-2013, with only some exceptions. The analysis revealed three country clusters: EU/EFTA center-receiving countries, EU/EFTA periphery-sending countries and EU/EFTA outlier countries, the names suggesting not only the geographical position within Europe, but the trends in net migration flows during the years. Therewith, the results provide evidence for the persistence of a movement from periphery to center countries, which is correlated with recent flows of mobility in Europe.

  12. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    Science.gov (United States)

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  13. [Optimization of cluster analysis based on drug resistance profiles of MRSA isolates].

    Science.gov (United States)

    Tani, Hiroya; Kishi, Takahiko; Gotoh, Minehiro; Yamagishi, Yuka; Mikamo, Hiroshige

    2015-12-01

    We examined 402 methicillin-resistant Staphylococcus aureus (MRSA) strains isolated from clinical specimens in our hospital between November 19, 2010 and December 27, 2011 to evaluate the similarity between cluster analysis of drug susceptibility tests and pulsed-field gel electrophoresis (PFGE). The results showed that the 402 strains tested were classified into 27 PFGE patterns (151 subtypes of patterns). Cluster analyses of drug susceptibility tests with the cut-off distance yielding a similar classification capability showed favorable results--when the MIC method was used, and minimum inhibitory concentration (MIC) values were used directly in the method, the level of agreement with PFGE was 74.2% when 15 drugs were tested. The Unweighted Pair Group Method with Arithmetic mean (UPGMA) method was effective when the cut-off distance was 16. Using the SIR method in which susceptible (S), intermediate (I), and resistant (R) were coded as 0, 2, and 3, respectively, according to the Clinical and Laboratory Standards Institute (CLSI) criteria, the level of agreement with PFGE was 75.9% when the number of drugs tested was 17, the method used for clustering was the UPGMA, and the cut-off distance was 3.6. In addition, to assess the reproducibility of the results, 10 strains were randomly sampled from the overall test and subjected to cluster analysis. This was repeated 100 times under the same conditions. The results indicated good reproducibility of the results, with the level of agreement with PFGE showing a mean of 82.0%, standard deviation of 12.1%, and mode of 90.0% for the MIC method and a mean of 80.0%, standard deviation of 13.4%, and mode of 90.0% for the SIR method. In summary, cluster analysis for drug susceptibility tests is useful for the epidemiological analysis of MRSA.

  14. Comparison of population-averaged and cluster-specific models for the analysis of cluster randomized trials with missing binary outcomes: a simulation study

    Directory of Open Access Journals (Sweden)

    Ma Jinhui

    2013-01-01

    Full Text Available Abstracts Background The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE and cluster-specific (i.e. random-effects logistic regression (RELR models for analyzing data from cluster randomized trials (CRTs with missing binary responses. Methods In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE, and coverage probability. Results GEE performs well on all four measures — provided the downward bias of the standard error (when the number of clusters per arm is small is adjusted appropriately — under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF 50. RELR performs well only when a small amount of data was missing, and complete case analysis was applied. Conclusion GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either standard or within-cluster MI strategy is applied prior to the analysis.

  15. Cluster Analysis of Maize Inbred Lines

    Directory of Open Access Journals (Sweden)

    Jiban Shrestha

    2016-12-01

    Full Text Available The determination of diversity among inbred lines is important for heterosis breeding. Sixty maize inbred lines were evaluated for their eight agro morphological traits during winter season of 2011 to analyze their genetic diversity. Clustering was done by average linkage method. The inbred lines were grouped into six clusters. Inbred lines grouped into Clusters II had taller plants with maximum number of leaves. The cluster III was characterized with shorter plants with minimum number of leaves. The inbred lines categorized into cluster V had early flowering whereas the group into cluster VI had late flowering time. The inbred lines grouped into the cluster III were characterized by higher value of anthesis silking interval (ASI and those of cluster VI had lower value of ASI. These results showed that the inbred lines having widely divergent clusters can be utilized in hybrid breeding programme.

  16. Cluster Analysis in Rapeseed (Brassica Napus L.)

    International Nuclear Information System (INIS)

    Mahasi, J.M

    2002-01-01

    With widening edible deficit, Kenya has become increasingly dependent on imported edible oils. Many oilseed crops (e.g. sunflower, soya beans, rapeseed/mustard, sesame, groundnuts etc) can be grown in Kenya. But oilseed rape is preferred because it very high yielding (1.5 tons-4.0 tons/ha) with oil content of 42-46%. Other uses include fitting in various cropping systems as; relay/inter crops, rotational crops, trap crops and fodder. It is soft seeded hence oil extraction is relatively easy. The meal is high in protein and very useful in livestock supplementation. Rapeseed can be straight combined using adjusted wheat combines. The priority is to expand domestic oilseed production, hence the need to introduce improved rapeseed germplasm from other countries. The success of any crop improvement programme depends on the extent of genetic diversity in the material. Hence, it is essential to understand the adaptation of introduced genotypes and the similarities if any among them. Evaluation trials were carried out on 17 rapeseed genotypes (nine Canadian origin and eight of European origin) grown at 4 locations namely Endebess, Njoro, Timau and Mau Narok in three years (1992, 1993 and 1994). Results for 1993 were discarded due to severe drought. An analysis of variance was carried out only on seed yields and the treatments were found to be significantly different. Cluster analysis was then carried out on mean seed yields and based on this analysis; only one major group exists within the material. In 1992, varieties 2,3,8 and 9 didn't fall in the same cluster as the rest. Variety 8 was the only one not classified with the rest of the Canadian varieties. Three European varieties (2,3 and 9) were however not classified with the others. In 1994, varieties 10 and 6 didn't fall in the major cluster. Of these two, variety 10 is of Canadian origin. Varieties were more similar in 1994 than 1992 due to favorable weather. It is evident that, genotypes from different geographical

  17. Cluster: A New Application for Spatial Analysis of Pixelated Data for Epiphytotics.

    Science.gov (United States)

    Nelson, Scot C; Corcoja, Iulian; Pethybridge, Sarah J

    2017-12-01

    Spatial analysis of epiphytotics is essential to develop and test hypotheses about pathogen ecology, disease dynamics, and to optimize plant disease management strategies. Data collection for spatial analysis requires substantial investment in time to depict patterns in various frames and hierarchies. We developed a new approach for spatial analysis of pixelated data in digital imagery and incorporated the method in a stand-alone desktop application called Cluster. The user isolates target entities (clusters) by designating up to 24 pixel colors as nontargets and moves a threshold slider to visualize the targets. The app calculates the percent area occupied by targeted pixels, identifies the centroids of targeted clusters, and computes the relative compass angle of orientation for each cluster. Users can deselect anomalous clusters manually and/or automatically by specifying a size threshold value to exclude smaller targets from the analysis. Up to 1,000 stochastic simulations randomly place the centroids of each cluster in ranked order of size (largest to smallest) within each matrix while preserving their calculated angles of orientation for the long axes. A two-tailed probability t test compares the mean inter-cluster distances for the observed versus the values derived from randomly simulated maps. This is the basis for statistical testing of the null hypothesis that the clusters are randomly distributed within the frame of interest. These frames can assume any shape, from natural (e.g., leaf) to arbitrary (e.g., a rectangular or polygonal field). Cluster summarizes normalized attributes of clusters, including pixel number, axis length, axis width, compass orientation, and the length/width ratio, available to the user as a downloadable spreadsheet. Each simulated map may be saved as an image and inspected. Provided examples demonstrate the utility of Cluster to analyze patterns at various spatial scales in plant pathology and ecology and highlight the

  18. Cluster Analysis of International Information and Social Development.

    Science.gov (United States)

    Lau, Jesus

    1990-01-01

    Analyzes information activities in relation to socioeconomic characteristics in low, middle, and highly developed economies for the years 1960 and 1977 through the use of cluster analysis. Results of data from 31 countries suggest that information development is achieved mainly by countries that have also achieved social development. (26…

  19. [The attitude of German veterinarians towards farm animal welfare: results of a cluster analysis].

    Science.gov (United States)

    Heise, Heinke; Kemper, Nicole; Theuvsen, Ludwig

    2016-01-01

    In recent years the issue of animal welfare in intensive livestock production systems has been subjected to increasing criticism from the broad public. Some groups in society ask for higher animal welfare standards and there is an increas- ing number of consumers who prefer meat from more animal friendly husbandry systems. An intense social debate on animal welfare has flared up in the recent past. Veterinarians are considered as experts for the assessment of animal welfare. Nevertheless they are rarely consulted in the current debate. Therefore, only little is known about their attitude towards animal welfare in livestock farming. Even for Germany, there is so far no comprehensive analysis about their atti- tudes towards animal welfare and animal welfare programs. In the present study, 433 veterinarians were questioned via an online survey. The results show that veterinarians have a very differentiated perception of the issue animal welfare. Four groups (clusters) which have different attitudes towards livestock farming, voluntary animal welfare programs, farm size and the effects of national animal welfare standards were identified.

  20. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-11-01

    Full Text Available In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 106 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4 where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio–hydro–atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen–Rocky Mountain Biogenic Aerosol Study ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the

  1. Evolution of cluster X-ray luminosities and radii: Results from the 160 square degree rosat survey

    DEFF Research Database (Denmark)

    Vikhlinin, A.; McNamara, B.R.; Forman, W.

    1998-01-01

    We searched for cluster X-ray luminosity and radius evolution using our sample of 203 galaxy clusters detected in the 160 deg(2) survey with the ROSAT PSPC (Vikhlinin et al.). With such a large area survey, it is possible, for the first time with ROSAT, to test the evolution of luminous clusters, L......-X > 3 x 10(44) ergs s(-1) in the 0.5-2 keV band. We detect a factor of 3-4 deficit of such luminous clusters at z > 0.3 compared with the present. The evolution is much weaker or absent at modestly lower luminosities, (1-3) x 10(44) ergs s(-1). At still lower luminosities, we find no evolution from...... the analysis of the log N-log S relation. The results in the two upper L, bins are in agreement with the Einstein Extended Medium-Sensitivity Survey evolution result (Gioia et al.; Henry ct al.), which was obtained using a completely independent cluster sample. The low-L-X results are in agreement with other...

  2. Transcriptional analysis of exopolysaccharides biosynthesis gene clusters in Lactobacillus plantarum.

    Science.gov (United States)

    Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia

    2016-04-01

    Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.

  3. Study on Adaptive Parameter Determination of Cluster Analysis in Urban Management Cases

    Science.gov (United States)

    Fu, J. Y.; Jing, C. F.; Du, M. Y.; Fu, Y. L.; Dai, P. P.

    2017-09-01

    The fine management for cities is the important way to realize the smart city. The data mining which uses spatial clustering analysis for urban management cases can be used in the evaluation of urban public facilities deployment, and support the policy decisions, and also provides technical support for the fine management of the city. Aiming at the problem that DBSCAN algorithm which is based on the density-clustering can not realize parameter adaptive determination, this paper proposed the optimizing method of parameter adaptive determination based on the spatial analysis. Firstly, making analysis of the function Ripley's K for the data set to realize adaptive determination of global parameter MinPts, which means setting the maximum aggregation scale as the range of data clustering. Calculating every point object's highest frequency K value in the range of Eps which uses K-D tree and setting it as the value of clustering density to realize the adaptive determination of global parameter MinPts. Then, the R language was used to optimize the above process to accomplish the precise clustering of typical urban management cases. The experimental results based on the typical case of urban management in XiCheng district of Beijing shows that: The new DBSCAN clustering algorithm this paper presents takes full account of the data's spatial and statistical characteristic which has obvious clustering feature, and has a better applicability and high quality. The results of the study are not only helpful for the formulation of urban management policies and the allocation of urban management supervisors in XiCheng District of Beijing, but also to other cities and related fields.

  4. STUDY ON ADAPTIVE PARAMETER DETERMINATION OF CLUSTER ANALYSIS IN URBAN MANAGEMENT CASES

    Directory of Open Access Journals (Sweden)

    J. Y. Fu

    2017-09-01

    Full Text Available The fine management for cities is the important way to realize the smart city. The data mining which uses spatial clustering analysis for urban management cases can be used in the evaluation of urban public facilities deployment, and support the policy decisions, and also provides technical support for the fine management of the city. Aiming at the problem that DBSCAN algorithm which is based on the density-clustering can not realize parameter adaptive determination, this paper proposed the optimizing method of parameter adaptive determination based on the spatial analysis. Firstly, making analysis of the function Ripley's K for the data set to realize adaptive determination of global parameter MinPts, which means setting the maximum aggregation scale as the range of data clustering. Calculating every point object’s highest frequency K value in the range of Eps which uses K-D tree and setting it as the value of clustering density to realize the adaptive determination of global parameter MinPts. Then, the R language was used to optimize the above process to accomplish the precise clustering of typical urban management cases. The experimental results based on the typical case of urban management in XiCheng district of Beijing shows that: The new DBSCAN clustering algorithm this paper presents takes full account of the data’s spatial and statistical characteristic which has obvious clustering feature, and has a better applicability and high quality. The results of the study are not only helpful for the formulation of urban management policies and the allocation of urban management supervisors in XiCheng District of Beijing, but also to other cities and related fields.

  5. CHOOSING A HEALTH INSTITUTION WITH MULTIPLE CORRESPONDENCE ANALYSIS AND CLUSTER ANALYSIS IN A POPULATION BASED STUDY

    Directory of Open Access Journals (Sweden)

    ASLI SUNER

    2013-06-01

    Full Text Available Multiple correspondence analysis is a method making easy to interpret the categorical variables given in contingency tables, showing the similarities, associations as well as divergences among these variables via graphics on a lower dimensional space. Clustering methods are helped to classify the grouped data according to their similarities and to get useful summarized data from them. In this study, interpretations of multiple correspondence analysis are supported by cluster analysis; factors affecting referred health institute such as age, disease group and health insurance are examined and it is aimed to compare results of the methods.

  6. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

    Science.gov (United States)

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

    2006-01-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…

  7. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

    Science.gov (United States)

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

    2017-01-01

    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.

  8. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    Energy Technology Data Exchange (ETDEWEB)

    Data Analysis and Visualization (IDAV) and the Department of Computer Science, University of California, Davis, One Shields Avenue, Davis CA 95616, USA,; nternational Research Training Group ``Visualization of Large and Unstructured Data Sets,' ' University of Kaiserslautern, Germany; Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA; Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA; Life Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley CA 94720, USA,; Computer Science Division,University of California, Berkeley, CA, USA,; Computer Science Department, University of California, Irvine, CA, USA,; All authors are with the Berkeley Drosophila Transcription Network Project, Lawrence Berkeley National Laboratory,; Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

  9. Fatigue Feature Extraction Analysis based on a K-Means Clustering Approach

    Directory of Open Access Journals (Sweden)

    M.F.M. Yunoh

    2015-06-01

    Full Text Available This paper focuses on clustering analysis using a K-means approach for fatigue feature dataset extraction. The aim of this study is to group the dataset as closely as possible (homogeneity for the scattered dataset. Kurtosis, the wavelet-based energy coefficient and fatigue damage are calculated for all segments after the extraction process using wavelet transform. Kurtosis, the wavelet-based energy coefficient and fatigue damage are used as input data for the K-means clustering approach. K-means clustering calculates the average distance of each group from the centroid and gives the objective function values. Based on the results, maximum values of the objective function can be seen in the two centroid clusters, with a value of 11.58. The minimum objective function value is found at 8.06 for five centroid clusters. It can be seen that the objective function with the lowest value for the number of clusters is equal to five; which is therefore the best cluster for the dataset.

  10. Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis

    Directory of Open Access Journals (Sweden)

    Gabjo Kim

    2016-12-01

    Full Text Available This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors’ levels of sustainable research and development (R&D and establish a sustainable strategy for entering an industry. To achieve this, we first grouped the patent documents with similar technologies by applying affinity propagation (AP clustering, which is effective while grouping large amounts of data. Next, in order to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF weight, which lists the terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.

  11. Gene expression data clustering and it’s application in differential analysis of leukemia

    Directory of Open Access Journals (Sweden)

    M. Vahedi

    2008-02-01

    Full Text Available Introduction: DNA microarray technique is one of the most important categories in bioinformatics,which allows the possibility of monitoring thousands of expressed genes has been resulted in creatinggiant data bases of gene expression data, recently. Statistical analysis of such databases includednormalization, clustering, classification and etc.Materials and Methods: Golub et al (1999 collected data bases of leukemia based on the method ofoligonucleotide. The data is on the internet. In this paper, we analyzed gene expression data. It wasclustered by several methods including multi-dimensional scaling, hierarchical and non-hierarchicalclustering. Data set included 20 Acute Lymphoblastic Leukemia (ALL patients and 14 Acute MyeloidLeukemia (AML patients. The results of tow methods of clustering were compared with regard to realgrouping (ALL & AML. R software was used for data analysis.Results: Specificity and sensitivity of divisive hierarchical clustering in diagnosing of ALL patientswere 75% and 92%, respectively. Specificity and sensitivity of partitioning around medoids indiagnosing of ALL patients were 90% and 93%, respectively. These results showed a wellaccomplishment of both methods of clustering. It is considerable that, due to clustering methodsresults, one of the samples was placed in ALL groups, which was in AML group in clinical test.Conclusion: With regard to concordance of the results with real grouping of data, therefore we canuse these methods in the cases where we don't have accurate information of real grouping of data.Moreover, Results of clustering might distinct subgroups of data in such a way that would be necessaryfor concordance with clinical outcomes, laboratory results and so on.

  12. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    Directory of Open Access Journals (Sweden)

    Marco Borri

    Full Text Available To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment.The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4. Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters.The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4, determined with cluster validation, produced the best separation between reducing and non-reducing clusters.The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.

  13. The Assessment of Hydrogen Energy Systems for Fuel Cell Vehicles Using Principal Componenet Analysis and Cluster Analysis

    DEFF Research Database (Denmark)

    Ren, Jingzheng; Tan, Shiyu; Dong, Lichun

    2012-01-01

    and analysis of the hydrogen systems is meaningful for decision makers to select the best scenario. principal component analysis (PCA) has been used to evaluate the integrated performance of different hydrogen energy systems and select the best scenario, and hierarchical cluster analysis (CA) has been used...... for transportation of hydrogen, hydrogen gas tank for the storage of hydrogen at refueling stations, and gaseous hydrogen as power energy for fuel cell vehicles has been recognized as the best scenario. Also, the clustering results calculated by CA are consistent with those determined by PCA, denoting...

  14. A semantics-based method for clustering of Chinese web search results

    Science.gov (United States)

    Zhang, Hui; Wang, Deqing; Wang, Li; Bi, Zhuming; Chen, Yong

    2014-01-01

    Information explosion is a critical challenge to the development of modern information systems. In particular, when the application of an information system is over the Internet, the amount of information over the web has been increasing exponentially and rapidly. Search engines, such as Google and Baidu, are essential tools for people to find the information from the Internet. Valuable information, however, is still likely submerged in the ocean of search results from those tools. By clustering the results into different groups based on subjects automatically, a search engine with the clustering feature allows users to select most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ the generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use the HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing the vocabulary chain. Third, we construct a vector of text features and calculate snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster snippets. Extensive experimental results have shown that the proposed algorithm has outperformed over the suffix tree clustering method and other traditional clustering methods.

  15. Diagnostics of subtropical plants functional state by cluster analysis

    Directory of Open Access Journals (Sweden)

    Oksana Belous

    2016-05-01

    Full Text Available The article presents an application example of statistical methods for data analysis on diagnosis of the adaptive capacity of subtropical plants varieties. We depicted selection indicators and basic physiological parameters that were defined as diagnostic. We used evaluation on a set of parameters of water regime, there are: determination of water deficit of the leaves, determining the fractional composition of water and detection parameters of the concentration of cell sap (CCS (for tea culture flushes. These settings are characterized by high liability and high responsiveness to the effects of many abiotic factors that determined the particular care in the selection of plant material for analysis and consideration of the impact on sustainability. On the basis of the experimental data calculated the coefficients of pair correlation between climatic factors and used physiological indicators. The result was a selection of physiological and biochemical indicators proposed to assess the adaptability and included in the basis of methodical recommendations on diagnostics of the functional state of the studied cultures. Analysis of complex studies involving a large number of indicators is quite difficult, especially does not allow to quickly identify the similarity of new varieties for their adaptive responses to adverse factors, and, therefore, to set general requirements to conditions of cultivation. Use of cluster analysis suggests that in the analysis of only quantitative data; define a set of variables used to assess varieties (and the more sampling, the more accurate the clustering will happen, be sure to ascertain the measure of similarity (or difference between objects. It is shown that the identification of diagnostic features, which are subjected to statistical processing, impact the accuracy of the varieties classification. Selection in result of the mono-clusters analysis (variety tea Kolhida; hazelnut Lombardsky red; variety kiwi Monty

  16. Cluster analysis of rural, urban, and curbside atmospheric particle size data.

    Science.gov (United States)

    Beddows, David C S; Dall'Osto, Manuel; Harrison, Roy M

    2009-07-01

    Particle size is a key determinant of the hazard posed by airborne particles. Continuous multivariate particle size data have been collected using aerosol particle size spectrometers sited at four locations within the UK: Harwell (Oxfordshire); Regents Park (London); British Telecom Tower (London); and Marylebone Road (London). These data have been analyzed using k-means cluster analysis, deduced to be the preferred cluster analysis technique, selected from an option of four partitional cluster packages, namelythe following: Fuzzy; k-means; k-median; and Model-Based clustering. Using cluster validation indices k-means clustering was shown to produce clusters with the smallest size, furthest separation, and importantly the highest degree of similarity between the elements within each partition. Using k-means clustering, the complexity of the data set is reduced allowing characterization of the data according to the temporal and spatial trends of the clusters. At Harwell, the rural background measurement site, the cluster analysis showed that the spectra may be differentiated by their modal-diameters and average temporal trends showing either high counts during the day-time or night-time hours. Likewise for the urban sites, the cluster analysis differentiated the spectra into a small number of size distributions according their modal-diameter, the location of the measurement site, and time of day. The responsible aerosol emission, formation, and dynamic processes can be inferred according to the cluster characteristics and correlation to concurrently measured meteorological, gas phase, and particle phase measurements.

  17. Assessment of Random Assignment in Training and Test Sets using Generalized Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Sorana D. BOLBOACĂ

    2011-06-01

    Full Text Available Aim: The properness of random assignment of compounds in training and validation sets was assessed using the generalized cluster technique. Material and Method: A quantitative Structure-Activity Relationship model using Molecular Descriptors Family on Vertices was evaluated in terms of assignment of carboquinone derivatives in training and test sets during the leave-many-out analysis. Assignment of compounds was investigated using five variables: observed anticancer activity and four structure descriptors. Generalized cluster analysis with K-means algorithm was applied in order to investigate if the assignment of compounds was or not proper. The Euclidian distance and maximization of the initial distance using a cross-validation with a v-fold of 10 was applied. Results: All five variables included in analysis proved to have statistically significant contribution in identification of clusters. Three clusters were identified, each of them containing both carboquinone derivatives belonging to training as well as to test sets. The observed activity of carboquinone derivatives proved to be normal distributed on every. The presence of training and test sets in all clusters identified using generalized cluster analysis with K-means algorithm and the distribution of observed activity within clusters sustain a proper assignment of compounds in training and test set. Conclusion: Generalized cluster analysis using the K-means algorithm proved to be a valid method in assessment of random assignment of carboquinone derivatives in training and test sets.

  18. DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab

    Directory of Open Access Journals (Sweden)

    Alexander Chailytko

    2016-05-01

    Full Text Available Domain Generation Algorithms (DGA is a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a complex solution to populate DGA feed using reversed DGAs, third-party feeds, and a smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling a cluster-based analysis. Our method also automatically allows analysis of the whole malware family, specific campaign, etc. We present our system and demonstrate its abilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to utilize the outcome of the analysis to create smarter protections against similar malware.

  19. Common Factor Analysis Versus Principal Component Analysis: Choice for Symptom Cluster Research

    Directory of Open Access Journals (Sweden)

    Hee-Ju Kim, PhD, RN

    2008-03-01

    Conclusion: If the study purpose is to explain correlations among variables and to examine the structure of the data (this is usual for most cases in symptom cluster research, CFA provides a more accurate result. If the purpose of a study is to summarize data with a smaller number of variables, PCA is the choice. PCA can also be used as an initial step in CFA because it provides information regarding the maximum number and nature of factors. In using factor analysis for symptom cluster research, several issues need to be considered, including subjectivity of solution, sample size, symptom selection, and level of measure.

  20. Clustering Analysis for Credit Default Probabilities in a Retail Bank Portfolio

    Directory of Open Access Journals (Sweden)

    Elena ANDREI (DRAGOMIR

    2012-08-01

    Full Text Available Methods underlying cluster analysis are very useful in data analysis, especially when the processed volume of data is very large, so that it becomes impossible to extract essential information, unless specific instruments are used to summarize and structure the gross information. In this context, cluster analysis techniques are used particularly, for systematic information analysis. The aim of this article is to build an useful model for banking field, based on data mining techniques, by dividing the groups of borrowers into clusters, in order to obtain a profile of the customers (debtors and good payers. We assume that a class is appropriate if it contains members that have a high degree of similarity and the standard method for measuring the similarity within a group shows the lowest variance. After clustering, data mining techniques are implemented on the cluster with bad debtors, reaching a very high accuracy after implementation. The paper is structured as follows: Section 2 describes the model for data analysis based on a specific scoring model that we proposed. In section 3, we present a cluster analysis using K-means algorithm and the DM models are applied on a specific cluster. Section 4 shows the conclusions.

  1. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis

    Directory of Open Access Journals (Sweden)

    Huanhuan Li

    2017-08-01

    Full Text Available The Shipboard Automatic Identification System (AIS is crucial for navigation safety and maritime surveillance, data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, the Dynamic Time Warping (DTW, a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our

  2. A Dimensionality Reduction-Based Multi-Step Clustering Method for Robust Vessel Trajectory Analysis.

    Science.gov (United States)

    Li, Huanhuan; Liu, Jingxian; Liu, Ryan Wen; Xiong, Naixue; Wu, Kefeng; Kim, Tai-Hoon

    2017-08-04

    The Shipboard Automatic Identification System (AIS) is crucial for navigation safety and maritime surveillance, data mining and pattern analysis of AIS information have attracted considerable attention in terms of both basic research and practical applications. Clustering of spatio-temporal AIS trajectories can be used to identify abnormal patterns and mine customary route data for transportation safety. Thus, the capacities of navigation safety and maritime traffic monitoring could be enhanced correspondingly. However, trajectory clustering is often sensitive to undesirable outliers and is essentially more complex compared with traditional point clustering. To overcome this limitation, a multi-step trajectory clustering method is proposed in this paper for robust AIS trajectory clustering. In particular, the Dynamic Time Warping (DTW), a similarity measurement method, is introduced in the first step to measure the distances between different trajectories. The calculated distances, inversely proportional to the similarities, constitute a distance matrix in the second step. Furthermore, as a widely-used dimensional reduction method, Principal Component Analysis (PCA) is exploited to decompose the obtained distance matrix. In particular, the top k principal components with above 95% accumulative contribution rate are extracted by PCA, and the number of the centers k is chosen. The k centers are found by the improved center automatically selection algorithm. In the last step, the improved center clustering algorithm with k clusters is implemented on the distance matrix to achieve the final AIS trajectory clustering results. In order to improve the accuracy of the proposed multi-step clustering algorithm, an automatic algorithm for choosing the k clusters is developed according to the similarity distance. Numerous experiments on realistic AIS trajectory datasets in the bridge area waterway and Mississippi River have been implemented to compare our proposed method with

  3. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    Science.gov (United States)

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease

  4. Profitability and efficiency of Italian utilities: cluster analysis of financial statement ratios

    International Nuclear Information System (INIS)

    Linares, E.

    2008-01-01

    , II) return on invested capital and return on equity capital, and III) efficiency of personnel. Cluster analysis has identified four clusters. Each cluster has then been described in terms of the factors, the profitability and the efficiency indicators (Return on Sales, Return on Equity, Return on Assets, Turnover / Personnel Costs, Capital Intensity), the presence in different stages of the energy value chains and the same descriptive variables used for the previous set of clusters. Based on indicators of cost incidence, profitability and efficiency, five cluster have emerged. Based on three non-observable factors (efficiency in ordinary and financial management, return on invested capital and on equity capital, efficiency of personnel) four clusters have emerged. These results suggest that there may be a nexus between the performances of the observed Italian utilities and some of their characteristics, such as size (in terms of total assets), presence in more or less capital-intensive stages of the value chains of different public utility services, and the industries in which these enterprises generate most of their sales revenues (i.e. their core businesses) or, in other words, the extent of diversification. This hints to a couple of avenues for future research, such as the replication of this kind of analysis on a larger sample, with the same or different indicators, and the identification of economies of scale, economies of density, economies of scope and industry-specific factors that may help to explain the results of multivariate statistical analysis of financial statement data. [it

  5. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    Directory of Open Access Journals (Sweden)

    Valentina Meuti

    2014-01-01

    Full Text Available Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, to which was administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2. A clinical group of subjects with perinatal depression (PND, 55 subjects was selected; clinical and validity scales of MMPI-2 were used as predictors in hierarchical cluster analysis carried out. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3 and an “apparently common” one (cluster 2. The first cluster (39.5% collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95% includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5% shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions.

  6. MMPI-2: Cluster Analysis of Personality Profiles in Perinatal Depression—Preliminary Evidence

    Science.gov (United States)

    Grillo, Alessandra; Lauriola, Marco; Giacchetti, Nicoletta

    2014-01-01

    Background. To assess personality characteristics of women who develop perinatal depression. Methods. The study started with a screening of a sample of 453 women in their third trimester of pregnancy, to which was administered a survey data form, the Edinburgh Postnatal Depression Scale (EPDS) and the Minnesota Multiphasic Personality Inventory 2 (MMPI-2). A clinical group of subjects with perinatal depression (PND, 55 subjects) was selected; clinical and validity scales of MMPI-2 were used as predictors in hierarchical cluster analysis carried out. Results. The analysis identified three clusters of personality profile: two “clinical” clusters (1 and 3) and an “apparently common” one (cluster 2). The first cluster (39.5%) collects structures of personality with prevalent obsessive or dependent functioning tending to develop a “psychasthenic” depression; the third cluster (13.95%) includes women with prevalent borderline functioning tending to develop “dysphoric” depression; the second cluster (46.5%) shows a normal profile with a “defensive” attitude, probably due to the presence of defense mechanisms or to the fear of stigma. Conclusion. Characteristics of personality have a key role in clinical manifestations of perinatal depression; it is important to detect them to identify mothers at risk and to plan targeted therapeutic interventions. PMID:25574499

  7. Fault detection of flywheel system based on clustering and principal component analysis

    Directory of Open Access Journals (Sweden)

    Wang Rixin

    2015-12-01

    Full Text Available Considering the nonlinear, multifunctional properties of double-flywheel with closed-loop control, a two-step method including clustering and principal component analysis is proposed to detect the two faults in the multifunctional flywheels. At the first step of the proposed algorithm, clustering is taken as feature recognition to check the instructions of “integrated power and attitude control” system, such as attitude control, energy storage or energy discharge. These commands will ask the flywheel system to work in different operation modes. Therefore, the relationship of parameters in different operations can define the cluster structure of training data. Ordering points to identify the clustering structure (OPTICS can automatically identify these clusters by the reachability-plot. K-means algorithm can divide the training data into the corresponding operations according to the reachability-plot. Finally, the last step of proposed model is used to define the relationship of parameters in each operation through the principal component analysis (PCA method. Compared with the PCA model, the proposed approach is capable of identifying the new clusters and learning the new behavior of incoming data. The simulation results show that it can effectively detect the faults in the multifunctional flywheels system.

  8. Evaluation of Portland cement from X-ray diffraction associated with cluster analysis

    International Nuclear Information System (INIS)

    Gobbo, Luciano de Andrade; Montanheiro, Tarcisio Jose; Montanheiro, Filipe; Sant'Agostino, Lilia Mascarenhas

    2013-01-01

    The Brazilian cement industry produced 64 million tons of cement in 2012, with noteworthy contribution of CP-II (slag), CP-III (blast furnace) and CP-IV (pozzolanic) cements. The industrial pole comprises about 80 factories that utilize raw materials of different origins and chemical compositions that require enhanced analytical technologies to optimize production in order to gain space in the growing consumer market in Brazil. This paper assesses the sensitivity of mineralogical analysis by X-ray diffraction associated with cluster analysis to distinguish different kinds of cements with different additions. This technique can be applied, for example, in the prospection of different types of limestone (calcitic, dolomitic and siliceous) as well as in the qualification of different clinkers. The cluster analysis does not require any specific knowledge of the mineralogical composition of the diffractograms to be clustered; rather, it is based on their similarity. The materials tested for addition have different origins: fly ashes from different power stations from South Brazil and slag from different steel plants in the Southeast. Cement with different additions of limestone and white Portland cement were also used. The Rietveld method of qualitative and quantitative analysis was used for measuring the results generated by the cluster analysis technique. (author)

  9. Automated analysis of organic particles using cluster SIMS

    Energy Technology Data Exchange (ETDEWEB)

    Gillen, Greg; Zeissler, Cindy; Mahoney, Christine; Lindstrom, Abigail; Fletcher, Robert; Chi, Peter; Verkouteren, Jennifer; Bright, David; Lareau, Richard T.; Boldman, Mike

    2004-06-15

    Cluster primary ion bombardment combined with secondary ion imaging is used on an ion microscope secondary ion mass spectrometer for the spatially resolved analysis of organic particles on various surfaces. Compared to the use of monoatomic primary ion beam bombardment, the use of a cluster primary ion beam (SF{sub 5}{sup +} or C{sub 8}{sup -}) provides significant improvement in molecular ion yields and a reduction in beam-induced degradation of the analyte molecules. These characteristics of cluster bombardment, along with automated sample stage control and custom image analysis software are utilized to rapidly characterize the spatial distribution of trace explosive particles, narcotics and inkjet-printed microarrays on a variety of surfaces.

  10. Mental State Talk Structure in Children’s Narratives: A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Giuliana Pinto

    2017-01-01

    Full Text Available This study analysed children’s Theory of Mind (ToM as assessed by mental state talk in oral narratives. We hypothesized that the children’s mental state talk in narratives has an underlying structure, with specific terms organized in clusters. Ninety-eight children attending the last year of kindergarten were asked to tell a story twice, at the beginning and at the end of the school year. Mental state talk was analysed by identifying terms and expressions referring to perceptual, physiological, emotional, willingness, cognitive, moral, and sociorelational states. The cluster analysis showed that children’s mental state talk is organized in two main clusters: perceptual states and affective states. Results from the study confirm the feasibility of narratives as an outlet to inquire mental state talk and offer a more fine-grained analysis of mental state talk structure.

  11. Comparison of multianalyte proficiency test results by sum of ranking differences, principal component analysis, and hierarchical cluster analysis.

    Science.gov (United States)

    Škrbić, Biljana; Héberger, Károly; Durišić-Mladenović, Nataša

    2013-10-01

    Sum of ranking differences (SRD) was applied for comparing multianalyte results obtained by several analytical methods used in one or in different laboratories, i.e., for ranking the overall performances of the methods (or laboratories) in simultaneous determination of the same set of analytes. The data sets for testing of the SRD applicability contained the results reported during one of the proficiency tests (PTs) organized by EU Reference Laboratory for Polycyclic Aromatic Hydrocarbons (EU-RL-PAH). In this way, the SRD was also tested as a discriminant method alternative to existing average performance scores used to compare mutlianalyte PT results. SRD should be used along with the z scores--the most commonly used PT performance statistics. SRD was further developed to handle the same rankings (ties) among laboratories. Two benchmark concentration series were selected as reference: (a) the assigned PAH concentrations (determined precisely beforehand by the EU-RL-PAH) and (b) the averages of all individual PAH concentrations determined by each laboratory. Ranking relative to the assigned values and also to the average (or median) values pointed to the laboratories with the most extreme results, as well as revealed groups of laboratories with similar overall performances. SRD reveals differences between methods or laboratories even if classical test(s) cannot. The ranking was validated using comparison of ranks by random numbers (a randomization test) and using seven folds cross-validation, which highlighted the similarities among the (methods used in) laboratories. Principal component analysis and hierarchical cluster analysis justified the findings based on SRD ranking/grouping. If the PAH-concentrations are row-scaled, (i.e., z scores are analyzed as input for ranking) SRD can still be used for checking the normality of errors. Moreover, cross-validation of SRD on z scores groups the laboratories similarly. The SRD technique is general in nature, i.e., it can

  12. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Full Text Available Although traditional clustering methods (e.g., K-means have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  13. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    Science.gov (United States)

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  14. Using Cluster Analysis and ICP-MS to Identify Groups of Ecstasy Tablets in Sao Paulo State, Brazil.

    Science.gov (United States)

    Maione, Camila; de Oliveira Souza, Vanessa Cristina; Togni, Loraine Rezende; da Costa, José Luiz; Campiglia, Andres Dobal; Barbosa, Fernando; Barbosa, Rommel Melgaço

    2017-11-01

    The variations found in the elemental composition in ecstasy samples result in spectral profiles with useful information for data analysis, and cluster analysis of these profiles can help uncover different categories of the drug. We provide a cluster analysis of ecstasy tablets based on their elemental composition. Twenty-five elements were determined by ICP-MS in tablets apprehended by Sao Paulo's State Police, Brazil. We employ the K-means clustering algorithm along with C4.5 decision tree to help us interpret the clustering results. We found a better number of two clusters within the data, which can refer to the approximated number of sources of the drug which supply the cities of seizures. The C4.5 model was capable of differentiating the ecstasy samples from the two clusters with high prediction accuracy using the leave-one-out cross-validation. The model used only Nd, Ni, and Pb concentration values in the classification of the samples. © 2017 American Academy of Forensic Sciences.

  15. Performance analysis of clustering techniques over microarray data: A case study

    Science.gov (United States)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in the field of statistical data analysis. In such investigation cluster analysis plays a vital role to deal with the large scale data. There are many clustering techniques with different cluster analysis approach. But which approach suits a particular dataset is difficult to predict. To deal with this problem a grading approach is introduced over many clustering techniques to identify a stable technique. But the grading approach depends on the characteristic of dataset as well as on the validity indices. So a two stage grading approach is implemented. In this study the grading approach is implemented over five clustering techniques like hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of grading approach that a cluster technique is significant is also established by Nemenyi post-hoc hypothetical test.

  16. Characterization of population exposure to organochlorines: A cluster analysis application

    NARCIS (Netherlands)

    R.M. Guimarães (Raphael Mendonça); S. Asmus (Sven); A. Burdorf (Alex)

    2013-01-01

    textabstractThis study aimed to show the results from a cluster analysis application in the characterization of population exposure to organochlorines through variables related to time and exposure dose. Characteristics of 354 subjects in a population exposed to organochlorine pesticides residues

  17. Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Directory of Open Access Journals (Sweden)

    Li Weizhong

    2009-10-01

    Full Text Available Abstract Background The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

  18. Cluster analysis of typhoid cases in Kota Bharu, Kelantan, Malaysia

    Directory of Open Access Journals (Sweden)

    Nazarudin Safian

    2008-09-01

    Full Text Available Typhoid fever is still a major public health problem globally as well as in Malaysia. This study was done to identify the spatial epidemiology of typhoid fever in the Kota Bharu District of Malaysia as a first step to developing more advanced analysis of the whole country. The main characteristic of the epidemiological pattern that interested us was whether typhoid cases occurred in clusters or whether they were evenly distributed throughout the area. We also wanted to know at what spatial distances they were clustered. All confirmed typhoid cases that were reported to the Kota Bharu District Health Department from the year 2001 to June of 2005 were taken as the samples. From the home address of the cases, the location of the house was traced and a coordinate was taken using handheld GPS devices. Spatial statistical analysis was done to determine the distribution of typhoid cases, whether clustered, random or dispersed. The spatial statistical analysis was done using CrimeStat III software to determine whether typhoid cases occur in clusters, and later on to determine at what distances it clustered. From 736 cases involved in the study there was significant clustering for cases occurring in the years 2001, 2002, 2003 and 2005. There was no significant clustering in year 2004. Typhoid clustering also occurred strongly for distances up to 6 km. This study shows that typhoid cases occur in clusters, and this method could be applicable to describe spatial epidemiology for a specific area. (Med J Indones 2008; 17: 175-82Keywords: typhoid, clustering, spatial epidemiology, GIS

  19. Cluster analysis of Southeastern U.S. climate stations

    Science.gov (United States)

    Stooksbury, D. E.; Michaels, P. J.

    1991-09-01

    A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is objectively to define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate divisions (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CD's, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity which makes them more appropriate for the study of the impact of climate and climatic change.

  20. Cluster analysis by optimal decomposition of induced fuzzy sets

    Energy Technology Data Exchange (ETDEWEB)

    Backer, E

    1978-01-01

    Nonsupervised pattern recognition is addressed and the concept of fuzzy sets is explored in order to provide the investigator (data analyst) additional information supplied by the pattern class membership values apart from the classical pattern class assignments. The basic ideas behind the pattern recognition problem, the clustering problem, and the concept of fuzzy sets in cluster analysis are discussed, and a brief review of the literature of the fuzzy cluster analysis is given. Some mathematical aspects of fuzzy set theory are briefly discussed; in particular, a measure of fuzziness is suggested. The optimization-clustering problem is characterized. Then the fundamental idea behind affinity decomposition is considered. Next, further analysis takes place with respect to the partitioning-characterization functions. The iterative optimization procedure is then addressed. The reclassification function is investigated and convergence properties are examined. Finally, several experiments in support of the method suggested are described. Four object data sets serve as appropriate test cases. 120 references, 70 figures, 11 tables. (RWR)

  1. HICOSMO - X-ray analysis of a complete sample of galaxy clusters

    Science.gov (United States)

    Schellenberger, G.; Reiprich, T.

    2017-10-01

    Galaxy clusters are known to be the largest virialized objects in the Universe. Based on the theory of structure formation one can use them as cosmological probes, since they originate from collapsed overdensities in the early Universe and witness its history. The X-ray regime provides the unique possibility to measure in detail the most massive visible component, the intra cluster medium. Using Chandra observations of a local sample of 64 bright clusters (HIFLUGCS) we provide total (hydrostatic) and gas mass estimates of each cluster individually. Making use of the completeness of the sample we quantify two interesting cosmological parameters by a Bayesian cosmological likelihood analysis. We find Ω_{M}=0.3±0.01 and σ_{8}=0.79±0.03 (statistical uncertainties) using our default analysis strategy combining both, a mass function analysis and the gas mass fraction results. The main sources of biases that we discuss and correct here are (1) the influence of galaxy groups (higher incompleteness in parent samples and a differing behavior of the L_{x} - M relation), (2) the hydrostatic mass bias (as determined by recent hydrodynamical simulations), (3) the extrapolation of the total mass (comparing various methods), (4) the theoretical halo mass function and (5) other cosmological (non-negligible neutrino mass), and instrumental (calibration) effects.

  2. Graph analysis of cell clusters forming vascular networks

    Science.gov (United States)

    Alves, A. P.; Mesquita, O. N.; Gómez-Gardeñes, J.; Agero, U.

    2018-03-01

    This manuscript describes the experimental observation of vasculogenesis in chick embryos by means of network analysis. The formation of the vascular network was observed in the area opaca of embryos from 40 to 55 h of development. In the area opaca endothelial cell clusters self-organize as a primitive and approximately regular network of capillaries. The process was observed by bright-field microscopy in control embryos and in embryos treated with Bevacizumab (Avastin), an antibody that inhibits the signalling of the vascular endothelial growth factor (VEGF). The sequence of images of the vascular growth were thresholded, and used to quantify the forming network in control and Avastin-treated embryos. This characterization is made by measuring vessels density, number of cell clusters and the largest cluster density. From the original images, the topology of the vascular network was extracted and characterized by means of the usual network metrics such as: the degree distribution, average clustering coefficient, average short path length and assortativity, among others. This analysis allows to monitor how the largest connected cluster of the vascular network evolves in time and provides with quantitative evidence of the disruptive effects that Avastin has on the tree structure of vascular networks.

  3. Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis

    Science.gov (United States)

    Yen, Chi-Fu; Sivasankar, Sanjeevi

    2018-03-01

    Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM-tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, particularly when the number of unbinding events are limited and not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.

  4. Standardized Effect Size Measures for Mediation Analysis in Cluster-Randomized Trials

    Science.gov (United States)

    Stapleton, Laura M.; Pituch, Keenan A.; Dion, Eric

    2015-01-01

    This article presents 3 standardized effect size measures to use when sharing results of an analysis of mediation of treatment effects for cluster-randomized trials. The authors discuss 3 examples of mediation analysis (upper-level mediation, cross-level mediation, and cross-level mediation with a contextual effect) with demonstration of the…

  5. Profiling physical activity motivation based on self-determination theory: a cluster analysis approach.

    Science.gov (United States)

    Friederichs, Stijn Ah; Bolman, Catherine; Oenema, Anke; Lechner, Lilian

    2015-01-01

    In order to promote physical activity uptake and maintenance in individuals who do not comply with physical activity guidelines, it is important to increase our understanding of physical activity motivation among this group. The present study aimed to examine motivational profiles in a large sample of adults who do not comply with physical activity guidelines. The sample for this study consisted of 2473 individuals (31.4% male; age 44.6 ± 12.9). In order to generate motivational profiles based on motivational regulation, a cluster analysis was conducted. One-way analyses of variance were then used to compare the clusters in terms of demographics, physical activity level, motivation to be active and subjective experience while being active. Three motivational clusters were derived based on motivational regulation scores: a low motivation cluster, a controlled motivation cluster and an autonomous motivation cluster. These clusters differed significantly from each other with respect to physical activity behavior, motivation to be active and subjective experience while being active. Overall, the autonomous motivation cluster displayed more favorable characteristics compared to the other two clusters. The results of this study provide additional support for the importance of autonomous motivation in the context of physical activity behavior. The three derived clusters may be relevant in the context of physical activity interventions as individuals within the different clusters might benefit most from different intervention approaches. In addition, this study shows that cluster analysis is a useful method for differentiating between motivational profiles in large groups of individuals who do not comply with physical activity guidelines.

  6. CLUSTER ANALYSIS UKRAINIAN REGIONAL DISTRIBUTION BY LEVEL OF INNOVATION

    Directory of Open Access Journals (Sweden)

    Roman Shchur

    2016-07-01

    Full Text Available   SWOT-analysis of the threats and benefits of innovation development strategy of Ivano-Frankivsk region in the context of financial support was сonducted. Methodical approach to determine of public-private partnerships potential that is tool of innovative economic development financing was identified. Cluster analysis of possibilities of forming public-private partnership in a particular region was carried out. Optimal set of problem areas that require urgent solutions and financial security is defined on the basis of cluster approach. It will help to form practical recommendations for the formation of an effective financial mechanism in the regions of Ukraine. Key words: the mechanism of innovation development financial provision, innovation development, public-private partnerships, cluster analysis, innovative development strategy.

  7. Analysis of precipitation data in Bangladesh through hierarchical clustering and multidimensional scaling

    Science.gov (United States)

    Rahman, Md. Habibur; Matin, M. A.; Salma, Umma

    2017-12-01

    The precipitation patterns of seventeen locations in Bangladesh from 1961 to 2014 were studied using a cluster analysis and metric multidimensional scaling. In doing so, the current research applies four major hierarchical clustering methods to precipitation in conjunction with different dissimilarity measures and metric multidimensional scaling. A variety of clustering algorithms were used to provide multiple clustering dendrograms for a mixture of distance measures. The dendrogram of pre-monsoon rainfall for the seventeen locations formed five clusters. The pre-monsoon precipitation data for the areas of Srimangal and Sylhet were located in two clusters across the combination of five dissimilarity measures and four hierarchical clustering algorithms. The single linkage algorithm with Euclidian and Manhattan distances, the average linkage algorithm with the Minkowski distance, and Ward's linkage algorithm provided similar results with regard to monsoon precipitation. The results of the post-monsoon and winter precipitation data are shown in different types of dendrograms with disparate combinations of sub-clusters. The schematic geometrical representations of the precipitation data using metric multidimensional scaling showed that the post-monsoon rainfall of Cox's Bazar was located far from those of the other locations. The results of a box-and-whisker plot, different clustering techniques, and metric multidimensional scaling indicated that the precipitation behaviour of Srimangal and Sylhet during the pre-monsoon season, Cox's Bazar and Sylhet during the monsoon season, Maijdi Court and Cox's Bazar during the post-monsoon season, and Cox's Bazar and Khulna during the winter differed from those at other locations in Bangladesh.

  8. Investigating the usefulness of a cluster-based trend analysis to detect visual field progression in patients with open-angle glaucoma.

    Science.gov (United States)

    Aoki, Shuichiro; Murata, Hiroshi; Fujino, Yuri; Matsuura, Masato; Miki, Atsuya; Tanito, Masaki; Mizoue, Shiro; Mori, Kazuhiko; Suzuki, Katsuyoshi; Yamashita, Takehiro; Kashiwagi, Kenji; Hirasawa, Kazunori; Shoji, Nobuyuki; Asaoka, Ryo

    2017-12-01

    To investigate the usefulness of the Octopus (Haag-Streit) EyeSuite's cluster trend analysis in glaucoma. Ten visual fields (VFs) with the Humphrey Field Analyzer (Carl Zeiss Meditec), spanning 7.7 years on average were obtained from 728 eyes of 475 primary open angle glaucoma patients. Mean total deviation (mTD) trend analysis and EyeSuite's cluster trend analysis were performed on various series of VFs (from 1st to 10th: VF1-10 to 6th to 10th: VF6-10). The results of the cluster-based trend analysis, based on different lengths of VF series, were compared against mTD trend analysis. Cluster-based trend analysis and mTD trend analysis results were significantly associated in all clusters and with all lengths of VF series. Between 21.2% and 45.9% (depending on VF series length and location) of clusters were deemed to progress when the mTD trend analysis suggested no progression. On the other hand, 4.8% of eyes were observed to progress using the mTD trend analysis when cluster trend analysis suggested no progression in any two (or more) clusters. Whole field trend analysis can miss local VF progression. Cluster trend analysis appears as robust as mTD trend analysis and useful to assess both sectorial and whole field progression. Cluster-based trend analyses, in particular the definition of two or more progressing cluster, may help clinicians to detect glaucomatous progression in a timelier manner than using a whole field trend analysis, without significantly compromising specificity. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  9. Clustering applications in financial and economic analysis of the crop production in the Russian regions

    Directory of Open Access Journals (Sweden)

    Gromov Vladislav Vladimirovich

    2013-08-01

    Full Text Available We used the complex mathematical modeling, multivariate statistical-analysis, fuzzy sets to analyze the financial and economic state of the crop production in Russian regions. We developed a system of indicators, detecting the state agricultural sector in the region, based on the results of correlation, factor, cluster analysis and statistics of the Federal State Statistics Service. We performed clustering analyses to divide regions of Russia on selected factors into five groups. A qualitative and quantitative characteristics of each cluster was received.

  10. The identification of credit card encoders by hierarchical cluster analysis of the jitters of magnetic stripes.

    Science.gov (United States)

    Leung, S C; Fung, W K; Wong, K H

    1999-01-01

    The relative bit density variation graphs of 207 specimen credit cards processed by 12 encoding machines were examined first visually, and then classified by means of hierarchical cluster analysis. Twenty-nine credit cards being treated as 'questioned' samples were tested by way of cluster analysis against 'controls' derived from known encoders. It was found that hierarchical cluster analysis provided a high accuracy of identification with all 29 'questioned' samples classified correctly. On the other hand, although visual comparison of jitter graphs was less discriminating, it was nevertheless capable of giving a reasonably accurate result.

  11. Transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis

    Directory of Open Access Journals (Sweden)

    Riccardi Giovanna

    2009-03-01

    Full Text Available Abstract Background The ESAT-6 (early secreted antigenic target, 6 kDa family collects small mycobacterial proteins secreted by Mycobacterium tuberculosis, particularly in the early phase of growth. There are 23 ESAT-6 family members in M. tuberculosis H37Rv. In a previous work, we identified the Zur- dependent regulation of five proteins of the ESAT-6/CFP-10 family (esxG, esxH, esxQ, esxR, and esxS. esxG and esxH are part of ESAT-6 cluster 3, whose expression was already known to be induced by iron starvation. Results In this research, we performed EMSA experiments and transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis (msmeg0615-msmeg0625 and M. tuberculosis. In contrast to what we had observed in M. tuberculosis, we found that in M. smegmatis ESAT-6 cluster 3 responds only to iron and not to zinc. In both organisms we identified an internal promoter, a finding which suggests the presence of two transcriptional units and, by consequence, a differential expression of cluster 3 genes. We compared the expression of msmeg0615 and msmeg0620 in different growth and stress conditions by means of relative quantitative PCR. The expression of msmeg0615 and msmeg0620 genes was essentially similar; they appeared to be repressed in most of the tested conditions, with the exception of acid stress (pH 4.2 where msmeg0615 was about 4-fold induced, while msmeg0620 was repressed. Analysis revealed that in acid stress conditions M. tuberculosis rv0282 gene was 3-fold induced too, while rv0287 induction was almost insignificant. Conclusion In contrast with what has been reported for M. tuberculosis, our results suggest that in M. smegmatis only IdeR-dependent regulation is retained, while zinc has no effect on gene expression. The role of cluster 3 in M. tuberculosis virulence is still to be defined; however, iron- and zinc-dependent expression strongly suggests that cluster 3 is highly expressed in the infective process, and that the cluster

  12. Segmentation of Residential Gas Consumers Using Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Marta P. Fernandes

    2017-12-01

    Full Text Available The growing environmental concerns and liberalization of energy markets have resulted in an increased competition between utilities and a strong focus on efficiency. To develop new energy efficiency measures and optimize operations, utilities seek new market-related insights and customer engagement strategies. This paper proposes a clustering-based methodology to define the segmentation of residential gas consumers. The segments of gas consumers are obtained through a detailed clustering analysis using smart metering data. Insights are derived from the segmentation, where the segments result from the clustering process and are characterized based on the consumption profiles, as well as according to information regarding consumers’ socio-economic and household key features. The study is based on a sample of approximately one thousand households over one year. The representative load profiles of consumers are essentially characterized by two evident consumption peaks, one in the morning and the other in the evening, and an off-peak consumption. Significant insights can be derived from this methodology regarding typical consumption curves of the different segments of consumers in the population. This knowledge can assist energy utilities and policy makers in the development of consumer engagement strategies, demand forecasting tools and in the design of more sophisticated tariff systems.

  13. clusterMaker: a multi-algorithm clustering plugin for Cytoscape

    Directory of Open Access Journals (Sweden)

    Morris John H

    2011-11-01

    Full Text Available Abstract Background In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view, k-means, k-medoid, SCPS, AutoSOME, and native (Java MCL. Results Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions The Cytoscape plugin cluster

  14. Statistical analysis of the spatial distribution of galaxies and clusters

    International Nuclear Information System (INIS)

    Cappi, Alberto

    1993-01-01

    This thesis deals with the analysis of the distribution of galaxies and clusters, describing some observational problems and statistical results. First chapter gives a theoretical introduction, aiming to describe the framework of the formation of structures, tracing the history of the Universe from the Planck time, t_p = 10"-"4"3 sec and temperature corresponding to 10"1"9 GeV, to the present epoch. The most usual statistical tools and models of the galaxy distribution, with their advantages and limitations, are described in chapter two. A study of the main observed properties of galaxy clustering, together with a detailed statistical analysis of the effects of selecting galaxies according to apparent magnitude or diameter, is reported in chapter three. Chapter four delineates some properties of groups of galaxies, explaining the reasons of discrepant results on group distributions. Chapter five is a study of the distribution of galaxy clusters, with different statistical tools, like correlations, percolation, void probability function and counts in cells; it is found the same scaling-invariant behaviour of galaxies. Chapter six describes our finding that rich galaxy clusters too belong to the fundamental plane of elliptical galaxies, and gives a discussion of its possible implications. Finally chapter seven reviews the possibilities offered by multi-slit and multi-fibre spectrographs, and I present some observational work on nearby and distant galaxy clusters. In particular, I show the opportunities offered by ongoing surveys of galaxies coupled with multi-object fibre spectrographs, focusing on the ESO Key Programme A galaxy redshift survey in the south galactic pole region to which I collaborate and on MEFOS, a multi-fibre instrument with automatic positioning. Published papers related to the work described in this thesis are reported in the last appendix. (author) [fr

  15. Au55, a stable glassy cluster: results of ab initio calculations

    Directory of Open Access Journals (Sweden)

    Dieter Vollath

    2017-10-01

    Full Text Available Structure and properties of small nanoparticles are still under discussion. Moreover, some thermodynamic properties and the structural behavior still remain partially unknown. One of the best investigated nanoparticles is the Au55 cluster, which has been analyzed experimentally and theoretically. However, up to now, the results of these studies are still inconsistent. Consequently, we have carried out the present ab initio study of the Au55 cluster, using up-to-date computational concepts, in order to clarify these issues. Our calculations have confirmed the experimental result that the thermodynamically most stable structure is not crystalline, but it is glassy. The non-crystalline structure of this cluster was validated by comparison of the coordination numbers with those of a crystalline cluster. It was found that, in contrast to bulk materials, glass formation is connected to an energy release that is close to the melting enthalpy of bulk gold. Additionally, the surface energy of this cluster was calculated using two different theoretical approaches resulting in values close to the surface energy for bulk gold. It shall be emphasized that it is now possible to give a confidence interval for the value of the surface energy.

  16. Feature-Space Clustering for fMRI Meta-Analysis

    DEFF Research Database (Denmark)

    Goutte, Cyril; Hansen, Lars Kai; Liptrot, Mathew G.

    2001-01-01

    MRI sequences containing several hundreds of images, it is sometimes necessary to invoke feature extraction to reduce the dimensionality of the data space. A second interesting application is in the meta-analysis of fMRI experiment, where features are obtained from a possibly large number of single......-voxel analyses. In particular this allows the checking of the differences and agreements between different methods of analysis. Both approaches are illustrated on a fMRI data set involving visual stimulation, and we show that the feature space clustering approach yields nontrivial results and, in particular......, shows interesting differences between individual voxel analysis performed with traditional methods. © 2001 Wiley-Liss, Inc....

  17. Detection of secondary structure elements in proteins by hydrophobic cluster analysis.

    Science.gov (United States)

    Woodcock, S; Mornon, J P; Henrissat, B

    1992-10-01

    Hydrophobic cluster analysis (HCA) is a protein sequence comparison method based on alpha-helical representations of the sequences where the size, shape and orientation of the clusters of hydrophobic residues are primarily compared. The effectiveness of HCA has been suggested to originate from its potential ability to focus on the residues forming the hydrophobic core of globular proteins. We have addressed the robustness of the bidimensional representation used for HCA in its ability to detect the regular secondary structure elements of proteins. Various parameters have been studied such as those governing cluster size and limits, the hydrophobic residues constituting the clusters as well as the potential shift of the cluster positions with respect to the position of the regular secondary structure elements. The following results have been found to support the alpha-helical bidimensional representation used in HCA: (i) there is a positive correlation (clearly above background noise) between the hydrophobic clusters and the regular secondary structure elements in proteins; (ii) the hydrophobic clusters are centred on the regular secondary structure elements; (iii) the pitch of the helical representation which gives the best correspondence is that of an alpha-helix. The correspondence between hydrophobic clusters and regular secondary structure elements suggests a way to implement variable gap penalties during the automatic alignment of protein sequences.

  18. Statistical analysis of activation and reaction energies with quasi-variational coupled-cluster theory

    Science.gov (United States)

    Black, Joshua A.; Knowles, Peter J.

    2018-06-01

    The performance of quasi-variational coupled-cluster (QV) theory applied to the calculation of activation and reaction energies has been investigated. A statistical analysis of results obtained for six different sets of reactions has been carried out, and the results have been compared to those from standard single-reference methods. In general, the QV methods lead to increased activation energies and larger absolute reaction energies compared to those obtained with traditional coupled-cluster theory.

  19. Analysis of RXTE data on Clusters of Galaxies

    Science.gov (United States)

    Petrosian, Vahe

    2004-01-01

    on GLAST. The details of the nonthermal particle population has important implications for the theories of cluster formation, mergers and evolution. The results of this work were first presented at the High Energy Division meeting of the American astronomical Society at Mt. Tremblene, Canada (Petrosian et al. 2003). and in an invited review talk at the General Assembly of the International Astronomical Union at Sydney, Australia (Petrosian, 2003). A paper describe the observations, the data analysis and its implication is being prepared for publication in the Astrophysical Journal.

  20. Using Cluster Analysis for Data Mining in Educational Technology Research

    Science.gov (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  1. Grey Wolf Optimizer Based on Powell Local Optimization Method for Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Sen Zhang

    2015-01-01

    Full Text Available One heuristic evolutionary algorithm recently proposed is the grey wolf optimizer (GWO, inspired by the leadership hierarchy and hunting mechanism of grey wolves in nature. This paper presents an extended GWO algorithm based on Powell local optimization method, and we call it PGWO. PGWO algorithm significantly improves the original GWO in solving complex optimization problems. Clustering is a popular data analysis and data mining technique. Hence, the PGWO could be applied in solving clustering problems. In this study, first the PGWO algorithm is tested on seven benchmark functions. Second, the PGWO algorithm is used for data clustering on nine data sets. Compared to other state-of-the-art evolutionary algorithms, the results of benchmark and data clustering demonstrate the superior performance of PGWO algorithm.

  2. Extending the input–output energy balance methodology in agriculture through cluster analysis

    International Nuclear Information System (INIS)

    Bojacá, Carlos Ricardo; Casilimas, Héctor Albeiro; Gil, Rodrigo; Schrevens, Eddie

    2012-01-01

    The input–output balance methodology has been applied to characterize the energy balance of agricultural systems. This study proposes to extend this methodology with the inclusion of multivariate analysis to reveal particular patterns in the energy use of a system. The objective was to demonstrate the usefulness of multivariate exploratory techniques to analyze the variability found in a farming system and, establish efficiency categories that can be used to improve the energy balance of the system. To this purpose an input–output analysis was applied to the major greenhouse tomato production area in Colombia. Individual energy profiles were built and the k-means clustering method was applied to the production factors. On average, the production system in the study zone consumes 141.8 GJ ha −1 to produce 96.4 GJ ha −1 , resulting in an energy efficiency of 0.68. With the k-means clustering analysis, three clusters of farmers were identified with energy efficiencies of 0.54, 0.67 and 0.78. The most energy efficient cluster grouped 56.3% of the farmers. It is possible to optimize the production system by improving the management practices of those with the lowest energy use efficiencies. Multivariate analysis techniques demonstrated to be a complementary pathway to improve the energy efficiency of a system. -- Highlights: ► An input–output energy balance was estimated for greenhouse tomatoes in Colombia. ► We used the k-means clustering method to classify growers based on their energy use. ► Three clusters of growers were found with energy efficiencies of 0.54, 0.67 and 0.78. ► Overall system optimization is possible by improving the energy use of the less efficient.

  3. Using cluster analysis to organize and explore regional GPS velocities

    Science.gov (United States)

    Simpson, Robert W.; Thatcher, Wayne; Savage, James C.

    2012-01-01

    Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.

  4. Person mobility in the design and analysis of cluster-randomized cohort prevention trials.

    Science.gov (United States)

    Vuchinich, Sam; Flay, Brian R; Aber, Lawrence; Bickman, Leonard

    2012-06-01

    Person mobility is an inescapable fact of life for most cluster-randomized (e.g., schools, hospitals, clinic, cities, state) cohort prevention trials. Mobility rates are an important substantive consideration in estimating the effects of an intervention. In cluster-randomized trials, mobility rates are often correlated with ethnicity, poverty and other variables associated with disparity. This raises the possibility that estimated intervention effects may generalize to only the least mobile segments of a population and, thus, create a threat to external validity. Such mobility can also create threats to the internal validity of conclusions from randomized trials. Researchers must decide how to deal with persons who leave study clusters during a trial (dropouts), persons and clusters that do not comply with an assigned intervention, and persons who enter clusters during a trial (late entrants), in addition to the persons who remain for the duration of a trial (stayers). Statistical techniques alone cannot solve the key issues of internal and external validity raised by the phenomenon of person mobility. This commentary presents a systematic, Campbellian-type analysis of person mobility in cluster-randomized cohort prevention trials. It describes four approaches for dealing with dropouts, late entrants and stayers with respect to data collection, analysis and generalizability. The questions at issue are: 1) From whom should data be collected at each wave of data collection? 2) Which cases should be included in the analyses of an intervention effect? and 3) To what populations can trial results be generalized? The conclusions lead to recommendations for the design and analysis of future cluster-randomized cohort prevention trials.

  5. Applying clustering to statistical analysis of student reasoning about two-dimensional kinematics

    Directory of Open Access Journals (Sweden)

    R. Padraic Springuel

    2007-12-01

    Full Text Available We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and written elements. The primary goal of this paper is to describe the methodology itself; we include a brief overview of relevant results.

  6. Cluster Analysis of the International Stellarator Confinement Database

    International Nuclear Information System (INIS)

    Kus, A.; Dinklage, A.; Preuss, R.; Ascasibar, E.; Harris, J. H.; Okamura, S.; Yamada, H.; Sano, F.; Stroth, U.; Talmadge, J.

    2008-01-01

    Heterogeneous structure of collected data is one of the problems that occur during derivation of scalings for energy confinement time, and whose analysis tourns out to be wide and complicated matter. The International Stellarator Confinement Database [1], shortly ISCDB, comprises in its latest version 21 a total of 3647 observations from 8 experimental devices, 2067 therefrom beeing so far completed for upcoming analyses. For confinement scaling studies 1933 observation were chosen as the standard dataset. Here we describe a statistical method of cluster analysis for identification of possible cohesive substructures in ISDCB and present some preliminary results

  7. Cluster and principal component analysis based on SSR markers of Amomum tsao-ko in Jinping County of Yunnan Province

    Science.gov (United States)

    Ma, Mengli; Lei, En; Meng, Hengling; Wang, Tiantao; Xie, Linyan; Shen, Dong; Xianwang, Zhou; Lu, Bingyue

    2017-08-01

    Amomum tsao-ko is a commercial plant that used for various purposes in medicinal and food industries. For the present investigation, 44 germplasm samples were collected from Jinping County of Yunnan Province. Clusters analysis and 2-dimensional principal component analysis (PCA) was used to represent the genetic relations among Amomum tsao-ko by using simple sequence repeat (SSR) markers. Clustering analysis clearly distinguished the samples groups. Two major clusters were formed; first (Cluster I) consisted of 34 individuals, the second (Cluster II) consisted of 10 individuals, Cluster I as the main group contained multiple sub-clusters. PCA also showed 2 groups: PCA Group 1 included 29 individuals, PCA Group 2 included 12 individuals, consistent with the results of cluster analysis. The purpose of the present investigation was to provide information on genetic relationship of Amomum tsao-ko germplasm resources in main producing areas, also provide a theoretical basis for the protection and utilization of Amomum tsao-ko resources.

  8. Characterizing Suicide in Toronto: An Observational Study and Cluster Analysis

    Science.gov (United States)

    Sinyor, Mark; Schaffer, Ayal; Streiner, David L

    2014-01-01

    Objective: To determine whether people who have died from suicide in a large epidemiologic sample form clusters based on demographic, clinical, and psychosocial factors. Method: We conducted a coroner’s chart review for 2886 people who died in Toronto, Ontario, from 1998 to 2010, and whose death was ruled as suicide by the Office of the Chief Coroner of Ontario. A cluster analysis using known suicide risk factors was performed to determine whether suicide deaths separate into distinct groups. Clusters were compared according to person- and suicide-specific factors. Results: Five clusters emerged. Cluster 1 had the highest proportion of females and nonviolent methods, and all had depression and a past suicide attempt. Cluster 2 had the highest proportion of people with a recent stressor and violent suicide methods, and all were married. Cluster 3 had mostly males between the ages of 20 and 64, and all had either experienced recent stressors, suffered from mental illness, or had a history of substance abuse. Cluster 4 had the youngest people and the highest proportion of deaths by jumping from height, few were married, and nearly one-half had bipolar disorder or schizophrenia. Cluster 5 had all unmarried people with no prior suicide attempts, and were the least likely to have an identified mental illness and most likely to leave a suicide note. Conclusions: People who die from suicide assort into different patterns of demographic, clinical, and death-specific characteristics. Identifying and studying subgroups of suicides may advance our understanding of the heterogeneous nature of suicide and help to inform development of more targeted suicide prevention strategies. PMID:24444321

  9. Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis.

    Science.gov (United States)

    Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan

    2017-09-27

    A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the harvested energy from signal ejected by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in near cluster do not have their own information to transmit, acting as relays, they can help the sensors in a far cluster to forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation process of the relay battery, and give a general analyzing model for WPSN with cluster cooperation. Though the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the start point of designing this paper and correctness of theoretical analysis and show how parameters have an effect on system performance. Moreover, it is also known that the outage probability of sensors in far cluster can be drastically reduced without sacrificing the performance of sensors in near cluster if the transmit power of HAP is fairly high. Furthermore, in the aspect of outage performance of far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.

  10. Melodic pattern discovery by structural analysis via wavelets and clustering techniques

    DEFF Research Database (Denmark)

    Velarde, Gissel; Meredith, David

    We present an automatic method to support melodic pattern discovery by structural analysis of symbolic representations by means of wavelet analysis and clustering techniques. In previous work, we used the method to recognize the parent works of melodic segments, or to classify tunes into tune......-means to cluster melodic segments into groups of measured similarity and obtain a raking of the most prototypical melodic segments or patterns and their occurrences. We test the method on the JKU Patterns Development Database and evaluate it based on the ground truth defined by the MIREX 2013 Discovery of Repeated...... Themes & Sections task. We compare the results of our method to the output of geometric approaches. Finally, we discuss about the relevance of our wavelet-based analysis in relation to structure, pattern discovery, similarity and variation, and comment about the considerations of the method when used...

  11. The Productivity Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, input and output slacks of 100 auto component industries in three estates. The methodology adopted is using Data Envelopment Analysis of Output Oriented Banker Charnes Cooper model by taking net worth, fixed assets, employment as inputs and gross output as outputs. The non-zero represents the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets, employment and shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease the fixed assets or employment. Moreover for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.

  12. WHY DO SOME NATIONS SUCCEED AND OTHERS FAIL IN INTERNATIONAL COMPETITION? FACTOR ANALYSIS AND CLUSTER ANALYSIS AT EUROPEAN LEVEL

    Directory of Open Access Journals (Sweden)

    Popa Ion

    2015-07-01

    Full Text Available As stated by Michael Porter (1998: 57, 'this is perhaps the most frequently asked economic question of our times.' However, a widely accepted answer is still missing. The aim of this paper is not to provide the BIG answer for such a BIG question, but rather to provide a different perspective on the competitiveness at the national level. In this respect, we followed a two step procedure, called “tandem analysis”. (OECD, 2008. First we employed a Factor Analysis in order to reveal the underlying factors of the initial dataset followed by a Cluster Analysis which aims classifying the 35 countries according to the main characteristics of competitiveness resulting from Factor Analysis. The findings revealed that clustering the 35 states after the first two factors: Smart Growth and Market Development, which recovers almost 76% of common variability of the twelve original variables, are highlighted four clusters as well as a series of useful information in order to analyze the characteristics of the four clusters and discussions on them.

  13. MMPI profiles of males accused of severe crimes: a cluster analysis

    NARCIS (Netherlands)

    Spaans, M.; Barendregt, M.; Muller, E.; Beurs, E. de; Nijman, H.L.I.; Rinne, T.

    2009-01-01

    In studies attempting to classify criminal offenders by cluster analysis of Minnesota Multiphasic Personality Inventory-2 (MMPI-2) data, the number of clusters found varied between 10 (the Megargee System) and two (one cluster indicating no psychopathology and one exhibiting serious

  14. Cardiovascular reactivity patterns and pathways to hypertension: a multivariate cluster analysis

    NARCIS (Netherlands)

    Brindle, R. C.; Ginty, A. T.; Jones, A.; Phillips, A. C.; Roseboom, T. J.; Carroll, D.; Painter, R. C.; de Rooij, S. R.

    2016-01-01

    Substantial evidence links exaggerated mental stress induced blood pressure reactivity to future hypertension, but the results for heart rate reactivity are less clear. For this reason multivariate cluster analysis was carried out to examine the relationship between heart rate and blood pressure

  15. Determining wood chip size: image analysis and clustering methods

    Directory of Open Access Journals (Sweden)

    Paolo Febbi

    2013-09-01

    Full Text Available One of the standard methods for the determination of the size distribution of wood chips is the oscillating screen method (EN 15149- 1:2010. Recent literature demonstrated how image analysis could return highly accurate measure of the dimensions defined for each individual particle, and could promote a new method depending on the geometrical shape to determine the chip size in a more accurate way. A sample of wood chips (8 litres was sieved through horizontally oscillating sieves, using five different screen hole diameters (3.15, 8, 16, 45, 63 mm; the wood chips were sorted in decreasing size classes and the mass of all fractions was used to determine the size distribution of the particles. Since the chip shape and size influence the sieving results, Wang’s theory, which concerns the geometric forms, was considered. A cluster analysis on the shape descriptors (Fourier descriptors and size descriptors (area, perimeter, Feret diameters, eccentricity was applied to observe the chips distribution. The UPGMA algorithm was applied on Euclidean distance. The obtained dendrogram shows a group separation according with the original three sieving fractions. A comparison has been made between the traditional sieve and clustering results. This preliminary result shows how the image analysis-based method has a high potential for the characterization of wood chip size distribution and could be further investigated. Moreover, this method could be implemented in an online detection machine for chips size characterization. An improvement of the results is expected by using supervised multivariate methods that utilize known class memberships. The main objective of the future activities will be to shift the analysis from a 2-dimensional method to a 3- dimensional acquisition process.

  16. A Cluster Analysis of Personality Style in Adults with ADHD

    Science.gov (United States)

    Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita

    2008-01-01

    Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…

  17. Analysis of candidates for interacting galaxy clusters. I. A1204 and A2029/A2033

    Science.gov (United States)

    Gonzalez, Elizabeth Johana; de los Rios, Martín; Oio, Gabriel A.; Lang, Daniel Hernández; Tagliaferro, Tania Aguirre; Domínguez R., Mariano J.; Castellón, José Luis Nilo; Cuevas L., Héctor; Valotto, Carlos A.

    2018-04-01

    Context. Merging galaxy clusters allow for the study of different mass components, dark and baryonic, separately. Also, their occurrence enables to test the ΛCDM scenario, which can be used to put constraints on the self-interacting cross-section of the dark-matter particle. Aim. It is necessary to perform a homogeneous analysis of these systems. Hence, based on a recently presented sample of candidates for interacting galaxy clusters, we present the analysis of two of these cataloged systems. Methods: In this work, the first of a series devoted to characterizing galaxy clusters in merger processes, we perform a weak lensing analysis of clusters A1204 and A2029/A2033 to derive the total masses of each identified interacting structure together with a dynamical study based on a two-body model. We also describe the gas and the mass distributions in the field through a lensing and an X-ray analysis. This is the first of a series of works which will analyze these type of system in order to characterize them. Results: Neither merging cluster candidate shows evidence of having had a recent merger event. Nevertheless, there is dynamical evidence that these systems could be interacting or could interact in the future. Conclusions: It is necessary to include more constraints in order to improve the methodology of classifying merging galaxy clusters. Characterization of these clusters is important in order to properly understand the nature of these systems and their connection with dynamical studies.

  18. An analysis of hospital brand mark clusters.

    Science.gov (United States)

    Vollmers, Stacy M; Miller, Darryl W; Kilic, Ozcan

    2010-07-01

    This study analyzed brand mark clusters (i.e., various types of brand marks displayed in combination) used by hospitals in the United States. The brand marks were assessed against several normative criteria for creating brand marks that are memorable and that elicit positive affect. Overall, results show a reasonably high level of adherence to many of these normative criteria. Many of the clusters exhibited pictorial elements that reflected benefits and that were conceptually consistent with the verbal content of the cluster. Also, many clusters featured icons that were balanced and moderately complex. However, only a few contained interactive imagery or taglines communicating benefits.

  19. A Novel Double Cluster and Principal Component Analysis-Based Optimization Method for the Orbit Design of Earth Observation Satellites

    Directory of Open Access Journals (Sweden)

    Yunfeng Dong

    2017-01-01

    Full Text Available The weighted sum and genetic algorithm-based hybrid method (WSGA-based HM, which has been applied to multiobjective orbit optimizations, is negatively influenced by human factors through the artificial choice of the weight coefficients in weighted sum method and the slow convergence of GA. To address these two problems, a cluster and principal component analysis-based optimization method (CPC-based OM is proposed, in which many candidate orbits are gradually randomly generated until the optimal orbit is obtained using a data mining method, that is, cluster analysis based on principal components. Then, the second cluster analysis of the orbital elements is introduced into CPC-based OM to improve the convergence, developing a novel double cluster and principal component analysis-based optimization method (DCPC-based OM. In DCPC-based OM, the cluster analysis based on principal components has the advantage of reducing the human influences, and the cluster analysis based on six orbital elements can reduce the search space to effectively accelerate convergence. The test results from a multiobjective numerical benchmark function and the orbit design results of an Earth observation satellite show that DCPC-based OM converges more efficiently than WSGA-based HM. And DCPC-based OM, to some degree, reduces the influence of human factors presented in WSGA-based HM.

  20. A SURVEY ON DOCUMENT CLUSTERING APPROACH FOR COMPUTER FORENSIC ANALYSIS

    OpenAIRE

    Monika Raghuvanshi*, Rahul Patel

    2016-01-01

    In a forensic analysis, large numbers of files are examined. Much of the information comprises of in unstructured format, so it’s quite difficult task for computer forensic to perform such analysis. That’s why to do the forensic analysis of document within a limited period of time require a special approach such as document clustering. This paper review different document clustering algorithms methodologies for example K-mean, K-medoid, single link, complete link, average link in accorandance...

  1. Application of clustering methods: Regularized Markov clustering (R-MCL) for analyzing dengue virus similarity

    Science.gov (United States)

    Lestari, D.; Raharjo, D.; Bustamam, A.; Abdillah, B.; Widhianto, W.

    2017-07-01

    Dengue virus consists of 10 different constituent proteins and are classified into 4 major serotypes (DEN 1 - DEN 4). This study was designed to perform clustering against 30 protein sequences of dengue virus taken from Virus Pathogen Database and Analysis Resource (VIPR) using Regularized Markov Clustering (R-MCL) algorithm and then we analyze the result. By using Python program 3.4, R-MCL algorithm produces 8 clusters with more than one centroid in several clusters. The number of centroid shows the density level of interaction. Protein interactions that are connected in a tissue, form a complex protein that serves as a specific biological process unit. The analysis of result shows the R-MCL clustering produces clusters of dengue virus family based on the similarity role of their constituent protein, regardless of serotypes.

  2. Identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard.

    Directory of Open Access Journals (Sweden)

    Xiao-Juan Jiang

    Full Text Available BACKGROUND: The vertebrate protocadherins are a subfamily of cell adhesion molecules that are predominantly expressed in the nervous system and are believed to play an important role in establishing the complex neural network during animal development. Genes encoding these molecules are organized into a cluster in the genome. Comparative analysis of the protocadherin subcluster organization and gene arrangements in different vertebrates has provided interesting insights into the history of vertebrate genome evolution. Among tetrapods, protocadherin clusters have been fully characterized only in mammals. In this study, we report the identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard (Anolis carolinensis. METHODOLOGY/PRINCIPAL FINDINGS: We show that the anole protocadherin cluster spans over a megabase and encodes a total of 71 genes. The number of genes in the anole protocadherin cluster is significantly higher than that in the coelacanth (49 genes and mammalian (54-59 genes clusters. The anole protocadherin genes are organized into four subclusters: the delta, alpha, beta and gamma. This subcluster organization is identical to that of the coelacanth protocadherin cluster, but differs from the mammalian clusters which lack the delta subcluster. The gene number expansion in the anole protocadherin cluster is largely due to the extensive gene duplication in the gammab subgroup. Similar to coelacanth and elephant shark protocadherin genes, the anole protocadherin genes have experienced a low frequency of gene conversion. CONCLUSIONS/SIGNIFICANCE: Our results suggest that similar to the protocadherin clusters in other vertebrates, the evolution of anole protocadherin cluster is driven mainly by lineage-specific gene duplications and degeneration. Our analysis also shows that loss of the protocadherin delta subcluster in the mammalian lineage occurred after the divergence of mammals and reptiles

  3. 3D Building Models Segmentation Based on K-Means++ Cluster Analysis

    Science.gov (United States)

    Zhang, C.; Mao, B.

    2016-10-01

    3D mesh model segmentation is drawing increasing attentions from digital geometry processing field in recent years. The original 3D mesh model need to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compressing, texture mapping, model retrieval and etc. Therefore, segmentation is a key problem for 3D mesh model segmentation. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance heavily depends on randomized initial seed points (i.e., centroid) and different randomized centroid can get quite different results. Therefore, we improved the existing method and used K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means, and achieve good and meaningful results.

  4. 3D BUILDING MODELS SEGMENTATION BASED ON K-MEANS++ CLUSTER ANALYSIS

    Directory of Open Access Journals (Sweden)

    C. Zhang

    2016-10-01

    Full Text Available 3D mesh model segmentation is drawing increasing attentions from digital geometry processing field in recent years. The original 3D mesh model need to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compressing, texture mapping, model retrieval and etc. Therefore, segmentation is a key problem for 3D mesh model segmentation. In this paper, we propose a method to segment Collada (a type of mesh model 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance heavily depends on randomized initial seed points (i.e., centroid and different randomized centroid can get quite different results. Therefore, we improved the existing method and used K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means, and achieve good and meaningful results.

  5. Approximate fuzzy C-means (AFCM) cluster analysis of medical magnetic resonance image (MRI) data

    International Nuclear Information System (INIS)

    DelaPaz, R.L.; Chang, P.J.; Bernstein, R.; Dave, J.V.

    1987-01-01

    The authors describe the application of an approximate fuzzy C-means (AFCM) clustering algorithm as a data dimension reduction approach to medical magnetic resonance images (MRI). Image data consisted of one T1-weighted, two T2-weighted, and one T2*-weighted (magnetic susceptibility) image for each cranial study and a matrix of 10 images generated from 10 combinations of TE and TR for each body lymphoma study. All images were obtained with a 1.5 Tesla imaging system (GE Signa). Analyses were performed on over 100 MR image sets with a variety of pathologies. The cluster analysis was operated in an unsupervised mode and computational overhead was minimized by utilizing a table look-up approach without adversely affecting accuracy. Image data were first segmented into 2 coarse clusters, each of which was then subdivided into 16 fine clusters. The final tissue classifications were presented as color-coded anatomically-mapped images and as two and three dimensional displays of cluster center data in selected feature space (minimum spanning tree). Fuzzy cluster analysis appears to be a clinically useful dimension reduction technique which results in improved diagnostic specificity of medical magnetic resonance images

  6. Planck 2013 results. XX. Cosmology from Sunyaev-Zeldovich cluster counts

    CERN Document Server

    Ade, P.A.R.; Armitage-Caplan, C.; Arnaud, M.; Ashdown, M.; Atrio-Barandela, F.; Aumont, J.; Baccigalupi, C.; Banday, A.J.; Barreiro, R.B.; Barrena, R.; Bartlett, J.G.; Battaner, E.; Battye, R.; Benabed, K.; Benoit, A.; Benoit-Levy, A.; Bernard, J.P.; Bersanelli, M.; Bielewicz, P.; Bikmaev, I.; Blanchard, A.; Bobin, J.; Bock, J.J.; Bohringer, H.; Bonaldi, A.; Bond, J.R.; Borrill, J.; Bouchet, F.R.; Bourdin, H.; Bridges, M.; Brown, M.L.; Bucher, M.; Burenin, R.; Burigana, C.; Butler, R.C.; Cardoso, J.F.; Carvalho, P.; Catalano, A.; Challinor, A.; Chamballu, A.; Chary, R.R.; Chiang, L.Y.; Chiang, H.C.; Chon, G.; Christensen, P.R.; Church, S.; Clements, D.L.; Colombi, S.; Colombo, L.P.L.; Couchot, F.; Coulais, A.; Crill, B.P.; Curto, A.; Cuttaia, F.; Da Silva, A.; Dahle, H.; Danese, L.; Davies, R.D.; Davis, R.J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Delouis, J.M.; Democles, J.; Desert, F.X.; Dickinson, C.; Diego, J.M.; Dolag, K.; Dole, H.; Donzelli, S.; Dore, O.; Douspis, M.; Dupac, X.; Efstathiou, G.; Ensslin, T.A.; Eriksen, H.K.; Finelli, F.; Flores-Cacho, I.; Forni, O.; Frailis, M.; Franceschi, E.; Fromenteau, S.; Galeotta, S.; Ganga, K.; Genova-Santos, R.T.; Giard, M.; Giardino, G.; Giraud-Heraud, Y.; Gonzalez-Nuevo, J.; Gorski, K.M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Hansen, F.K.; Hanson, D.; Harrison, D.; Henrot-Versille, S.; Hernandez-Monteagudo, C.; Herranz, D.; Hildebrandt, S.R.; Hivon, E.; Hobson, M.; Holmes, W.A.; Hornstrup, A.; Hovest, W.; Huffenberger, K.M.; Hurier, G.; Jaffe, T.R.; Jaffe, A.H.; Jones, W.C.; Juvela, M.; Keihanen, E.; Keskitalo, R.; Khamitov, I.; Kisner, T.S.; Kneissl, R.; Knoche, J.; Knox, L.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lahteenmaki, A.; Lamarre, J.M.; Lasenby, A.; Laureijs, R.J.; Lawrence, C.R.; Leahy, J.P.; Leonardi, R.; Leon-Tavares, J.; Lesgourgues, J.; Liddle, A.; Liguori, M.; Lilje, P.B.; Linden-Vornle, M.; Lopez-Caniego, M.; Lubin, P.M.; Macias-Perez, J.F.; Maffei, B.; Maino, D.; Mandolesi, N.; Marcos-Caballero, A.; Maris, M.; Marshall, D.J.; Martin, P.G.; Martinez-Gonzalez, E.; Masi, S.; Matarrese, S.; Matthai, F.; Mazzotta, P.; Meinhold, P.R.; Melchiorri, A.; Melin, J.B.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschenes, M.A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C.B.; Norgaard-Nielsen, H.U.; Noviello, F.; Novikov, D.; Novikov, I.; Osborne, S.; Oxborrow, C.A.; Paci, F.; Pagano, L.; Pajot, F.; Paoletti, D.; Partridge, B.; Pasian, F.; Patanchon, G.; Perdereau, O.; Perotto, L.; Perrotta, F.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Ponthieu, N.; Popa, L.; Poutanen, T.; Pratt, G.W.; Prezeau, G.; Prunet, S.; Puget, J.L.; Rachen, J.P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Ricciardi, S.; Riller, T.; Ristorcelli, I.; Rocha, G.; Roman, M.; Rosset, C.; Roudier, G.; Rowan-Robinson, M.; Rubino-Martin, J.A.; Rusholme, B.; Sandri, M.; Santos, D.; Savini, G.; Scott, D.; Seiffert, M.D.; Shellard, E.P.S.; Spencer, L.D.; Starck, J.L.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sureau, F.; Sutton, D.; Suur-Uski, A.S.; Sygnet, J.F.; Tauber, J.A.; Tavagnacco, D.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Tuovinen, J.; Turler, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Vittorio, N.; Wade, L.A.; Wandelt, B.D.; Weller, J.; White, M.; White, S.D.M.; Yvon, D.; Zacchei, A.; Zonca, A.

    2014-01-01

    We present constraints on cosmological parameters using number counts as a function of redshift for a sub-sample of 189 galaxy clusters from the Planck SZ (PSZ) catalogue. The PSZ is selected through the signature of the Sunyaev--Zeldovich (SZ) effect, and the sub-sample used here has a signal-to-noise threshold of seven, with each object confirmed as a cluster and all but one with a redshift estimate. We discuss the completeness of the sample and our construction of a likelihood analysis. Using a relation between mass $M$ and SZ signal $Y$ calibrated to X-ray measurements, we derive constraints on the power spectrum amplitude $\\sigma_8$ and matter density parameter $\\Omega_{\\mathrm{m}}$ in a flat $\\Lambda$CDM model. We test the robustness of our estimates and find that possible biases in the $Y$--$M$ relation and the halo mass function are larger than the statistical uncertainties from the cluster sample. Assuming the X-ray determined mass to be biased low relative to the true mass by between zero and 30%, m...

  7. The Quantitative Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai, also called as Detroit of India due to presence of Automotive Industry producing over 40 % of the India's vehicle and components. During 2001-2002, the Automotive Component Industries (ACI) in Ambattur, Thirumalizai and Thirumudivakkam Industrial Estate, Chennai has faced problems on infrastructure, technology, procurement, production and marketing. The objective is to study the Quantitative Performance of Chennai Automotive Industry Cluster before (2001-2002) and after the CDA (2008-2009). The methodology adopted is collection of primary data from 100 ACI using quantitative questionnaire and analyzing using Correlation Analysis (CA), Regression Analysis (RA), Friedman Test (FMT), and Kruskall Wallis Test (KWT).The CA computed for the different set of variables reveals that there is high degree of relationship between the variables studied. The RA models constructed establish the strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. KWT proves, there is no significant difference between three locations clusters with respect to: Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports that each location has contributed for development of automobile component cluster uniformly. The FMT proves, there is no significant difference between industrial units in respect of cost like Production, Infrastructure, Technology, Marketing and Net Profit. To conclude, the Automotive Industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting CDA and now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This Cluster Development Approach (CDA) model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  8. Planck intermediate results: VIII. Filaments between interacting clusters

    DEFF Research Database (Denmark)

    Bartlett, J.G.; Cardoso, J.-F.; Castex, G.

    2013-01-01

    of a fraction of these missing baryons between pairs of galaxy clusters. Methods. Cluster pairs are good candidates for searching for the hotter and denser phase of the intergalactic medium (which is more easily observed through the SZ effect). Using an X-ray catalogue of clusters and the Planck data, we...

  9. Clusters of Insomnia Disorder: An Exploratory Cluster Analysis of Objective Sleep Parameters Reveals Differences in Neurocognitive Functioning, Quantitative EEG, and Heart Rate Variability.

    Science.gov (United States)

    Miller, Christopher B; Bartlett, Delwyn J; Mullins, Anna E; Dodds, Kirsty L; Gordon, Christopher J; Kyle, Simon D; Kim, Jong Won; D'Rozario, Angela L; Lee, Rico S C; Comas, Maria; Marshall, Nathaniel S; Yee, Brendon J; Espie, Colin A; Grunstein, Ronald R

    2016-11-01

    To empirically derive and evaluate potential clusters of Insomnia Disorder through cluster analysis from polysomnography (PSG). We hypothesized that clusters would differ on neurocognitive performance, sleep-onset measures of quantitative ( q )-EEG and heart rate variability (HRV). Research volunteers with Insomnia Disorder (DSM-5) completed a neurocognitive assessment and overnight PSG measures of total sleep time (TST), wake time after sleep onset (WASO), and sleep onset latency (SOL) were used to determine clusters. From 96 volunteers with Insomnia Disorder, cluster analysis derived at least two clusters from objective sleep parameters: Insomnia with normal objective sleep duration (I-NSD: n = 53) and Insomnia with short sleep duration (I-SSD: n = 43). At sleep onset, differences in HRV between I-NSD and I-SSD clusters suggest attenuated parasympathetic activity in I-SSD (P insomnia clusters derived from cluster analysis differ in sleep onset HRV. Preliminary data suggest evidence for three clusters in insomnia with differences for sustained attention and sleep-onset q -EEG. Insomnia 100 sleep study: Australia New Zealand Clinical Trials Registry (ANZCTR) identification number 12612000049875. URL: https://www.anzctr.org.au/Trial/Registration/TrialReview.aspx?id=347742. © 2016 Associated Professional Sleep Societies, LLC.

  10. Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis

    Directory of Open Access Journals (Sweden)

    Chao Zhang

    2017-09-01

    Full Text Available A wireless-powered sensor network (WPSN consisting of one hybrid access point (HAP, a near cluster and the corresponding far cluster is investigated in this paper. These sensors are wireless-powered and they transmit information by consuming the harvested energy from signal ejected by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in near cluster do not have their own information to transmit, acting as relays, they can help the sensors in a far cluster to forward information to the HAP in an amplify-and-forward (AF manner. We use a finite Markov chain to model the dynamic variation process of the relay battery, and give a general analyzing model for WPSN with cluster cooperation. Though the model, we deduce the closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the start point of designing this paper and correctness of theoretical analysis and show how parameters have an effect on system performance. Moreover, it is also known that the outage probability of sensors in far cluster can be drastically reduced without sacrificing the performance of sensors in near cluster if the transmit power of HAP is fairly high. Furthermore, in the aspect of outage performance of far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.

  11. Assessment of genetic divergence in tomato through agglomerative hierarchical clustering and principal component analysis

    International Nuclear Information System (INIS)

    Iqbal, Q.; Saleem, M.Y.; Hameed, A.; Asghar, M.

    2014-01-01

    For the improvement of qualitative and quantitative traits, existence of variability has prime importance in plant breeding. Data on different morphological and reproductive traits of 47 tomato genotypes were analyzed for correlation,agglomerative hierarchical clustering and principal component analysis (PCA) to select genotypes and traits for future breeding program. Correlation analysis revealed significant positive association between yield and yield components like fruit diameter, single fruit weight and number of fruits plant-1. Principal component (PC) analysis depicted first three PCs with Eigen-value higher than 1 contributing 81.72% of total variability for different traits. The PC-I showed positive factor loadings for all the traits except number of fruits plant-1. The contribution of single fruit weight and fruit diameter was highest in PC-1. Cluster analysis grouped all genotypes into five divergent clusters. The genotypes in cluster-II and cluster-V exhibited uniform maturity and higher yield. The D2 statistics confirmed highest distance between cluster- III and cluster-V while maximum similarity was observed in cluster-II and cluster-III. It is therefore suggested that crosses between genotypes of cluster-II and cluster-V with those of cluster-I and cluster-III may exhibit heterosis in F1 for hybrid breeding and for selection of superior genotypes in succeeding generations for cross breeding programme. (author)

  12. Data clustering theory, algorithms, and applications

    CERN Document Server

    Gan, Guojun; Wu, Jianhong

    2007-01-01

    Cluster analysis is an unsupervised process that divides a set of objects into homogeneous groups. This book starts with basic information on cluster analysis, including the classification of data and the corresponding similarity measures, followed by the presentation of over 50 clustering algorithms in groups according to some specific baseline methodologies such as hierarchical, center-based, and search-based methods. As a result, readers and users can easily identify an appropriate algorithm for their applications and compare novel ideas with existing results. The book also provides examples of clustering applications to illustrate the advantages and shortcomings of different clustering architectures and algorithms. Application areas include pattern recognition, artificial intelligence, information technology, image processing, biology, psychology, and marketing. Readers also learn how to perform cluster analysis with the C/C++ and MATLAB® programming languages.

  13. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering.

    Science.gov (United States)

    Wu, Lingxiang; Chen, Xiujie; Zhang, Denan; Zhang, Wubing; Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness in the traditional methods is that they neglect the heterogeneity of genes expressions in samples which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severities of disease and provide expression profiles of gene sets for individuals. We developed an application software called IGSA that leverages a powerful analytical capacity in gene sets enrichment and samples clustering. IGSA calculates gene sets expression scores for each sample and takes an accumulating clustering strategy to let the samples gather into the set according to the progress of disease from mild to severe. We focus on gastric, pancreatic and ovarian cancer data sets for the performance of IGSA. We also compared the results of IGSA in KEGG pathways enrichment with David, GSEA, SPIA, ssGSEA and analyzed the results of IGSA clustering and different similarity measurement methods. Notably, IGSA is proved to be more sensitive and specific in finding significant pathways, and can indicate related changes in pathways with the severity of disease. In addition, IGSA provides with significant gene sets profile for each sample.

  14. A comparison of three clustering methods for finding subgroups in MRI, SMS or clinical data: SPSS TwoStep Cluster analysis, Latent Gold and SNOB.

    Science.gov (United States)

    Kent, Peter; Jensen, Rikke K; Kongsted, Alice

    2014-10-02

    There are various methodological approaches to identifying clinically important subgroups and one method is to identify clusters of characteristics that differentiate people in cross-sectional and/or longitudinal data using Cluster Analysis (CA) or Latent Class Analysis (LCA). There is a scarcity of head-to-head comparisons that can inform the choice of which clustering method might be suitable for particular clinical datasets and research questions. Therefore, the aim of this study was to perform a head-to-head comparison of three commonly available methods (SPSS TwoStep CA, Latent Gold LCA and SNOB LCA). The performance of these three methods was compared: (i) quantitatively using the number of subgroups detected, the classification probability of individuals into subgroups, the reproducibility of results, and (ii) qualitatively using subjective judgments about each program's ease of use and interpretability of the presentation of results.We analysed five real datasets of varying complexity in a secondary analysis of data from other research projects. Three datasets contained only MRI findings (n = 2,060 to 20,810 vertebral disc levels), one dataset contained only pain intensity data collected for 52 weeks by text (SMS) messaging (n = 1,121 people), and the last dataset contained a range of clinical variables measured in low back pain patients (n = 543 people). Four artificial datasets (n = 1,000 each) containing subgroups of varying complexity were also analysed testing the ability of these clustering methods to detect subgroups and correctly classify individuals when subgroup membership was known. The results from the real clinical datasets indicated that the number of subgroups detected varied, the certainty of classifying individuals into those subgroups varied, the findings had perfect reproducibility, some programs were easier to use and the interpretability of the presentation of their findings also varied. The results from the artificial datasets

  15. Comparative analysis on the selection of number of clusters in community detection

    Science.gov (United States)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.

  16. Cluster analysis of obesity and asthma phenotypes.

    Directory of Open Access Journals (Sweden)

    E Rand Sutherland

    Full Text Available Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC. Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We utilized an hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype.In a cohort of clinical trial participants (n = 250, minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα and induction of MAP kinase phosphatase-1 (MKP-1 expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m(2 and severity of asthma symptoms (AEQ score the most significant determinants of cluster membership (F = 57.1, p<0.0001 and F = 44.8, p<0.0001, respectively. Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ and control (ACQ, exhaled nitric oxide concentration (F(ENO and airway hyperresponsiveness (methacholine PC(20 but were similar with regard to measures of lung function (FEV(1 (% and FEV(1/FVC, airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP. Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasoneObesity is an important determinant of asthma phenotype in adults. There is heterogeneity in expression of clinical and inflammatory biomarkers of asthma across obese individuals

  17. Using the Cluster Analysis and the Principal Component Analysis in Evaluating the Quality of a Destination

    Directory of Open Access Journals (Sweden)

    Ida Vajčnerová

    2016-01-01

    Full Text Available The objective of the paper is to explore possibilities of evaluating the quality of a tourist destination by means of the principal components analysis (PCA and the cluster analysis. In the paper both types of analysis are compared on the basis of the results they provide. The aim is to identify advantage and limits of both methods and provide methodological suggestion for their further use in the tourism research. The analyses is based on the primary data from the customers’ satisfaction survey with the key quality factors of a destination. As output of the two statistical methods is creation of groups or cluster of quality factors that are similar in terms of respondents’ evaluations, in order to facilitate the evaluation of the quality of tourist destinations. Results shows the possibility to use both tested methods. The paper is elaborated in the frame of wider research project aimed to develop a methodology for the quality evaluation of tourist destinations, especially in the context of customer satisfaction and loyalty.

  18. Constructing storyboards based on hierarchical clustering analysis

    Science.gov (United States)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There are growing needs for quick preview of video contents for the purpose of improving accessibility of video archives as well as reducing network traffics. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoid a repetition of computationally-intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.

  19. Partitional clustering algorithms

    CERN Document Server

    2015-01-01

    This book summarizes the state-of-the-art in partitional clustering. Clustering, the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis. Primary goals of clustering include gaining insight into, classifying, and compressing data. Clustering has a long and rich history that spans a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science. As a result, numerous clustering algorithms have been proposed since the early 1950s. Among these algorithms, partitional (nonhierarchical) ones have found many applications, especially in engineering and computer science. This book provides coverage of consensus clustering, constrained clustering, large scale and/or high dimensional clustering, cluster validity, cluster visualization, and applications of clustering. Examines clustering as it applies to large and/or high-dimensional data sets commonly encountered in reali...

  20. Application of clustering analysis in the prediction of photovoltaic power generation based on neural network

    Science.gov (United States)

    Cheng, K.; Guo, L. M.; Wang, Y. K.; Zafar, M. T.

    2017-11-01

    In order to select effective samples in the large number of data of PV power generation years and improve the accuracy of PV power generation forecasting model, this paper studies the application of clustering analysis in this field and establishes forecasting model based on neural network. Based on three different types of weather on sunny, cloudy and rainy days, this research screens samples of historical data by the clustering analysis method. After screening, it establishes BP neural network prediction models using screened data as training data. Then, compare the six types of photovoltaic power generation prediction models before and after the data screening. Results show that the prediction model combining with clustering analysis and BP neural networks is an effective method to improve the precision of photovoltaic power generation.

  1. A cluster analysis on road traffic accidents using genetic algorithms

    Science.gov (United States)

    Saharan, Sabariah; Baragona, Roberto

    2017-04-01

    The analysis of traffic road accidents is increasingly important because of the accidents cost and public road safety. The availability or large data sets makes the study of factors that affect the frequency and severity accidents are viable. However, the data are often highly unbalanced and overlapped. We deal with the data set of the road traffic accidents recorded in Christchurch, New Zealand, from 2000-2009 with a total of 26440 accidents. The data is in a binary set and there are 50 factors road traffic accidents with four level of severity. We used genetic algorithm for the analysis because we are in the presence of a large unbalanced data set and standard clustering like k-means algorithm may not be suitable for the task. The genetic algorithm based on clustering for unknown K, (GCUK) has been used to identify the factors associated with accidents of different levels of severity. The results provided us with an interesting insight into the relationship between factors and accidents severity level and suggest that the two main factors that contributes to fatal accidents are "Speed greater than 60 km h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and the independent component analysis is performed to validate the results.

  2. Identifying novel phenotypes of acute heart failure using cluster analysis of clinical variables.

    Science.gov (United States)

    Horiuchi, Yu; Tanimoto, Shuzou; Latif, A H M Mahbub; Urayama, Kevin Y; Aoki, Jiro; Yahagi, Kazuyuki; Okuno, Taishi; Sato, Yu; Tanaka, Tetsu; Koseki, Keita; Komiyama, Kota; Nakajima, Hiroyoshi; Hara, Kazuhiro; Tanabe, Kengo

    2018-07-01

    Acute heart failure (AHF) is a heterogeneous disease caused by various cardiovascular (CV) pathophysiology and multiple non-CV comorbidities. We aimed to identify clinically important subgroups to improve our understanding of the pathophysiology of AHF and inform clinical decision-making. We evaluated detailed clinical data of 345 consecutive AHF patients using non-hierarchical cluster analysis of 77 variables, including age, sex, HF etiology, comorbidities, physical findings, laboratory data, electrocardiogram, echocardiogram and treatment during hospitalization. Cox proportional hazards regression analysis was performed to estimate the association between the clusters and clinical outcomes. Three clusters were identified. Cluster 1 (n=108) represented "vascular failure". This cluster had the highest average systolic blood pressure at admission and lung congestion with type 2 respiratory failure. Cluster 2 (n=89) represented "cardiac and renal failure". They had the lowest ejection fraction (EF) and worst renal function. Cluster 3 (n=148) comprised mostly older patients and had the highest prevalence of atrial fibrillation and preserved EF. Death or HF hospitalization within 12-month occurred in 23% of Cluster 1, 36% of Cluster 2 and 36% of Cluster 3 (p=0.034). Compared with Cluster 1, risk of death or HF hospitalization was 1.74 (95% CI, 1.03-2.95, p=0.037) for Cluster 2 and 1.82 (95% CI, 1.13-2.93, p=0.014) for Cluster 3. Cluster analysis may be effective in producing clinically relevant categories of AHF, and may suggest underlying pathophysiology and potential utility in predicting clinical outcomes. Copyright © 2018 Elsevier B.V. All rights reserved.

  3. Identification and validation of asthma phenotypes in Chinese population using cluster analysis.

    Science.gov (United States)

    Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang

    2017-10-01

    Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  4. Molecular-dynamics analysis of mobile helium cluster reactions near surfaces of plasma-exposed tungsten

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Lin; Maroudas, Dimitrios, E-mail: maroudas@ecs.umass.edu [Department of Chemical Engineering, University of Massachusetts, Amherst, Massachusetts 01003-9303 (United States); Hammond, Karl D. [Department of Chemical Engineering, University of Missouri, Columbia, Missouri 65211 (United States); Wirth, Brian D. [Department of Nuclear Engineering, University of Tennessee, Knoxville, Tennessee 37996 (United States)

    2015-10-28

    We report the results of a systematic atomic-scale analysis of the reactions of small mobile helium clusters (He{sub n}, 4 ≤ n ≤ 7) near low-Miller-index tungsten (W) surfaces, aiming at a fundamental understanding of the near-surface dynamics of helium-carrying species in plasma-exposed tungsten. These small mobile helium clusters are attracted to the surface and migrate to the surface by Fickian diffusion and drift due to the thermodynamic driving force for surface segregation. As the clusters migrate toward the surface, trap mutation (TM) and cluster dissociation reactions are activated at rates higher than in the bulk. TM produces W adatoms and immobile complexes of helium clusters surrounding W vacancies located within the lattice planes at a short distance from the surface. These reactions are identified and characterized in detail based on the analysis of a large number of molecular-dynamics trajectories for each such mobile cluster near W(100), W(110), and W(111) surfaces. TM is found to be the dominant cluster reaction for all cluster and surface combinations, except for the He{sub 4} and He{sub 5} clusters near W(100) where cluster partial dissociation following TM dominates. We find that there exists a critical cluster size, n = 4 near W(100) and W(111) and n = 5 near W(110), beyond which the formation of multiple W adatoms and vacancies in the TM reactions is observed. The identified cluster reactions are responsible for important structural, morphological, and compositional features in the plasma-exposed tungsten, including surface adatom populations, near-surface immobile helium-vacancy complexes, and retained helium content, which are expected to influence the amount of hydrogen re-cycling and tritium retention in fusion tokamaks.

  5. Cluster analysis of radionuclide concentrations in beach sand

    NARCIS (Netherlands)

    de Meijer, R.J.; James, I.; Jennings, P.J.; Keoyers, J.E.

    This paper presents a method in which natural radionuclide concentrations of beach sand minerals are traced along a stretch of coast by cluster analysis. This analysis yields two groups of mineral deposit with different origins. The method deviates from standard methods of following dispersal of

  6. Cluster analysis of polymers using laser-induced breakdown spectroscopy with K-means

    Science.gov (United States)

    Yangmin, GUO; Yun, TANG; Yu, DU; Shisong, TANG; Lianbo, GUO; Xiangyou, LI; Yongfeng, LU; Xiaoyan, ZENG

    2018-06-01

    Laser-induced breakdown spectroscopy (LIBS) combined with K-means algorithm was employed to automatically differentiate industrial polymers under atmospheric conditions. The unsupervised learning algorithm K-means were utilized for the clustering of LIBS dataset measured from twenty kinds of industrial polymers. To prevent the interference from metallic elements, three atomic emission lines (C I 247.86 nm , H I 656.3 nm, and O I 777.3 nm) and one molecular line C–N (0, 0) 388.3 nm were used. The cluster analysis results were obtained through an iterative process. The Davies–Bouldin index was employed to determine the initial number of clusters. The average relative standard deviation values of characteristic spectral lines were used as the iterative criterion. With the proposed approach, the classification accuracy for twenty kinds of industrial polymers achieved 99.6%. The results demonstrated that this approach has great potential for industrial polymers recycling by LIBS.

  7. Genome-scale cluster analysis of replicated microarrays using shrinkage correlation coefficient.

    Science.gov (United States)

    Yao, Jianchao; Chang, Chunqi; Salmi, Mari L; Hung, Yeung Sam; Loraine, Ann; Roux, Stanley J

    2008-06-18

    Currently, clustering with some form of correlation coefficient as the gene similarity metric has become a popular method for profiling genomic data. The Pearson correlation coefficient and the standard deviation (SD)-weighted correlation coefficient are the two most widely-used correlations as the similarity metrics in clustering microarray data. However, these two correlations are not optimal for analyzing replicated microarray data generated by most laboratories. An effective correlation coefficient is needed to provide statistically sufficient analysis of replicated microarray data. In this study, we describe a novel correlation coefficient, shrinkage correlation coefficient (SCC), that fully exploits the similarity between the replicated microarray experimental samples. The methodology considers both the number of replicates and the variance within each experimental group in clustering expression data, and provides a robust statistical estimation of the error of replicated microarray data. The value of SCC is revealed by its comparison with two other correlation coefficients that are currently the most widely-used (Pearson correlation coefficient and SD-weighted correlation coefficient) using statistical measures on both synthetic expression data as well as real gene expression data from Saccharomyces cerevisiae. Two leading clustering methods, hierarchical and k-means clustering were applied for the comparison. The comparison indicated that using SCC achieves better clustering performance. Applying SCC-based hierarchical clustering to the replicated microarray data obtained from germinating spores of the fern Ceratopteris richardii, we discovered two clusters of genes with shared expression patterns during spore germination. Functional analysis suggested that some of the genetic mechanisms that control germination in such diverse plant lineages as mosses and angiosperms are also conserved among ferns. This study shows that SCC is an alternative to the Pearson

  8. Interactive visual exploration and refinement of cluster assignments.

    Science.gov (United States)

    Kern, Michael; Lex, Alexander; Gehlenborg, Nils; Johnson, Chris R

    2017-09-12

    With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data. In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes. Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.

  9. Vector Nonlinear Time-Series Analysis of Gamma-Ray Burst Datasets on Heterogeneous Clusters

    Directory of Open Access Journals (Sweden)

    Ioana Banicescu

    2005-01-01

    Full Text Available The simultaneous analysis of a number of related datasets using a single statistical model is an important problem in statistical computing. A parameterized statistical model is to be fitted on multiple datasets and tested for goodness of fit within a fixed analytical framework. Definitive conclusions are hopefully achieved by analyzing the datasets together. This paper proposes a strategy for the efficient execution of this type of analysis on heterogeneous clusters. Based on partitioning processors into groups for efficient communications and a dynamic loop scheduling approach for load balancing, the strategy addresses the variability of the computational loads of the datasets, as well as the unpredictable irregularities of the cluster environment. Results from preliminary tests of using this strategy to fit gamma-ray burst time profiles with vector functional coefficient autoregressive models on 64 processors of a general purpose Linux cluster demonstrate the effectiveness of the strategy.

  10. Planck early results. X. Statistical analysis of Sunyaev-Zeldovich scaling relations for X-ray galaxy clusters

    DEFF Research Database (Denmark)

    Poutanen, T.; Natoli, P.; Polenta, G.

    2011-01-01

    All-sky data from the Planck survey and the Meta-Catalogue of X-ray detected Clusters of galaxies (MCXC) are combined to investigate the relationship between the thermal Sunyaev-Zeldovich (SZ) signal and X-ray luminosity. The sample comprises ~1600 X-ray clusters with redshifts up to ~1 and spans...

  11. Phenotype Clustering of Breast Epithelial Cells in Confocal Imagesbased on Nuclear Protein Distribution Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Long, Fuhui; Peng, Hanchuan; Sudar, Damir; Levievre, Sophie A.; Knowles, David W.

    2006-09-05

    Background: The distribution of the chromatin-associatedproteins plays a key role in directing nuclear function. Previously, wedeveloped an image-based method to quantify the nuclear distributions ofproteins and showed that these distributions depended on the phenotype ofhuman mammary epithelial cells. Here we describe a method that creates ahierarchical tree of the given cell phenotypes and calculates thestatistical significance between them, based on the clustering analysisof nuclear protein distributions. Results: Nuclear distributions ofnuclear mitotic apparatus protein were previously obtained fornon-neoplastic S1 and malignant T4-2 human mammary epithelial cellscultured for up to 12 days. Cell phenotype was defined as S1 or T4-2 andthe number of days in cultured. A probabilistic ensemble approach wasused to define a set of consensus clusters from the results of multipletraditional cluster analysis techniques applied to the nucleardistribution data. Cluster histograms were constructed to show how cellsin any one phenotype were distributed across the consensus clusters.Grouping various phenotypes allowed us to build phenotype trees andcalculate the statistical difference between each group. The resultsshowed that non-neoplastic S1 cells could be distinguished from malignantT4-2 cells with 94.19 percent accuracy; that proliferating S1 cells couldbe distinguished from differentiated S1 cells with 92.86 percentaccuracy; and showed no significant difference between the variousphenotypes of T4-2 cells corresponding to increasing tumor sizes.Conclusion: This work presents a cluster analysis method that canidentify significant cell phenotypes, based on the nuclear distributionof specific proteins, with high accuracy.

  12. Feasibility Study of Parallel Finite Element Analysis on Cluster-of-Clusters

    Science.gov (United States)

    Muraoka, Masae; Okuda, Hiroshi

    With the rapid growth of WAN infrastructure and development of Grid middleware, it's become a realistic and attractive methodology to connect cluster machines on wide-area network for the execution of computation-demanding applications. Many existing parallel finite element (FE) applications have been, however, designed and developed with a single computing resource in mind, since such applications require frequent synchronization and communication among processes. There have been few FE applications that can exploit the distributed environment so far. In this study, we explore the feasibility of FE applications on the cluster-of-clusters. First, we classify FE applications into two types, tightly coupled applications (TCA) and loosely coupled applications (LCA) based on their communication pattern. A prototype of each application is implemented on the cluster-of-clusters. We perform numerical experiments executing TCA and LCA on both the cluster-of-clusters and a single cluster. Thorough these experiments, by comparing the performances and communication cost in each case, we evaluate the feasibility of FEA on the cluster-of-clusters.

  13. Time series clustering analysis of health-promoting behavior

    Science.gov (United States)

    Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

    2013-10-01

    Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January, 2012 to March, 2013, behavior patterns were identified by fuzzy c-mean time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The data analysis result can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology and has been provided to Taiwan National Health Insurance Administration to assess the elder health-promoting behavior.

  14. Clustering of near clusters versus cluster compactness

    International Nuclear Information System (INIS)

    Yu Gao; Yipeng Jing

    1989-01-01

    The clustering properties of near Zwicky clusters are studied by using the two-point angular correlation function. The angular correlation functions for compact and medium compact clusters, for open clusters, and for all near Zwicky clusters are estimated. The results show much stronger clustering for compact and medium compact clusters than for open clusters, and that open clusters have nearly the same clustering strength as galaxies. A detailed study of the compactness-dependence of correlation function strength is worth investigating. (author)

  15. Steady state subchannel analysis of AHWR fuel cluster

    International Nuclear Information System (INIS)

    Dasgupta, A.; Chandraker, D.K.; Vijayan, P.K.; Saha, D.

    2006-09-01

    Subchannel analysis is a technique used to predict the thermal hydraulic behavior of reactor fuel assemblies. The rod cluster is subdivided into a number of parallel interacting flow subchannels. The conservation equations are solved for each of these subchannels, taking into account subchannel interactions. Subchannel analysis of AHWR D-5 fuel cluster has been carried out to determine the variations in thermal hydraulic conditions of coolant and fuel temperatures along the length of the fuel bundle. The hottest regions within the AHWR fuel bundle have been identified. The effect of creep on the fuel performance has also been studied. MCHFR has been calculated using Jansen-Levy correlation. The calculations have been backed by sensitivity analysis for parameters whose values are not known accurately. The sensitivity analysis showed the calculations to have a very low sensitivity to these parameters. Apart from the analysis, the report also includes a brief introduction of a few subchannel codes. A brief description of the equations and solution methodology used in COBRA-IIIC and COBRA-IV-I is also given. (author)

  16. Accident patterns for construction-related workers: a cluster analysis

    Science.gov (United States)

    Liao, Chia-Wen; Tyan, Yaw-Yauan

    2012-01-01

    The construction industry has been identified as one of the most hazardous industries. The risk of constructionrelated workers is far greater than that in a manufacturing based industry. However, some steps can be taken to reduce worker risk through effective injury prevention strategies. In this article, k-means clustering methodology is employed in specifying the factors related to different worker types and in identifying the patterns of industrial occupational accidents. Accident reports during the period 1998 to 2008 are extracted from case reports of the Northern Region Inspection Office of the Council of Labor Affairs of Taiwan. The results show that the cluster analysis can indicate some patterns of occupational injuries in the construction industry. Inspection plans should be proposed according to the type of construction-related workers. The findings provide a direction for more effective inspection strategies and injury prevention programs.

  17. Cluster analysis for portfolio optimization

    OpenAIRE

    Vincenzo Tola; Fabrizio Lillo; Mauro Gallegati; Rosario N. Mantegna

    2005-01-01

    We consider the problem of the statistical uncertainty of the correlation matrix in the optimization of a financial portfolio. We show that the use of clustering algorithms can improve the reliability of the portfolio in terms of the ratio between predicted and realized risk. Bootstrap analysis indicates that this improvement is obtained in a wide range of the parameters N (number of assets) and T (investment horizon). The predicted and realized risk level and the relative portfolio compositi...

  18. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    Science.gov (United States)

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

    To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. Ninety-five eyes of 95 OAG patients who received medical treatment, and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a greater number of antiglaucoma eye drops significantly compared with cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was increased versus the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  19. Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms.

    Science.gov (United States)

    Esplin, M Sean; Manuck, Tracy A; Varner, Michael W; Christensen, Bryce; Biggio, Joseph; Bukowski, Radek; Parry, Samuel; Zhang, Heping; Huang, Hao; Andrews, William; Saade, George; Sadovsky, Yoel; Reddy, Uma M; Ilekis, John

    2015-09-01

    We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators' ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB. We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at SPTB ≤34.0 weeks' gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software. One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ(2) analysis (PD and DH, P cluster 3 of SPTB. We identified 5 major clusters of SPTB based on a phenotype tool and hierarch clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. A critical cluster analysis of 44 indicators of author-level performance

    DEFF Research Database (Denmark)

    Wildgaard, Lorna Elizabeth

    2016-01-01

    -four indicators of individual researcher performance were computed using the data. The clustering solution was supported by continued reference to the researcher’s curriculum vitae, an effect analysis and a risk analysis. Disciplinary appropriate indicators were identified and used to divide the researchers......This paper explores a 7-stage cluster methodology as a process to identify appropriate indicators for evaluation of individual researchers at a disciplinary and seniority level. Publication and citation data for 741 researchers from 4 disciplines was collected in Web of Science. Forty...... of statistics in research evaluation. The strength of the 7-stage cluster methodology is that it makes clear that in the evaluation of individual researchers, statistics cannot stand alone. The methodology is reliant on contextual information to verify the bibliometric values and cluster solution...

  1. Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.

    Science.gov (United States)

    Conley, Samantha

    2017-12-01

    The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that includes biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data and allows for the determination of empirically determined symptom clusters. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized because it has been used in limited populations, and researchers have explored limited biologic pathways.

  2. The composite sequential clustering technique for analysis of multispectral scanner data

    Science.gov (United States)

    Su, M. Y.

    1972-01-01

    The clustering technique consists of two parts: (1) a sequential statistical clustering which is essentially a sequential variance analysis, and (2) a generalized K-means clustering. In this composite clustering technique, the output of (1) is a set of initial clusters which are input to (2) for further improvement by an iterative scheme. This unsupervised composite technique was employed for automatic classification of two sets of remote multispectral earth resource observations. The classification accuracy by the unsupervised technique is found to be comparable to that by traditional supervised maximum likelihood classification techniques. The mathematical algorithms for the composite sequential clustering program and a detailed computer program description with job setup are given.

  3. Cluster-based analysis of multi-model climate ensembles

    Science.gov (United States)

    Hyde, Richard; Hossaini, Ryan; Leeson, Amber A.

    2018-06-01

    Clustering - the automated grouping of similar data - can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of the application of advanced clustering techniques to climate data (both model output and observations) has yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model-observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry-climate model (CCM) output of tropospheric ozone - an important greenhouse gas - from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ˜ 20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ˜ 62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere - where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations. We further demonstrate that clustering can provide a viable and

  4. Supercomputer and cluster performance modeling and analysis efforts:2004-2006.

    Energy Technology Data Exchange (ETDEWEB)

    Sturtevant, Judith E.; Ganti, Anand; Meyer, Harold (Hal) Edward; Stevenson, Joel O.; Benner, Robert E., Jr. (.,; .); Goudy, Susan Phelps; Doerfler, Douglas W.; Domino, Stefan Paul; Taylor, Mark A.; Malins, Robert Joseph; Scott, Ryan T.; Barnette, Daniel Wayne; Rajan, Mahesh; Ang, James Alfred; Black, Amalia Rebecca; Laub, Thomas William; Vaughan, Courtenay Thomas; Franke, Brian Claude

    2007-02-01

    This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as modeling efforts, obtained during the time period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.

  5. Support Policies in Clusters: Prioritization of Support Needs by Cluster Members According to Cluster Life Cycle

    Directory of Open Access Journals (Sweden)

    Gulcin Salıngan

    2012-07-01

    Full Text Available Economic development has always been a moving target. Both the national and local governments have been facing the challenge of implementing the effective and efficient economic policy and program in order to best utilize their limited resources. One of the recent approaches in this area is called cluster-based economic analysis and strategy development. This study reviews key literature and some of the cluster based economic policies adopted by different governments. Based on this review, it proposes “the cluster life cycle” as a determining factor to identify the support requirements of clusters. A survey, designed based on literature review of International Cluster support programs, was conducted with 30 participants from 3 clusters with different maturity stage. This paper discusses the results of this study conducted among the cluster members in Eskişehir- Bilecik-Kütahya Region in Turkey on the requirement of the support to foster the development of related clusters.

  6. Descriptive analysis of factors that influence economical results in the furniture cluster of Bento Gonçalves

    Directory of Open Access Journals (Sweden)

    Miguel Afonso Sellitto

    2014-12-01

    Full Text Available The purpose of this article is to analyze factors that can influence the competitiveness of companies in the furniture cluster of Bento Gonçalves, Rio Grande do Sul. By a literature review, we identify four factors that can influence competition in clusters: the region's productivity, innovation, relationship with suppliers, and cooperation between companies. The research method is the single case study. The research techniques are the review of specific bibliographic and documentation of the studied cluster, and interviews with experts of the cluster. The main findings are: the cluster has high productivity, mainly by hi-tech machinery employed by the main companies; innovation is permanent and motivated by the imposition to medium and short companies of business goals by the main companies; the relationship with suppliers is problematic regarding the large-scale vendors by the lack of the practice of collective purchases in the area; and cooperation between enterprises is small, by the culture of the region that don´t appreciate depending on resources available outside the companies. Such factors can contribute to produce hypotheses for further research.

  7. A critical evaluation of the use of cluster analysis to identify contaminated sediments in the Ria de Vigo

    Energy Technology Data Exchange (ETDEWEB)

    Rubio, B; Nombela, M. A; Vilas, F [Departamento de Geociencias Marinas y Ordenacion del Territorio, Vigo, Espana (Spain)

    2001-06-01

    The indiscriminate use of cluster analysis to distinguish contaminated and non-contaminated sediments has led us to make a comparative evaluation of different cluster analysis procedures as applied to heavy metal concentrations in subtidal sediments from the Ria de Vigo, NW Spain. The use of different clusters algorithms and other transformations from the same departing set of data lead to the formation of different clusters with a clear inconclusive result about the contamination status of the sediments. The results show that this approach is better suited to identifying groups of samples differing in sedimentological characteristics, such as grain size, rather than in the degree of contamination. Our main aim is to call attention to these aspects in cluster analysis and to suggest that researches should be rigorous with this kind of analysis. Finally, the use of discriminate analysis allows us to find a discriminate function that separates the samples into two clearly differentiated groups, which should not be treated jointly. [Spanish] El uso indiscriminado del analisis cluster para distinguir sedimentos contaminados y no contaminados nos ha llevado a realizar una evaluacion comparativa entre los diferentes procedimientos de estos analisis aplicada a la concentracion de metales pesados en sedimentos submareales de la Ria de Vigo, NW de Espana. La utilizacion de distintos algoritmos de cluster, asi como otras transformaciones de la misma matriz de datos conduce a la formacion de diferentes clusters con un resultado inconcluso sobre el estado de contaminacion de los sedimentos. Los resultados muestran que esta aproximacion se ajusta mejor para identificar grupos de muestras que difieren en caracteristicas sedimentologicas, tal como el tamano de grano, mas que el grado de contaminacion. El principal objetivo es llamar la atencion sobre estos aspectos del analisis cluster y sugerir a los investigadores que sean rigurosos con este tipo de analisis. Finalmente el uso

  8. Cluster analysis for the probability of DSB site induced by electron tracks

    Energy Technology Data Exchange (ETDEWEB)

    Yoshii, Y. [Biological Research, Education and Instrumentation Center, Sapporo Medical University, Sapporo 060-8556 (Japan); Graduate School of Health Sciences, Hokkaido University, Sapporo 060-0812 (Japan); Sasaki, K. [Faculty of Health Sciences, Hokkaido University of Science, Sapporo 006-8585 (Japan); Matsuya, Y. [Graduate School of Health Sciences, Hokkaido University, Sapporo 060-0812 (Japan); Date, H., E-mail: date@hs.hokudai.ac.jp [Faculty of Health Sciences, Hokkaido University, Sapporo 060-0812 (Japan)

    2015-05-01

    To clarify the influence of bio-cells exposed to ionizing radiations, the densely populated pattern of the ionization in the cell nucleus is of importance because it governs the extent of DNA damage which may lead to cell lethality. In this study, we have conducted a cluster analysis of ionization and excitation events to estimate the number of double-strand breaks (DSBs) induced by electron tracks. A Monte Carlo simulation for electrons in liquid water was performed to determine the spatial location of the ionization and excitation events. The events were divided into clusters by using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. The algorithm enables us to sort out the events into the groups (clusters) in which a minimum number of neighboring events are contained within a given radius. For evaluating the number of DSBs in the extracted clusters, we have introduced an aggregation index (AI). The computational results show that a sub-keV electron produces DSBs in a dense formation more effectively than higher energy electrons. The root-mean square radius (RMSR) of the cluster size is below 5 nm, which is smaller than the chromatin fiber thickness. It was found that this size of clustering events has a high possibility to cause lesions in DNA within the chromatin fiber site.

  9. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    Science.gov (United States)

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

    Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.

  10. Paternal age related schizophrenia (PARS): Latent subgroups detected by k-means clustering analysis.

    Science.gov (United States)

    Lee, Hyejoo; Malaspina, Dolores; Ahn, Hongshik; Perrin, Mary; Opler, Mark G; Kleinhaus, Karine; Harlap, Susan; Goetz, Raymond; Antonius, Daniel

    2011-05-01

    Paternal age related schizophrenia (PARS) has been proposed as a subgroup of schizophrenia with distinct etiology, pathophysiology and symptoms. This study uses a k-means clustering analysis approach to generate hypotheses about differences between PARS and other cases of schizophrenia. We studied PARS (operationally defined as not having any family history of schizophrenia among first and second-degree relatives and fathers' age at birth ≥ 35 years) in a series of schizophrenia cases recruited from a research unit. Data were available on demographic variables, symptoms (Positive and Negative Syndrome Scale; PANSS), cognitive tests (Wechsler Adult Intelligence Scale-Revised; WAIS-R) and olfaction (University of Pennsylvania Smell Identification Test; UPSIT). We conducted a series of k-means clustering analyses to identify clusters of cases containing high concentrations of PARS. Two analyses generated clusters with high concentrations of PARS cases. The first analysis (N=136; PARS=34) revealed a cluster containing 83% PARS cases, in which the patients showed a significant discrepancy between verbal and performance intelligence. The mean paternal and maternal ages were 41 and 33, respectively. The second analysis (N=123; PARS=30) revealed a cluster containing 71% PARS cases, of which 93% were females; the mean age of onset of psychosis, at 17.2, was significantly early. These results strengthen the evidence that PARS cases differ from other patients with schizophrenia. Hypothesis-generating findings suggest that features of PARS may include a discrepancy between verbal and performance intelligence, and in females, an early age of onset. These findings provide a rationale for separating these phenotypes from others in future clinical, genetic and pathophysiologic studies of schizophrenia and in considering responses to treatment. Copyright © 2011 Elsevier B.V. All rights reserved.

  11. CLUSTER ANALYSIS UNTUK MEMPREDIKSI TALENTA PEMAIN BASKET MENGGUNAKAN JARINGAN SARAF TIRUAN SELF ORGANIZING MAPS (SOM

    Directory of Open Access Journals (Sweden)

    Gregorius Satia Budhi

    2008-01-01

    Full Text Available Basketball World has grown rapidly as the time goes on. This is signed by many competition and game all over the world. With the result there are many basketball players with their different playing characteristics. Demand for a coach or scout to look for or search great players to make a solid team as a coach requirement. With this application, a coach or scout will be helped in analyzing in decision making. This application uses Self Organizing Maps algorithm (SOM for Cluster Analysis. The real NBA player data is used for competitive learning or training process and real player data from Indonesian or Petra Christian University Basketball Players is used for testing process. The NBA Player data is prepared through cleaning process and then is transformed into a form that can be processed by SOM Algorithm. After that, the data is clustered with the SOM algorithm. The result of that clusters is displayed into a form that is easy to view and analyze. This result can be saved into a text file. By using the output / result of this application, that are the clusters of NBA player, the user can see the statistics of each cluster. With these cluster statistics coach or scout can predict the statistic and the position of a testing player who is in the same cluster. This information can give a support for the coach or scout to make a decision. Abstract in Bahasa Indonesia : Dunia bola basket telah berkembang dengan pesat seiring dengan berjalannya waktu. Hal ini ditandai dengan munculnya berbagai macam dan jenis kompetisi dan pertandingan baik dunia maupun dalam negeri. Sehingga makin banyak dilahirkannya pemain berbakat dengan berbagai karakteristik permainan yang berbeda. Tuntutan bagi seorang pelatih/pemandu bakat, untuk dapat melihat secara jeli dalam memenuhi kebutuhan tim untuk membentuk tim yang solid. Dengan dibuatnya aplikasi ini, maka akan membantu proses analisis dan pengambilan keputusan bagi pelatih maupun pemandu bakat Aplikasi ini

  12. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    2009-09-01

    Full Text Available Amyotrophic lateral sclerosis (ALS is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes.Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method.The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001. Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb and time from symptom onset to diagnosis (p<0.00001.The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and generating more homogeneous disease groups for genetic, proteomic and risk factor research.

  13. Semantic based cluster content discovery in description first clustering algorithm

    International Nuclear Information System (INIS)

    Khan, M.W.; Asif, H.M.S.

    2017-01-01

    In the field of data analytics grouping of like documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have purposed. One of them is a category of algorithms which firstly group the documents on the basis of similarity and then assign the meaningful labels to those groups. Description first clustering algorithm belong to the category in which the meaningful description is deduced first and then relevant documents are assigned to that description. LINGO (Label Induction Grouping Algorithm) is the algorithm of description first clustering category which is used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing); an IR (Information Retrieval) technique for induction of meaningful labels for clusters and VSM (Vector Space Model) for cluster content discovery. In this paper we present the LINGO while it is using LSI during cluster label induction and cluster content discovery phase. Finally, we compare results obtained from the said algorithm while it uses VSM and Latent semantic analysis during cluster content discovery phase. (author)

  14. Cluster as a wave telescope – first results from the fluxgate magnetometer

    Directory of Open Access Journals (Sweden)

    K.-H. Glassmeier

    2001-09-01

    Full Text Available The four Cluster spacecraft provide an excellent opportunity to study spatial structures in the magnetosphere and adjacent regions. Propagating waves are amongst the interesting structures and for the first time, Cluster will allow one to measure the wave vector of low-frequency fluctuations in a space plasma. Based on a generalized minimum variance analysis wave vector estimates will be determined in the terrestrial magnetosheath and the near-Earth solar wind. The virtue and weakness of the wave telescope technique used is discussed in detail.Key words. Electromagnetics (wave propagation – Magnetospheric physics (MHD waves and instabilities; plasma waves and instabilities

  15. Residential patterns in older homeless adults: Results of a cluster analysis.

    Science.gov (United States)

    Lee, Christopher Thomas; Guzman, David; Ponath, Claudia; Tieu, Lina; Riley, Elise; Kushel, Margot

    2016-03-01

    Adults aged 50 and older make up half of individuals experiencing homelessness and have high rates of morbidity and mortality. They may have different life trajectories and reside in different environments than do younger homeless adults. Although the environmental risks associated with homelessness are substantial, the environments in which older homeless individuals live have not been well characterized. We classified living environments and identified associated factors in a sample of older homeless adults. From July 2013 to June 2014, we recruited a community-based sample of 350 homeless men and women aged fifty and older in Oakland, California. We administered structured interviews including assessments of health, history of homelessness, social support, and life course. Participants used a recall procedure to describe where they stayed in the prior six months. We performed cluster analysis to classify residential venues and used multinomial logistic regression to identify individual factors prior to the onset of homelessness as well as the duration of unstable housing associated with living in them. We generated four residential groups describing those who were unsheltered (n = 162), cohabited unstably with friends and family (n = 57), resided in multiple institutional settings (shelters, jails, transitional housing) (n = 88), or lived primarily in rental housing (recently homeless) (n = 43). Compared to those who were unsheltered, having social support when last stably housed was significantly associated with cohabiting and institution use. Cohabiters and renters were significantly more likely to be women and have experienced a shorter duration of homelessness. Cohabiters were significantly more likely than unsheltered participants to have experienced abuse prior to losing stable housing. Pre-homeless social support appears to protect against street homelessness while low levels of social support may increase the risk for becoming homeless immediately after

  16. cluML: A markup language for clustering and cluster validity assessment of microarray data.

    Science.gov (United States)

    Bolshakova, Nadia; Cunningham, Pádraig

    2005-01-01

    cluML is a new markup language for microarray data clustering and cluster validity assessment. The XML-based format has been designed to address some of the limitations observed in traditional formats, such as inability to store multiple clustering (including biclustering) and validation results within a dataset. cluML is an effective tool to support biomedical knowledge representation in gene expression data analysis. Although cluML was developed for DNA microarray analysis applications, it can be effectively used for the representation of clustering and for the validation of other biomedical and physical data that has no limitations.

  17. Analysis of Learning Development With Sugeno Fuzzy Logic And Clustering

    Directory of Open Access Journals (Sweden)

    Maulana Erwin Saputra

    2017-06-01

    Full Text Available In the first journal, I made this attempt to analyze things that affect the achievement of students in each school of course vary. Because students are one of the goals of achieving the goals of successful educational organizations. The mental influence of students’ emotions and behaviors themselves in relation to learning performance. Fuzzy logic can be used in various fields as well as Clustering for grouping, as in Learning Development analyzes. The process will be performed on students based on the symptoms that exist. In this research will use fuzzy logic and clustering. Fuzzy is an uncertain logic but its excess is capable in the process of language reasoning so that in its design is not required complicated mathematical equations. However Clustering method is K-Means method is method where data analysis is broken down by group k (k = 1,2,3, .. k. To know the optimal number of Performance group. The results of the research is with a questionnaire entered into matlab will produce a value that means in generating the graph. And simplify the school in seeing Student performance in the learning process by using certain criteria. So from the system that obtained the results for a decision-making required by the school.

  18. Application of cluster analysis and unsupervised learning to multivariate tissue characterization

    International Nuclear Information System (INIS)

    Momenan, R.; Insana, M.F.; Wagner, R.F.; Garra, B.S.; Loew, M.H.

    1987-01-01

    This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis. These techniques are being applied to unsupervised ultrasonic image segmentation and tissue characterization. The performance of a new clustering technique is measured and compared with supervised methods, such as a linear Bayes classifier. In these comparisons two objectives are sought: a) How well does the clustering method group the data?; b) Do the clusters correspond to known tissue classes? The first question is investigated by a measure of cluster similarity and dispersion. The second question involves a comparison with a supervised technique using labeled data

  19. Clustering analysis for muon tomography data elaboration in the Muon Portal project

    Science.gov (United States)

    Bandieramonte, M.; Antonuccio-Delogu, V.; Becciani, U.; Costa, A.; La Rocca, P.; Massimino, P.; Petta, C.; Pistagna, C.; Riggi, F.; Riggi, S.; Sciacca, E.; Vitello, F.

    2015-05-01

    Clustering analysis is one of multivariate data analysis techniques which allows to gather statistical data units into groups, in order to minimize the logical distance within each group and to maximize the one between different groups. In these proceedings, the authors present a novel approach to the muontomography data analysis based on clustering algorithms. As a case study we present the Muon Portal project that aims to build and operate a dedicated particle detector for the inspection of harbor containers to hinder the smuggling of nuclear materials. Clustering techniques, working directly on scattering points, help to detect the presence of suspicious items inside the container, acting, as it will be shown, as a filter for a preliminary analysis of the data.

  20. Subtypes of autism by cluster analysis based on structural MRI data.

    Science.gov (United States)

    Hrdlicka, Michal; Dudova, Iva; Beranova, Irena; Lisy, Jiri; Belsan, Tomas; Neuwirth, Jiri; Komarek, Vladimir; Faladova, Ludvika; Havlovicova, Marketa; Sedlacek, Zdenek; Blatny, Marek; Urbanek, Tomas

    2005-05-01

    The aim of our study was to subcategorize Autistic Spectrum Disorders (ASD) using a multidisciplinary approach. Sixty four autistic patients (mean age 9.4+/-5.6 years) were entered into a cluster analysis. The clustering analysis was based on MRI data. The clusters obtained did not differ significantly in the overall severity of autistic symptomatology as measured by the total score on the Childhood Autism Rating Scale (CARS). The clusters could be characterized as showing significant differences: Cluster 1: showed the largest sizes of the genu and splenium of the corpus callosum (CC), the lowest pregnancy order and the lowest frequency of facial dysmorphic features. Cluster 2: showed the largest sizes of the amygdala and hippocampus (HPC), the least abnormal visual response on the CARS, the lowest frequency of epilepsy and the least frequent abnormal psychomotor development during the first year of life. Cluster 3: showed the largest sizes of the caput of the nucleus caudatus (NC), the smallest sizes of the HPC and facial dysmorphic features were always present. Cluster 4: showed the smallest sizes of the genu and splenium of the CC, as well as the amygdala, and caput of the NC, the most abnormal visual response on the CARS, the highest frequency of epilepsy, the highest pregnancy order, abnormal psychomotor development during the first year of life was always present and facial dysmorphic features were always present. This multidisciplinary approach seems to be a promising method for subtyping autism.

  1. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

    Science.gov (United States)

    Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard

    2015-01-01

    Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, X2 tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, ppeople living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.

  2. Cluster analysis as a method for determining size ranges for spinal implants: disc lumbar replacement prosthesis dimensions from magnetic resonance images.

    Science.gov (United States)

    Lei, Dang; Holder, Roger L; Smith, Francis W; Wardlaw, Douglas; Hukins, David W L

    2006-12-01

    Statistical analysis of clinical radiologic data. To develop an objective method for finding the number of sizes for a lumbar disc replacement. Cluster analysis is a well-established technique for sorting observations into clusters so that the "similarity level" is maximal if they belong to the same cluster and minimal otherwise. Magnetic resonance scans from 69 patients, with no abnormal discs, yielded 206 sagittal and transverse images of 206 discs (levels L3-L4-L5-S1). Anteroposterior and lateral dimensions were measured from vertebral margins on transverse images; disc heights were measured from sagittal images. Hierarchical cluster analysis was performed to determine the number of clusters followed by nonhierarchical (K-means) cluster analysis. Discriminant analysis was used to determine how well the clusters could be used to classify an observation. The most successful method of clustering the data involved the following parameters: anteroposterior dimension; lateral dimension (both were the mean of results from the superior and inferior margins of a vertebral body, measured on transverse images); and maximum disc height (from a midsagittal image). These were grouped into 7 clusters so that a discriminant analysis was capable of correctly classifying 97.1% of the observations. The mean and standard deviations for the parameter values in each cluster were determined. Cluster analysis has been successfully used to find the dimensions of the minimum number of prosthesis sizes required to replace L3-L4 to L5-S1 discs; the range of sizes would enable them to be used at higher lumbar levels in some patients.

  3. Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis

    Science.gov (United States)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

    2018-01-01

    To resolve the problem of slow computation speed and low matching accuracy in image registration, a new image registration algorithm based on parallax constraint and clustering analysis is proposed. Firstly, Harris corner detection algorithm is used to extract the feature points of two images. Secondly, use Normalized Cross Correlation (NCC) function to perform the approximate matching of feature points, and the initial feature pair is obtained. Then, according to the parallax constraint condition, the initial feature pair is preprocessed by K-means clustering algorithm, which is used to remove the feature point pairs with obvious errors in the approximate matching process. Finally, adopt Random Sample Consensus (RANSAC) algorithm to optimize the feature points to obtain the final feature point matching result, and the fast and accurate image registration is realized. The experimental results show that the image registration algorithm proposed in this paper can improve the accuracy of the image matching while ensuring the real-time performance of the algorithm.

  4. Comparison Analysis and Evaluation of Urban Competitiveness in Chinese Urban Clusters

    Directory of Open Access Journals (Sweden)

    Haixiang Guo

    2015-04-01

    Full Text Available With accelerating urbanization, urban competitiveness has become a worldwide academic focus. Previous studies always focused on economic factors but ignored social elements when measuring urban competitiveness. In this paper, a city was considered as a whole containing different units such as departments, individuals and economic activities, which interact with each other and affect its economic operation. Moreover, a city’s development was compared to an object’s movement, and the components were compared to different forces acting upon the object. With the analysis of the principle of object movement, this study has established a more scientific evaluation index system that involves 4 subsystems, 12 elements and 58 indexes. By using the TOPSIS method, the study has worked out the urban competitiveness of 141 cities from 28 Chinese urban clusters in 2009. According to the calculation results, these cities were divided into four levels: A, B, C, D. Furthermore, in order to analyze the competitiveness of cities and urban clusters, cities and urban clusters have been divided into four groups according to their distributive characteristics: the southeast, the northeast and Bohai Rim, the central region and the west. Suggestions and recommendations for each group are provided based on careful analysis.

  5. The effect of different position of grape clusters on the bearing shoot on production results of Cabernet Sauvignon clones

    Directory of Open Access Journals (Sweden)

    Čoloveić Ana

    2014-01-01

    Full Text Available In this paper the differences were examined between clones of Cabernet sauvignon (clones ISV-F-V5, ISV-F-V6 and R5, i.e. the difference between uvological properties of grape clusters and grape berries, based on the different positions on the bearing shoot. Tests were conducted at the experimental field of the Faculty of Agriculture 'Radmilovac'. Standard ampelographic methods were used in numerous analyses of grape yield, as well as uvological properties of clones. All data were statistically analyzed and processed by the method of two-factor analysis of variance with repeated measuring of one factor (height and Tukey HSD test. Analysis of variance showed no significant differences between clones. The best results were achieved with grape clusters positioned in the base of bearing shoot. The first positioned grape clusters on the bearing shoot had the highest share in the total grape yield, the highest amount of sugar, and the highest positioned grape clusters had higher content of total acids. The differences determined between examined clones were in regard to productivity and quality of grapes which reflected also on production value.

  6. Cluster Analysis on Longitudinal Data of Patients with Adult-Onset Asthma.

    Science.gov (United States)

    Ilmarinen, Pinja; Tuomisto, Leena E; Niemelä, Onni; Tommola, Minna; Haanpää, Jussi; Kankaanranta, Hannu

    Previous cluster analyses on asthma are based on cross-sectional data. To identify phenotypes of adult-onset asthma by using data from baseline (diagnostic) and 12-year follow-up visits. The Seinäjoki Adult Asthma Study is a 12-year follow-up study of patients with new-onset adult asthma. K-means cluster analysis was performed by using variables from baseline and follow-up visits on 171 patients to identify phenotypes. Five clusters were identified. Patients in cluster 1 (n = 38) were predominantly nonatopic males with moderate smoking history at baseline. At follow-up, 40% of these patients had developed persistent obstruction but the number of patients with uncontrolled asthma (5%) and rhinitis (10%) was the lowest. Cluster 2 (n = 19) was characterized by older men with heavy smoking history, poor lung function, and persistent obstruction at baseline. At follow-up, these patients were mostly uncontrolled (84%) despite daily use of inhaled corticosteroid (ICS) with add-on therapy. Cluster 3 (n = 50) consisted mostly of nonsmoking females with good lung function at diagnosis/follow-up and well-controlled/partially controlled asthma at follow-up. Cluster 4 (n = 25) had obese and symptomatic patients at baseline/follow-up. At follow-up, these patients had several comorbidities (40% psychiatric disease) and were treated daily with ICS and add-on therapy. Patients in cluster 5 (n = 39) were mostly atopic and had the earliest onset of asthma, the highest blood eosinophils, and FEV 1 reversibility at diagnosis. At follow-up, these patients used the lowest ICS dose but 56% were well controlled. Results can be used to predict outcomes of patients with adult-onset asthma and to aid in development of personalized therapy (NCT02733016 at ClinicalTrials.gov). Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Cluster decay analysis and related structure effects of fissionable ...

    Indian Academy of Sciences (India)

    2015-08-01

    Aug 1, 2015 ... Collective clusterization approach of dynamical cluster decay model (DCM) has been ... fusion–fission process resulting in the emission of symmetric and/or ... represents the relative separation distance between two fragments or clusters ... decay constant λ or decay half-life T1/2 is defined as λ = (ln 2/T1/2) ...

  8. Statistical Techniques Applied to Aerial Radiometric Surveys (STAARS): cluster analysis. National Uranium Resource Evaluation

    International Nuclear Information System (INIS)

    Pirkle, F.L.; Stablein, N.K.; Howell, J.A.; Wecksung, G.W.; Duran, B.S.

    1982-11-01

    One objective of the aerial radiometric surveys flown as part of the US Department of Energy's National Uranium Resource Evaluation (NURE) program was to ascertain the regional distribution of near-surface radioelement abundances. Some method for identifying groups of observations with similar radioelement values was therefore required. It is shown in this report that cluster analysis can identify such groups even when no a priori knowledge of the geology of an area exists. A method of convergent k-means cluster analysis coupled with a hierarchical cluster analysis is used to classify 6991 observations (three radiometric variables at each observation location) from the Precambrian rocks of the Copper Mountain, Wyoming, area. Another method, one that combines a principal components analysis with a convergent k-means analysis, is applied to the same data. These two methods are compared with a convergent k-means analysis that utilizes available geologic knowledge. All three methods identify four clusters. Three of the clusters represent background values for the Precambrian rocks of the area, and one represents outliers (anomalously high 214 Bi). A segmentation of the data corresponding to geologic reality as discovered by other methods has been achieved based solely on analysis of aerial radiometric data. The techniques employed are composites of classical clustering methods designed to handle the special problems presented by large data sets. 20 figures, 7 tables

  9. The association between mood state and chronobiological characteristics in bipolar I disorder: a naturalistic, variable cluster analysis-based study.

    Science.gov (United States)

    Gonzalez, Robert; Suppes, Trisha; Zeitzer, Jamie; McClung, Colleen; Tamminga, Carol; Tohen, Mauricio; Forero, Angelica; Dwivedi, Alok; Alvarado, Andres

    2018-02-19

    Multiple types of chronobiological disturbances have been reported in bipolar disorder, including characteristics associated with general activity levels, sleep, and rhythmicity. Previous studies have focused on examining the individual relationships between affective state and chronobiological characteristics. The aim of this study was to conduct a variable cluster analysis in order to ascertain how mood states are associated with chronobiological traits in bipolar I disorder (BDI). We hypothesized that manic symptomatology would be associated with disturbances of rhythm. Variable cluster analysis identified five chronobiological clusters in 105 BDI subjects. Cluster 1, comprising subjective sleep quality was associated with both mania and depression. Cluster 2, which comprised variables describing the degree of rhythmicity, was associated with mania. Significant associations between mood state and cluster analysis-identified chronobiological variables were noted. Disturbances of mood were associated with subjectively assessed sleep disturbances as opposed to objectively determined, actigraphy-based sleep variables. No associations with general activity variables were noted. Relationships between gender and medication classes in use and cluster analysis-identified chronobiological characteristics were noted. Exploratory analyses noted that medication class had a larger impact on these relationships than the number of psychiatric medications in use. In a BDI sample, variable cluster analysis was able to group related chronobiological variables. The results support our primary hypothesis that mood state, particularly mania, is associated with chronobiological disturbances. Further research is required in order to define these relationships and to determine the directionality of the associations between mood state and chronobiological characteristics.

  10. Clusters of Tourism Consumers in Romania

    Directory of Open Access Journals (Sweden)

    Pelau Corina

    2018-03-01

    Full Text Available The analysis and determination of typologies of tourism consumers has been a major concern for scientists, specialists and companies as well. Knowing the demographic and motivational factors that determine consumers to buy tourism products can have a major impact on the marketing strategy by a more efficient targeting of customers. This article presents the results of a research that aims to determine the factors which influence the buying decision for tourism products and the clusters of consumers resulted from these factors. 90 persons have been surveyed pursuing the determination of the most important factors for buying a tourism product and the correlation between them. The factor analysis and the cluster analysis have been applied with the help of the SPSS program. The results of the factor analysis group the items into six factors. In a second phase, the consumers have been divided into three categories based on a hierarchical Ward cluster analysis. The three clusters have been defined and analyzed and recommendations for the future research have been given.

  11. Cluster-guided imaging-based CFD analysis of airflow and particle deposition in asthmatic human lungs

    Science.gov (United States)

    Choi, Jiwoong; Leblanc, Lawrence; Choi, Sanghun; Haghighi, Babak; Hoffman, Eric; Lin, Ching-Long

    2017-11-01

    The goal of this study is to assess inter-subject variability in delivery of orally inhaled drug products to small airways in asthmatic lungs. A recent multiscale imaging-based cluster analysis (MICA) of computed tomography (CT) lung images in an asthmatic cohort identified four clusters with statistically distinct structural and functional phenotypes associating with unique clinical biomarkers. Thus, we aimed to address inter-subject variability via inter-cluster variability. We selected a representative subject from each of the 4 asthma clusters as well as 1 male and 1 female healthy controls, and performed computational fluid and particle simulations on CT-based airway models of these subjects. The results from one severe and one non-severe asthmatic cluster subjects characterized by segmental airway constriction had increased particle deposition efficiency, as compared with the other two cluster subjects (one non-severe and one severe asthmatics) without airway constriction. Constriction-induced jets impinging on distal bifurcations led to excessive particle deposition. The results emphasize the impact of airway constriction on regional particle deposition rather than disease severity, demonstrating the potential of using cluster membership to tailor drug delivery. NIH Grants U01HL114494 and S10-RR022421, and FDA Grant U01FD005837. XSEDE.

  12. Cluster as a wave telescope – first results from the fluxgate magnetometer

    Directory of Open Access Journals (Sweden)

    K.-H. Glassmeier

    Full Text Available The four Cluster spacecraft provide an excellent opportunity to study spatial structures in the magnetosphere and adjacent regions. Propagating waves are amongst the interesting structures and for the first time, Cluster will allow one to measure the wave vector of low-frequency fluctuations in a space plasma. Based on a generalized minimum variance analysis wave vector estimates will be determined in the terrestrial magnetosheath and the near-Earth solar wind. The virtue and weakness of the wave telescope technique used is discussed in detail.

    Key words. Electromagnetics (wave propagation – Magnetospheric physics (MHD waves and instabilities; plasma waves and instabilities

  13. On the blind use of statistical tools in the analysis of globular cluster stars

    Science.gov (United States)

    D'Antona, Francesca; Caloi, Vittoria; Tailo, Marco

    2018-04-01

    As with most data analysis methods, the Bayesian method must be handled with care. We show that its application to determine stellar evolution parameters within globular clusters can lead to paradoxical results if used without the necessary precautions. This is a cautionary tale on the use of statistical tools for big data analysis.

  14. a Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    Science.gov (United States)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity pattern plays a key role in in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activities before further activity pattern analysis. In the era of Big Data, the emerging of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only a location and time that people stay and spend are collected, but also what people "say" for in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, where some of new methodologies should be accordingly developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to our best knowledge, few of clustering algorithms are specifically developed for handling the datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  15. Vibration impact acoustic emission technique for identification and analysis of defects in carbon steel tubes: Part B Cluster analysis

    Energy Technology Data Exchange (ETDEWEB)

    Halim, Zakiah Abd [Universiti Teknikal Malaysia Melaka (Malaysia); Jamaludin, Nordin; Junaidi, Syarif [Faculty of Engineering and Built, Universiti Kebangsaan Malaysia, Bangi (Malaysia); Yahya, Syed Yusainee Syed [Universiti Teknologi MARA, Shah Alam (Malaysia)

    2015-04-15

    Current steel tubes inspection techniques are invasive, and the interpretation and evaluation of inspection results are manually done by skilled personnel. Part A of this work details the methodology involved in the newly developed non-invasive, non-destructive tube inspection technique based on the integration of vibration impact (VI) and acoustic emission (AE) systems known as the vibration impact acoustic emission (VIAE) technique. AE signals have been introduced into a series of ASTM A179 seamless steel tubes using the impact hammer. Specifically, a good steel tube as the reference tube and four steel tubes with through-hole artificial defect at different locations were used in this study. The AEs propagation was captured using a high frequency sensor of AE systems. The present study explores the cluster analysis approach based on autoregressive (AR) coefficients to automatically interpret the AE signals. The results from the cluster analysis were graphically illustrated using a dendrogram that demonstrated the arrangement of the natural clusters of AE signals. The AR algorithm appears to be the more effective method in classifying the AE signals into natural groups. This approach has successfully classified AE signals for quick and confident interpretation of defects in carbon steel tubes.

  16. Vibration impact acoustic emission technique for identification and analysis of defects in carbon steel tubes: Part B Cluster analysis

    International Nuclear Information System (INIS)

    Halim, Zakiah Abd; Jamaludin, Nordin; Junaidi, Syarif; Yahya, Syed Yusainee Syed

    2015-01-01

    Current steel tubes inspection techniques are invasive, and the interpretation and evaluation of inspection results are manually done by skilled personnel. Part A of this work details the methodology involved in the newly developed non-invasive, non-destructive tube inspection technique based on the integration of vibration impact (VI) and acoustic emission (AE) systems known as the vibration impact acoustic emission (VIAE) technique. AE signals have been introduced into a series of ASTM A179 seamless steel tubes using the impact hammer. Specifically, a good steel tube as the reference tube and four steel tubes with through-hole artificial defect at different locations were used in this study. The AEs propagation was captured using a high frequency sensor of AE systems. The present study explores the cluster analysis approach based on autoregressive (AR) coefficients to automatically interpret the AE signals. The results from the cluster analysis were graphically illustrated using a dendrogram that demonstrated the arrangement of the natural clusters of AE signals. The AR algorithm appears to be the more effective method in classifying the AE signals into natural groups. This approach has successfully classified AE signals for quick and confident interpretation of defects in carbon steel tubes.

  17. Cluster analysis for DNA methylation profiles having a detection threshold

    Directory of Open Access Journals (Sweden)

    Siegmund Kimberly D

    2006-07-01

    Full Text Available Abstract Background DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on this data. Results We compare performance of existing methodology (such as k-means with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data is collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. Conclusion The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties.

  18. Cluster Physics with Merging Galaxy Clusters

    Directory of Open Access Journals (Sweden)

    Sandor M. Molnar

    2016-02-01

    Full Text Available Collisions between galaxy clusters provide a unique opportunity to study matter in a parameter space which cannot be explored in our laboratories on Earth. In the standard LCDM model, where the total density is dominated by the cosmological constant ($Lambda$ and the matter density by cold dark matter (CDM, structure formation is hierarchical, and clusters grow mostly by merging.Mergers of two massive clusters are the most energetic events in the universe after the Big Bang,hence they provide a unique laboratory to study cluster physics.The two main mass components in clusters behave differently during collisions:the dark matter is nearly collisionless, responding only to gravity, while the gas is subject to pressure forces and dissipation, and shocks and turbulenceare developed during collisions. In the present contribution we review the different methods used to derive the physical properties of merging clusters. Different physical processes leave their signatures on different wavelengths, thusour review is based on a multifrequency analysis. In principle, the best way to analyze multifrequency observations of merging clustersis to model them using N-body/HYDRO numerical simulations. We discuss the results of such detailed analyses.New high spatial and spectral resolution ground and space based telescopeswill come online in the near future. Motivated by these new opportunities,we briefly discuss methods which will be feasible in the near future in studying merging clusters.

  19. Computational cluster validation for microarray data analysis: experimental assessment of Clest, Consensus Clustering, Figure of Merit, Gap Statistics and Model Explorer

    Directory of Open Access Journals (Sweden)

    Utro Filippo

    2008-10-01

    Full Text Available Abstract Background Inferring cluster structure in microarray datasets is a fundamental task for the so-called -omic sciences. It is also a fundamental question in Statistics, Data Analysis and Classification, in particular with regard to the prediction of the number of clusters in a dataset, usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have been recently proposed, some of them specifically for microarray data. Results We consider five such measures: Clest, Consensus (Consensus Clustering, FOM (Figure of Merit, Gap (Gap Statistics and ME (Model Explorer, in addition to the classic WCSS (Within Cluster Sum-of-Squares and KL (Krzanowski and Lai index. We perform extensive experiments on six benchmark microarray datasets, using both Hierarchical and K-means clustering algorithms, and we provide an analysis assessing both the intrinsic ability of a measure to predict the correct number of clusters in a dataset and its merit relative to the other measures. We pay particular attention both to precision and speed. Moreover, we also provide various fast approximation algorithms for the computation of Gap, FOM and WCSS. The main result is a hierarchy of those measures in terms of precision and speed, highlighting some of their merits and limitations not reported before in the literature. Conclusion Based on our analysis, we draw several conclusions for the use of those internal measures on microarray data. We report the main ones. Consensus is by far the best performer in terms of predictive power and remarkably algorithm-independent. Unfortunately, on large datasets, it may be of no use because of its non-trivial computer time demand (weeks on a state of the art PC. FOM is the second best performer although, quite surprisingly, it may not be competitive in this scenario: it has essentially the same predictive power of WCSS but it is from 6 to 100 times slower in time

  20. An alternative methodological approach to value analysis of regions, municipal corporations and clusters

    Directory of Open Access Journals (Sweden)

    Mojmír Sabolovič

    2011-01-01

    Full Text Available The paper deals with theoretical conception of value analysis of regions, municipal corporations and clusters. The subject of this paper is heterodox approach to sensitivity analysis of finite set of variables based on non-additive measure. For dynamic analysis of trajectory of general value are sufficient robust models based on maximum entropy principle. Findings concern explanation of proper fuzzy integral – Choquet integral. The fuzzy measure is represented by theory of capacities (Choquet, 1953 on powerset. In fine, the conception of the New integral for capacities (Lehler, 2005 is discussed. Value analysis and transmission constitutes remarkable aspect of performance evaluation of regions, municipal corporations and clusters. In the light of high ratio of soft variables, social behavior, intangible assets and human capital within those types of subjects the fuzzy integral introduce useful tool for modeling. The New integral afterwards concerns considerable characteristic of people behavior – risk averse articulated concave function and non-additive operator. Results comprehended tools enabling observation of synergy, redundancy and inhibition of value variables as consequence of non-additive measure. In fine, results induced issues for future research.

  1. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

    Science.gov (United States)

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-04-01

    Synchronous fluorescence spectra, combined with multivariate analysis were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presented a new and efficient spectral intervals selection method called clustering based partial least square (CL-PLS), which selected informative wavelengths by combining clustering concept and partial least square (PLS) methods to improve models’ performance by synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained and k-means and kohonen-self organizing map clustering algorithms were carried out to cluster full spectra into several clusters, and sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. Correlation coefficient (R) was used to evaluate the effect on prediction performance of PLS models. In addition, variable influence on projection partial least square (VIP-PLS), selectivity ratio partial least square (SR-PLS), interval partial least square (iPLS) models and full spectra PLS model were investigated and the results were compared. The results showed that CL-PLS presented the best result for flavonoids prediction using synchronous fluorescence spectra.

  2. The dynamics of cyclone clustering in re-analysis and a high-resolution climate model

    Science.gov (United States)

    Priestley, Matthew; Pinto, Joaquim; Dacre, Helen; Shaffrey, Len

    2017-04-01

    Extratropical cyclones have a tendency to occur in groups (clusters) in the exit of the North Atlantic storm track during wintertime, potentially leading to widespread socioeconomic impacts. The Winter of 2013/14 was the stormiest on record for the UK and was characterised by the recurrent clustering of intense extratropical cyclones. This clustering was associated with a strong, straight and persistent North Atlantic 250 hPa jet with Rossby wave-breaking (RWB) on both flanks, pinning the jet in place. Here, we provide for the first time an analysis of all clustered events in 36 years of the ERA-Interim Re-analysis at three latitudes (45˚ N, 55˚ N, 65˚ N) encompassing various regions of Western Europe. The relationship between the occurrence of RWB and cyclone clustering is studied in detail. Clustering at 55˚ N is associated with an extended and anomalously strong jet flanked on both sides by RWB. However, clustering at 65(45)˚ N is associated with RWB to the south (north) of the jet, deflecting the jet northwards (southwards). A positive correlation was found between the intensity of the clustering and RWB occurrence to the north and south of the jet. However, there is considerable spread in these relationships. Finally, analysis has shown that the relationships identified in the re-analysis are also present in a high-resolution coupled global climate model (HiGEM). In particular, clustering is associated with the same dynamical conditions at each of our three latitudes in spite of the identified biases in frequency and intensity of RWB.

  3. Somatosensory nociceptive characteristics differentiate subgroups in people with chronic low back pain: a cluster analysis.

    Science.gov (United States)

    Rabey, Martin; Slater, Helen; OʼSullivan, Peter; Beales, Darren; Smith, Anne

    2015-10-01

    The objectives of this study were to explore the existence of subgroups in a cohort with chronic low back pain (n = 294) based on the results of multimodal sensory testing and profile subgroups on demographic, psychological, lifestyle, and general health factors. Bedside (2-point discrimination, brush, vibration and pinprick perception, temporal summation on repeated monofilament stimulation) and laboratory (mechanical detection threshold, pressure, heat and cold pain thresholds, conditioned pain modulation) sensory testing were examined at wrist and lumbar sites. Data were entered into principal component analysis, and 5 component scores were entered into latent class analysis. Three clusters, with different sensory characteristics, were derived. Cluster 1 (31.9%) was characterised by average to high temperature and pressure pain sensitivity. Cluster 2 (52.0%) was characterised by average to high pressure pain sensitivity. Cluster 3 (16.0%) was characterised by low temperature and pressure pain sensitivity. Temporal summation occurred significantly more frequently in cluster 1. Subgroups were profiled on pain intensity, disability, depression, anxiety, stress, life events, fear avoidance, catastrophizing, perception of the low back region, comorbidities, body mass index, multiple pain sites, sleep, and activity levels. Clusters 1 and 2 had a significantly greater proportion of female participants and higher depression and sleep disturbance scores than cluster 3. The proportion of participants undertaking Low back pain, therefore, does not appear to be homogeneous. Pain mechanisms relating to presentations of each subgroup were postulated. Future research may investigate prognoses and interventions tailored towards these subgroups.

  4. Cluster analysis of HZE particle tracks as applied to space radiobiology problems

    International Nuclear Information System (INIS)

    Batmunkh, M.; Bayarchimeg, L.; Lkhagva, O.; Belov, O.

    2013-01-01

    A cluster analysis is performed of ionizations in tracks produced by the most abundant nuclei in the charge and energy spectra of the galactic cosmic rays. The frequency distribution of clusters is estimated for cluster sizes comparable to the DNA molecule at different packaging levels. For this purpose, an improved K-mean-based algorithm is suggested. This technique allows processing particle tracks containing a large number of ionization events without setting the number of clusters as an input parameter. Using this method, the ionization distribution pattern is analyzed depending on the cluster size and particle's linear energy transfer

  5. Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

    Science.gov (United States)

    Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

    2014-07-01

    Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. A cluster consisted of patients with only anti-dsDNA antibodies, a cluster of anti-Sm and anti-RNP, a cluster of aCL IgG/M and LAC, and a cluster of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. Sm/RNP cluster had significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. DsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10 and 20 years survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.

  6. Accommodating error analysis in comparison and clustering of molecular fingerprints.

    Science.gov (United States)

    Salamon, H; Segal, M R; Ponce de Leon, A; Small, P M

    1998-01-01

    Molecular epidemiologic studies of infectious diseases rely on pathogen genotype comparisons, which usually yield patterns comprising sets of DNA fragments (DNA fingerprints). We use a highly developed genotyping system, IS6110-based restriction fragment length polymorphism analysis of Mycobacterium tuberculosis, to develop a computational method that automates comparison of large numbers of fingerprints. Because error in fragment length measurements is proportional to fragment length and is positively correlated for fragments within a lane, an align-and-count method that compensates for relative scaling of lanes reliably counts matching fragments between lanes. Results of a two-step method we developed to cluster identical fingerprints agree closely with 5 years of computer-assisted visual matching among 1,335 M. tuberculosis fingerprints. Fully documented and validated methods of automated comparison and clustering will greatly expand the scope of molecular epidemiology.

  7. Cluster analysis of novel isometric strength measures produces a valid and evidence-based classification structure for wheelchair track racing.

    Science.gov (United States)

    Connick, Mark J; Beckman, Emma; Vanlandewijck, Yves; Malone, Laurie A; Blomqvist, Sven; Tweedy, Sean M

    2017-11-25

    The Para athletics wheelchair-racing classification system employs best practice to ensure that classes comprise athletes whose impairments cause a comparable degree of activity limitation. However, decision-making is largely subjective and scientific evidence which reduces this subjectivity is required. To evaluate whether isometric strength tests were valid for the purposes of classifying wheelchair racers and whether cluster analysis of the strength measures produced a valid classification structure. Thirty-two international level, male wheelchair racers from classes T51-54 completed six isometric strength tests evaluating elbow extensors, shoulder flexors, trunk flexors and forearm pronators and two wheelchair performance tests-Top-Speed (0-15 m) and Top-Speed (absolute). Strength tests significantly correlated with wheelchair performance were included in a cluster analysis and the validity of the resulting clusters was assessed. All six strength tests correlated with performance (r=0.54-0.88). Cluster analysis yielded four clusters with reasonable overall structure (mean silhouette coefficient=0.58) and large intercluster strength differences. Six athletes (19%) were allocated to clusters that did not align with their current class. While the mean wheelchair racing performance of the resulting clusters was unequivocally hierarchical, the mean performance of current classes was not, with no difference between current classes T53 and T54. Cluster analysis of isometric strength tests produced classes comprising athletes who experienced a similar degree of activity limitation. The strength tests reported can provide the basis for a new, more transparent, less subjective wheelchair racing classification system, pending replication of these findings in a larger, representative sample. This paper also provides guidance for development of evidence-based systems in other Para sports. © Article author(s) (or their employer(s) unless otherwise stated in the text of

  8. SU-E-J-98: Radiogenomics: Correspondence Between Imaging and Genetic Features Based On Clustering Analysis

    International Nuclear Information System (INIS)

    Harmon, S; Wendelberger, B; Jeraj, R

    2014-01-01

    Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates if similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [ 18 F]FDG PET data was obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given mismatch in feature sizes, two dimension reduction techniques were also applied to the genetic data: principle component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-ofinterest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering using variable number of clusters (K). Jaccard Index (JI) was used to score similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithms or number of clusters (JI mean = 0.3429±0.1623). Notably, we found clustering algorithms had different sensitivities to data reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement to the full-gene set (JI =1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as number of clusters increased (JI=0.9231 and 0.5769 for K=2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JI range : 0.2301–1). Conclusion: Using commonly-used clustering algorithms, we found

  9. SU-E-J-98: Radiogenomics: Correspondence Between Imaging and Genetic Features Based On Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Harmon, S; Wendelberger, B [University of Wisconsin-Madison, Madison, WI (United States); Jeraj, R [University of Wisconsin-Madison, Madison, WI (United States); University of Ljubljana (Slovenia)

    2014-06-01

    Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates if similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [{sup 18}F]FDG PET data was obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given mismatch in feature sizes, two dimension reduction techniques were also applied to the genetic data: principle component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-ofinterest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering using variable number of clusters (K). Jaccard Index (JI) was used to score similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithms or number of clusters (JI{sub mean}= 0.3429±0.1623). Notably, we found clustering algorithms had different sensitivities to data reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement to the full-gene set (JI =1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as number of clusters increased (JI=0.9231 and 0.5769 for K=2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JI{sub range}: 0.2301–1). Conclusion: Using commonly-used clustering algorithms

  10. A hierarchical cluster analysis of normal-tension glaucoma using spectral-domain optical coherence tomography parameters.

    Science.gov (United States)

    Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2015-01-01

    Normal-tension glaucoma (NTG) is a heterogenous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogenous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and were comprised of patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.

  11. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    Science.gov (United States)

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of the higher education. Many universities have made some effective achievement about evaluating the teaching quality. In this paper, we establish the Students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  12. Using Cluster Analysis to Group Countries for Cost-effectiveness Analysis: An Application to Sub-Saharan Africa.

    Science.gov (United States)

    Russell, Louise B; Bhanot, Gyan; Kim, Sun-Young; Sinha, Anushua

    2018-02-01

    To explore the use of cluster analysis to define groups of similar countries for the purpose of evaluating the cost-effectiveness of a public health intervention-maternal immunization-within the constraints of a project budget originally meant for an overall regional analysis. We used the most common cluster analysis algorithm, K-means, and the most common measure of distance, Euclidean distance, to group 37 low-income, sub-Saharan African countries on the basis of 24 measures of economic development, general health resources, and past success in public health programs. The groups were tested for robustness and reviewed by regional disease experts. We explored 2-, 3- and 4-group clustering. Public health performance was consistently important in determining the groups. For the 2-group clustering, for example, infant mortality in Group 1 was 81 per 1,000 live births compared with 51 per 1,000 in Group 2, and 67% of children in Group 1 received DPT immunization compared with 87% in Group 2. The experts preferred four groups to fewer, on the ground that national decision makers would more readily recognize their country among four groups. Clusters defined by K-means clustering made sense to subject experts and allowed a more detailed evaluation of the cost-effectiveness of maternal immunization within the constraint of the project budget. The method may be useful for other evaluations that, without having the resources to conduct separate analyses for each unit, seek to inform decision makers in numerous countries or subdivisions within countries, such as states or counties.

  13. Behavioral Health Risk Profiles of Undergraduate University Students in England, Wales, and Northern Ireland: A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Walid El Ansari

    2018-05-01

    Full Text Available BackgroundLimited research has explored clustering of lifestyle behavioral risk factors (BRFs among university students. This study aimed to explore clustering of BRFs, composition of clusters, and the association of the clusters with self-rated health and perceived academic performance.MethodWe assessed (BRFs, namely tobacco smoking, physical inactivity, alcohol consumption, illicit drug use, unhealthy nutrition, and inadequate sleep, using a self-administered general Student Health Survey among 3,706 undergraduates at seven UK universities.ResultsA two-step cluster analysis generated: Cluster 1 (the high physically active and health conscious with very high health awareness/consciousness, good nutrition, and physical activity (PA, and relatively low alcohol, tobacco, and other drug (ATOD use. Cluster 2 (the abstinent had very low ATOD use, high health awareness, good nutrition, and medium high PA. Cluster 3 (the moderately health conscious included the highest regard for healthy eating, second highest fruit/vegetable consumption, and moderately high ATOD use. Cluster 4 (the risk taking showed the highest ATOD use, were the least health conscious, least fruit consuming, and attached the least importance on eating healthy. Compared to the healthy cluster (Cluster 1, students in other clusters had lower self-rated health, and particularly, students in the risk taking cluster (Cluster 4 reported lower academic performance. These associations were stronger for men than for women. Of the four clusters, Cluster 4 had the youngest students.ConclusionOur results suggested that prevention among university students should address multiple BRFs simultaneously, with particular focus on the younger students.

  14. Emergy-based comparative analysis on industrial clusters: economic and technological development zone of Shenyang area, China.

    Science.gov (United States)

    Liu, Zhe; Geng, Yong; Zhang, Pan; Dong, Huijuan; Liu, Zuoxi

    2014-09-01

    In China, local governments of many areas prefer to give priority to the development of heavy industrial clusters in pursuit of high value of gross domestic production (GDP) growth to get political achievements, which usually results in higher costs from ecological degradation and environmental pollution. Therefore, effective methods and reasonable evaluation system are urgently needed to evaluate the overall efficiency of industrial clusters. Emergy methods links economic and ecological systems together, which can evaluate the contribution of ecological products and services as well as the load placed on environmental systems. This method has been successfully applied in many case studies of ecosystem but seldom in industrial clusters. This study applied the methodology of emergy analysis to perform the efficiency of industrial clusters through a series of emergy-based indices as well as the proposed indicators. A case study of Shenyang Economic Technological Development Area (SETDA) was investigated to show the emergy method's practical potential to evaluate industrial clusters to inform environmental policy making. The results of our study showed that the industrial cluster of electric equipment and electronic manufacturing produced the most economic value and had the highest efficiency of energy utilization among the four industrial clusters. However, the sustainability index of the industrial cluster of food and beverage processing was better than the other industrial clusters.

  15. Cluster Statistics of BTW Automata

    International Nuclear Information System (INIS)

    Ajanta Bhowal Acharyya

    2011-01-01

    The cluster statistics of BTW automata in the SOC states are obtained by extensive computer simulation. Various moments of the clusters are calculated and few results are compared with earlier available numerical estimates and exact results. Reasonably good agreement is observed. An extended statistical analysis has been made. (author)

  16. A cluster analysis investigation of workaholism as a syndrome.

    Science.gov (United States)

    Aziz, Shahnaz; Zickar, Michael J

    2006-01-01

    Workaholism has been conceptualized as a syndrome although there have been few tests that explicitly consider its syndrome status. The authors analyzed a three-dimensional scale of workaholism developed by Spence and Robbins (1992) using cluster analysis. The authors identified three clusters of individuals, one of which corresponded to Spence and Robbins's profile of the workaholic (high work involvement, high drive to work, low work enjoyment). Consistent with previously conjectured relations with workaholism, individuals in the workaholic cluster were more likely to label themselves as workaholics, more likely to have acquaintances label them as workaholics, and more likely to have lower life satisfaction and higher work-life imbalance. The importance of considering workaholism as a syndrome and the implications for effective interventions are discussed. Copyright 2006 APA.

  17. Sejong Open Cluster Survey (SOS). 0. Target Selection and Data Analysis

    Science.gov (United States)

    Sung, Hwankyung; Lim, Beomdu; Bessell, Michael S.; Kim, Jinyoung S.; Hur, Hyeonoh; Chun, Moo-Young; Park, Byeong-Gon

    2013-06-01

    Star clusters are superb astrophysical laboratories containing cospatial and coeval samples of stars with similar chemical composition. We initiate the Sejong Open cluster Survey (SOS) - a project dedicated to providing homogeneous photometry of a large number of open clusters in the SAAO Johnson-Cousins' UBVI system. To achieve our main goal, we pay much attention to the observation of standard stars in order to reproduce the SAAO standard system. Many of our targets are relatively small sparse clusters that escaped previous observations. As clusters are considered building blocks of the Galactic disk, their physical properties such as the initial mass function, the pattern of mass segregation, etc. give valuable information on the formation and evolution of the Galactic disk. The spatial distribution of young open clusters will be used to revise the local spiral arm structure of the Galaxy. In addition, the homogeneous data can also be used to test stellar evolutionary theory, especially concerning rare massive stars. In this paper we present the target selection criteria, the observational strategy for accurate photometry, and the adopted calibrations for data analysis such as color-color relations, zero-age main sequence relations, Sp - M_V relations, Sp - T_{eff} relations, Sp - color relations, and T_{eff} - BC relations. Finally we provide some data analysis such as the determination of the reddening law, the membership selection criteria, and distance determination.

  18. Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

    Science.gov (United States)

    Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

    2015-05-01

    To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.

  19. Developing a Clustering-Based Empirical Bayes Analysis Method for Hotspot Identification

    Directory of Open Access Journals (Sweden)

    Yajie Zou

    2017-01-01

    Full Text Available Hotspot identification (HSID is a critical part of network-wide safety evaluations. Typical methods for ranking sites are often rooted in using the Empirical Bayes (EB method to estimate safety from both observed crash records and predicted crash frequency based on similar sites. The performance of the EB method is highly related to the selection of a reference group of sites (i.e., roadway segments or intersections similar to the target site from which safety performance functions (SPF used to predict crash frequency will be developed. As crash data often contain underlying heterogeneity that, in essence, can make them appear to be generated from distinct subpopulations, methods are needed to select similar sites in a principled manner. To overcome this possible heterogeneity problem, EB-based HSID methods that use common clustering methodologies (e.g., mixture models, K-means, and hierarchical clustering to select “similar” sites for building SPFs are developed. Performance of the clustering-based EB methods is then compared using real crash data. Here, HSID results, when computed on Texas undivided rural highway cash data, suggest that all three clustering-based EB analysis methods are preferred over the conventional statistical methods. Thus, properly classifying the road segments for heterogeneous crash data can further improve HSID accuracy.

  20. Analysis of the project synthesis goal cluster orientation and inquiry emphasis of elementary science textbooks

    Science.gov (United States)

    Staver, John R.; Bay, Mary

    The purpose of this descriptive study was to examine selected units of commonly used elementary science texts, using the Project Synthesis goal clusters as a framework for part of the examination. An inquiry classification scheme was used for the remaining segment. Four questions were answered: (1) To what extent do elementary science textbooks focus on each Project Synthesis goal cluster? (2) In which part of the text is such information found? (3) To what extent are the activities and experiments merely verifications of information already introduced in the text? (4) If inquiry is present in an activity, then what is the level of such inquiry?Eleven science textbook series, which comprise approximately 90 percent of the national market, were selected for analysis. Two units, one primary (K-3) and one intermediate (4-6), were selected for analysis by first identifying units common to most series, then randomly selecting one primary and one intermediate unit for analysis.Each randomly selected unit was carefully read, using the sentence as the unit of analysis. Each declarative and interrogative sentence in the body of the text was classified as: (1) academic; (2) personal; (3) career; or (4) societal in its focus. Each illustration, except those used in evaluation items, was similarly classified. Each activity/experiment and each miscellaneous sentence in end-of-chapter segments labelled review, summary, evaluation, etc., were similarly classified. Finally, each activity/experiment, as a whole, was categorized according to a four-category inquiry scheme (confirmation, structured inquiry, guided inquiry, open inquiry).In general, results of the analysis are: (1) most text prose focuses on academic science; (2) most remaining text prose focuses on the personal goal cluster; (3) the career and societal goal clusters receive only minor attention; (4) text illustrations exhibit a pattern similar to text prose; (5) text activities/experiments are academic in orientation

  1. Application of Cluster Analysis in Assessment of Dietary Habits of Secondary School Students

    Directory of Open Access Journals (Sweden)

    Zalewska Magdalena

    2014-12-01

    Full Text Available Maintenance of proper health and prevention of diseases of civilization are now significant public health problems. Nutrition is an important factor in the development of youth, as well as the current and future state of health. The aim of the study was to show the benefits of the application of cluster analysis to assess the dietary habits of high school students. The survey was carried out on 1,631 eighteen-year-old students in seven randomly selected secondary schools in Bialystok using a self-prepared anonymous questionnaire. An evaluation of the time of day meals were eaten and the number of meals consumed was made for the surveyed students. The cluster analysis allowed distinguishing characteristic structures of dietary habits in the observed population. Four clusters were identified, which were characterized by relative internal homogeneity and substantial variation in terms of the number of meals during the day and the time of their consumption. The most important characteristics of cluster 1 were cumulated food ration in 2 or 3 meals and long intervals between meals. Cluster 2 was characterized by eating the recommended number of 4 or 5 meals a day. In the 3rd cluster, students ate 3 meals a day with large intervals between them, and in the 4th they had four meals a day while maintaining proper intervals between them. In all clusters dietary mistakes occurred, but most of them were related to clusters 1 and 3. Cluster analysis allowed for the identification of major flaws in nutrition, which may include irregular eating and skipping meals, and indicated possible connections between eating patterns and disturbances of body weight in the examined population.

  2. Deconstructing Bipolar Disorder and Schizophrenia: A cross-diagnostic cluster analysis of cognitive phenotypes.

    Science.gov (United States)

    Lee, Junghee; Rizzo, Shemra; Altshuler, Lori; Glahn, David C; Miklowitz, David J; Sugar, Catherine A; Wynn, Jonathan K; Green, Michael F

    2017-02-01

    Bipolar disorder (BD) and schizophrenia (SZ) show substantial overlap. It has been suggested that a subgroup of patients might contribute to these overlapping features. This study employed a cross-diagnostic cluster analysis to identify subgroups of individuals with shared cognitive phenotypes. 143 participants (68 BD patients, 39 SZ patients and 36 healthy controls) completed a battery of EEG and performance assessments on perception, nonsocial cognition and social cognition. A K-means cluster analysis was conducted with all participants across diagnostic groups. Clinical symptoms, functional capacity, and functional outcome were assessed in patients. A two-cluster solution across 3 groups was the most stable. One cluster including 44 BD patients, 31 controls and 5 SZ patients showed better cognition (High cluster) than the other cluster with 24 BD patients, 35 SZ patients and 5 controls (Low cluster). BD patients in the High cluster performed better than BD patients in the Low cluster across cognitive domains. Within each cluster, participants with different clinical diagnoses showed different profiles across cognitive domains. All patients are in the chronic phase and out of mood episode at the time of assessment and most of the assessment were behavioral measures. This study identified two clusters with shared cognitive phenotype profiles that were not proxies for clinical diagnoses. The finding of better social cognitive performance of BD patients than SZ patients in the Lowe cluster suggest that relatively preserved social cognition may be important to identify disease process distinct to each disorder. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Schedulability Analysis and Optimization for the Synthesis of Multi-Cluster Distributed Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Paul; Eles, Petru; Peng, Zebo

    2003-01-01

    We present an approach to schedulability analysis for the synthesis of multi-cluster distributed embedded systems consisting of time-triggered and event-triggered clusters, interconnected via gateways. We have also proposed a buffer size and worst case queuing delay analysis for the gateways......, responsible for routing inter-cluster traffic. Optimization heuristics for the priority assignment and synthesis of bus access parameters aimed at producing a schedulable system with minimal buffer needs have been proposed. Extensive experiments and a real-life example show the efficiency of our approaches....

  4. Schedulability Analysis and Optimization for the Synthesis of Multi-Cluster Distributed Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Paul; Eles, Petru; Peng, Zebo

    2003-01-01

    An approach to schedulability analysis for the synthesis of multi-cluster distributed embedded systems consisting of time-triggered and event-triggered clusters, interconnected via gateways, is presented. A buffer size and worst case queuing delay analysis for the gateways, responsible for routing...... inter-cluster traffic, is also proposed. Optimisation heuristics for the priority assignment and synthesis of bus access parameters aimed at producing a schedulable system with minimal buffer needs have been proposed. Extensive experiments and a real-life example show the efficiency of the approaches....

  5. A Comparison of AOP Classification Based on Difficulty, Importance, and Frequency by Cluster Analysis and Standardized Mean

    International Nuclear Information System (INIS)

    Choi, Sun Yeong; Jung, Wondea

    2014-01-01

    In Korea, there are plants that have more than one-hundred kinds of abnormal operation procedures (AOPs). Therefore, operators have started to recognize the importance of classifying the AOPs. They should pay attention to those AOPs required to take emergency measures against an abnormal status that has a more serious effect on plant safety and/or often occurs. We suggested a measure of prioritizing AOPs for a training purpose based on difficulty, importance, and frequency. A DIF analysis based on how difficult the task is, how important it is, and how frequently they occur is a well-known method of assessing the performance, prioritizing training needs and planning. We used an SDIF-mean (Standardized DIF-mean) to prioritize AOPs in the previous paper. For the SDIF-mean, we standardized the three kinds of data respectively. The results of this research will be utilized not only to understand the AOP characteristics at a job analysis level but also to develop an effective AOP training program. The purpose of this paper is to perform a cluster analysis for an AOP classification and compare the results through a cluster analysis with that by a standardized mean based on difficulty, importance, and frequency. In this paper, we categorized AOPs into three groups by a cluster analysis based on D, I, and F. Clustering is the classification of similar objects into groups so that each group shares some common characteristics. In addition, we compared the result by the cluster analysis in this paper with the classification result by the SDIF-mean in the previous paper. From the comparison, we found that a reevaluation can be required to assign a training interval for the AOPs of group C' in the previous paper those have lower SDIF-mean. The reason for this is that some of the AOPs of group C' have quite high D and I values while they have the lowest frequencies. From an educational point of view, AOPs in group which have the highest difficulty and importance, but

  6. A Comparison of AOP Classification Based on Difficulty, Importance, and Frequency by Cluster Analysis and Standardized Mean

    Energy Technology Data Exchange (ETDEWEB)

    Choi, Sun Yeong; Jung, Wondea [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2014-05-15

    In Korea, there are plants that have more than one-hundred kinds of abnormal operation procedures (AOPs). Therefore, operators have started to recognize the importance of classifying the AOPs. They should pay attention to those AOPs required to take emergency measures against an abnormal status that has a more serious effect on plant safety and/or often occurs. We suggested a measure of prioritizing AOPs for a training purpose based on difficulty, importance, and frequency. A DIF analysis based on how difficult the task is, how important it is, and how frequently they occur is a well-known method of assessing the performance, prioritizing training needs and planning. We used an SDIF-mean (Standardized DIF-mean) to prioritize AOPs in the previous paper. For the SDIF-mean, we standardized the three kinds of data respectively. The results of this research will be utilized not only to understand the AOP characteristics at a job analysis level but also to develop an effective AOP training program. The purpose of this paper is to perform a cluster analysis for an AOP classification and compare the results through a cluster analysis with that by a standardized mean based on difficulty, importance, and frequency. In this paper, we categorized AOPs into three groups by a cluster analysis based on D, I, and F. Clustering is the classification of similar objects into groups so that each group shares some common characteristics. In addition, we compared the result by the cluster analysis in this paper with the classification result by the SDIF-mean in the previous paper. From the comparison, we found that a reevaluation can be required to assign a training interval for the AOPs of group C' in the previous paper those have lower SDIF-mean. The reason for this is that some of the AOPs of group C' have quite high D and I values while they have the lowest frequencies. From an educational point of view, AOPs in group which have the highest difficulty and importance, but

  7. Genetic Diversity and Relationships of Neolamarckia cadamba (Roxb. Bosser progenies through cluster analysis

    Directory of Open Access Journals (Sweden)

    M. Preethi Shree

    2018-04-01

    Full Text Available Genetic diversity analysis was conducted for biometric attributes in 20 progenies of Neolamarckia cadamba. The application of D2 clustering technique in Neolamarckia cadamba genetic resources resolved the 20 progenies into five clusters. The maximum intra cluster distance was shown by the cluster II. The maximum inter cluster distance was recorded between cluster III and V which indicated the presence of wider genetic distance between Neolamarckia cadamba progenies. Among the growth attributes, volume (36.84 % contributed maximum towards genetic divergence followed by bole height, basal diameter, tree height, number of branches in Neolamarckia cadamba progenies.

  8. Predicting healthcare outcomes in prematurely born infants using cluster analysis.

    Science.gov (United States)

    MacBean, Victoria; Lunt, Alan; Drysdale, Simon B; Yarzi, Muska N; Rafferty, Gerrard F; Greenough, Anne

    2018-05-23

    Prematurely born infants are at high risk of respiratory morbidity following neonatal unit discharge, though prediction of outcomes is challenging. We have tested the hypothesis that cluster analysis would identify discrete groups of prematurely born infants with differing respiratory outcomes during infancy. A total of 168 infants (median (IQR) gestational age 33 (31-34) weeks) were recruited in the neonatal period from consecutive births in a tertiary neonatal unit. The baseline characteristics of the infants were used to classify them into hierarchical agglomerative clusters. Rates of viral lower respiratory tract infections (LRTIs) were recorded for 151 infants in the first year after birth. Infants could be classified according to birth weight and duration of neonatal invasive mechanical ventilation (MV) into three clusters. Cluster one (MV ≤5 days) had few LRTIs. Clusters two and three (both MV ≥6 days, but BW ≥or <882 g respectively), had significantly higher LRTI rates. Cluster two had a higher proportion of infants experiencing respiratory syncytial virus LRTIs (P = 0.01) and cluster three a higher proportion of rhinovirus LRTIs (P < 0.001) CONCLUSIONS: Readily available clinical data allowed classification of prematurely born infants into one of three distinct groups with differing subsequent respiratory morbidity in infancy. © 2018 Wiley Periodicals, Inc.

  9. X-Ray Morphological Analysis of the Planck ESZ Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Andrade-Santos, Felipe; Randall, Scott; Kraft, Ralph [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Ettori, Stefano [INAF, Osservatorio Astronomico di Bologna, via Ranzani 1, I-40127 Bologna (Italy); Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W. [Laboratoire AIM, IRFU/Service d’Astrophysique—CEA/DRF—CNRS—Université Paris Diderot, Bât. 709, CEA-Saclay, F-91191 Gif-sur-Yvette Cedex (France)

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are good to study how clusters form and grow and to test physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev–Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev–Zeldovich (ESZ) objects observed with XMM-Newton . We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.

  10. Multiscale deep drawing analysis of dual-phase steels using grain cluster-based RGC scheme

    International Nuclear Information System (INIS)

    Tjahjanto, D D; Eisenlohr, P; Roters, F

    2015-01-01

    Multiscale modelling and simulation play an important role in sheet metal forming analysis, since the overall material responses at macroscopic engineering scales, e.g. formability and anisotropy, are strongly influenced by microstructural properties, such as grain size and crystal orientations (texture). In the present report, multiscale analysis on deep drawing of dual-phase steels is performed using an efficient grain cluster-based homogenization scheme.The homogenization scheme, called relaxed grain cluster (RGC), is based on a generalization of the grain cluster concept, where a (representative) volume element consists of p  ×  q  ×  r (hexahedral) grains. In this scheme, variation of the strain or deformation of individual grains is taken into account through the, so-called, interface relaxation, which is formulated within an energy minimization framework. An interfacial penalty term is introduced into the energy minimization framework in order to account for the effects of grain boundaries.The grain cluster-based homogenization scheme has been implemented and incorporated into the advanced material simulation platform DAMASK, which purposes to bridge the macroscale boundary value problems associated with deep drawing analysis to the micromechanical constitutive law, e.g. crystal plasticity model. Standard Lankford anisotropy tests are performed to validate the model parameters prior to the deep drawing analysis. Model predictions for the deep drawing simulations are analyzed and compared to the corresponding experimental data. The result shows that the predictions of the model are in a very good agreement with the experimental measurement. (paper)

  11. Experimental analysis of clustering structures in magnetic and MR fluids using ultrasound

    International Nuclear Information System (INIS)

    Bramantya, M A; Takuma, H; Faiz, M; Sawada, T; Motozawa, M

    2009-01-01

    The formation of clustering structures in magnetic and MR fluids has an influence on ultrasonic propagation. We propose a qualitative analysis of these structures by measuring properties of ultrasonic propagation. Since magnetic and MR fluids are opaque, the non-contact inspection using this ultrasonic technique can be very useful for analyzing the inner structures of magnetic and MR fluids. We measured ultrasonic propagation velocity in a hydrocarbon-based magnetic fluid and MR fluid precisely. Based on these results, the clustering structures of these fluids were analyzed experimentally in terms of elapsed time dependence, effect of external magnetic field strength and angle, and hysteresis phenomena. A comparison of ultrasonic velocity propagation between magnetic and MR fluid was discussed.

  12. Identifying influential individuals on intensive care units: using cluster analysis to explore culture.

    Science.gov (United States)

    Fong, Allan; Clark, Lindsey; Cheng, Tianyi; Franklin, Ella; Fernandez, Nicole; Ratwani, Raj; Parker, Sarah Henrickson

    2017-07-01

    The objective of this paper is to identify attribute patterns of influential individuals in intensive care units using unsupervised cluster analysis. Despite the acknowledgement that culture of an organisation is critical to improving patient safety, specific methods to shift culture have not been explicitly identified. A social network analysis survey was conducted and an unsupervised cluster analysis was used. A total of 100 surveys were gathered. Unsupervised cluster analysis was used to group individuals with similar dimensions highlighting three general genres of influencers: well-rounded, knowledge and relational. Culture is created locally by individual influencers. Cluster analysis is an effective way to identify common characteristics among members of an intensive care unit team that are noted as highly influential by their peers. To change culture, identifying and then integrating the influencers in intervention development and dissemination may create more sustainable and effective culture change. Additional studies are ongoing to test the effectiveness of utilising these influencers to disseminate patient safety interventions. This study offers an approach that can be helpful in both identifying and understanding influential team members and may be an important aspect of developing methods to change organisational culture. © 2017 John Wiley & Sons Ltd.

  13. CLUSTER ANALYSIS OF TOTAL ASSETS PROVIDED BY BANKS FROM FOUR CONTINENTS

    Directory of Open Access Journals (Sweden)

    MIRELA CĂTĂLINA TÜRKEȘ

    2017-08-01

    Full Text Available The paper analysed the total assets in 2016 achieved by the strongest 96 banks from 4 continents: Europe, America, Asia and Africa. It aims to evaluate the level of total assets provided by banks in 2016 and continental banking markets degree of differentiation to determine the overall conditions of the banks. Methodologies used in this study are based on cluster and descriptives analysis. Data set was built based on informations reported by banks on total assets. The results indicate that most of total banking assets are found in Asia and the fewest in Africa. At the end of 2016, the top 16 global banks owned total assets of $ 30.19 trillion according to the data set contains cluster 1 and the centroid was (2.25, 2.11, 3.06, 0.01.

  14. Scientific Cluster Deployment and Recovery - Using puppet to simplify cluster management

    Science.gov (United States)

    Hendrix, Val; Benjamin, Doug; Yao, Yushu

    2012-12-01

    Deployment, maintenance and recovery of a scientific cluster, which has complex, specialized services, can be a time consuming task requiring the assistance of Linux system administrators, network engineers as well as domain experts. Universities and small institutions that have a part-time FTE with limited time for and knowledge of the administration of such clusters can be strained by such maintenance tasks. This current work is the result of an effort to maintain a data analysis cluster (DAC) with minimal effort by a local system administrator. The realized benefit is the scientist, who is the local system administrator, is able to focus on the data analysis instead of the intricacies of managing a cluster. Our work provides a cluster deployment and recovery process (CDRP) based on the puppet configuration engine allowing a part-time FTE to easily deploy and recover entire clusters with minimal effort. Puppet is a configuration management system (CMS) used widely in computing centers for the automatic management of resources. Domain experts use Puppet's declarative language to define reusable modules for service configuration and deployment. Our CDRP has three actors: domain experts, a cluster designer and a cluster manager. The domain experts first write the puppet modules for the cluster services. A cluster designer would then define a cluster. This includes the creation of cluster roles, mapping the services to those roles and determining the relationships between the services. Finally, a cluster manager would acquire the resources (machines, networking), enter the cluster input parameters (hostnames, IP addresses) and automatically generate deployment scripts used by puppet to configure it to act as a designated role. In the event of a machine failure, the originally generated deployment scripts along with puppet can be used to easily reconfigure a new machine. The cluster definition produced in our CDRP is an integral part of automating cluster deployment

  15. AN UPDATED CATALOG OF M33 CLUSTERS AND CANDIDATES: UBVRI PHOTOMETRY AND SOME STATISTICAL RESULTS

    International Nuclear Information System (INIS)

    Ma Jun

    2012-01-01

    We present UBVRI photometry for 392 star clusters and candidates in the field of M33, which are selected from the most recent star cluster catalog. In this catalog, the authors listed star clusters' parameters such as cluster positions, magnitudes, colors in the UBVRIJHK s filters, and so on. However, a large fraction of objects in this catalog do not have previously published photometry. Photometry is performed using archival images from the Local Group Galaxies Survey, which covers 0.8 deg 2 along the major axis of M33. Detailed comparisons show that, in general, our photometry is consistent with previous measurements. Positions (right ascension and declination) for some clusters are corrected here. Combined with previous literature, ours constitute a large sample of M33 star clusters. Based on this cluster sample, we present some statistical results: none of the youngest M33 clusters (∼10 7 yr) have masses approaching 10 5 M ☉ ; roughly half the star clusters are consistent with the 10 4 -10 5 M ☉ mass models; the continuous distribution of star clusters along the model line indicates that M33 star clusters have been formed continuously from the epoch of the first star cluster formation until recent times; and there are ∼50 star clusters which are overlapped with the Galactic globular clusters on the color-color diagram, and these clusters are old globular cluster candidates in M33.

  16. The reflection of hierarchical cluster analysis of co-occurrence matrices in SPSS

    NARCIS (Netherlands)

    Zhou, Q.; Leng, F.; Leydesdorff, L.

    2015-01-01

    Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare

  17. Poisson cluster analysis of cardiac arrest incidence in Columbus, Ohio.

    Science.gov (United States)

    Warden, Craig; Cudnik, Michael T; Sasson, Comilla; Schwartz, Greg; Semple, Hugh

    2012-01-01

    Scarce resources in disease prevention and emergency medical services (EMS) need to be focused on high-risk areas of out-of-hospital cardiac arrest (OHCA). Cluster analysis using geographic information systems (GISs) was used to find these high-risk areas and test potential predictive variables. This was a retrospective cohort analysis of EMS-treated adults with OHCAs occurring in Columbus, Ohio, from April 1, 2004, through March 31, 2009. The OHCAs were aggregated to census tracts and incidence rates were calculated based on their adult populations. Poisson cluster analysis determined significant clusters of high-risk census tracts. Both census tract-level and case-level characteristics were tested for association with high-risk areas by multivariate logistic regression. A total of 2,037 eligible OHCAs occurred within the city limits during the study period. The mean incidence rate was 0.85 OHCAs/1,000 population/year. There were five significant geographic clusters with 76 high-risk census tracts out of the total of 245 census tracts. In the case-level analysis, being in a high-risk cluster was associated with a slightly younger age (-3 years, adjusted odds ratio [OR] 0.99, 95% confidence interval [CI] 0.99-1.00), not being white, non-Hispanic (OR 0.54, 95% CI 0.45-0.64), cardiac arrest occurring at home (OR 1.53, 95% CI 1.23-1.71), and not receiving bystander cardiopulmonary resuscitation (CPR) (OR 0.77, 95% CI 0.62-0.96), but with higher survival to hospital discharge (OR 1.78, 95% CI 1.30-2.46). In the census tract-level analysis, high-risk census tracts were also associated with a slightly lower average age (-0.1 years, OR 1.14, 95% CI 1.06-1.22) and a lower proportion of white, non-Hispanic patients (-0.298, OR 0.04, 95% CI 0.01-0.19), but also a lower proportion of high-school graduates (-0.184, OR 0.00, 95% CI 0.00-0.00). This analysis identified high-risk census tracts and associated census tract-level and case-level characteristics that can be used to

  18. Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4-0.9] redshift range

    Science.gov (United States)

    Guennou, L.; Adami, C.; Durret, F.; Lima Neto, G. B.; Ulmer, M. P.; Clowe, D.; LeBrun, V.; Martinet, N.; Allam, S.; Annis, J.; Basa, S.; Benoist, C.; Biviano, A.; Cappi, A.; Cypriano, E. S.; Gavazzi, R.; Halliday, C.; Ilbert, O.; Jullo, E.; Just, D.; Limousin, M.; Márquez, I.; Mazure, A.; Murphy, K. J.; Plana, H.; Rostagni, F.; Russeil, D.; Schirmer, M.; Slezak, E.; Tucker, D.; Zaritsky, D.; Ziegler, B.

    2014-01-01

    Context. The DAFT/FADA survey is based on the study of ~90 rich (masses found in the literature >2 × 1014 M⊙) and moderately distant clusters (redshifts 0.4 DAFT/FADA survey for which XMM-Newton and/or a sufficient number of galaxy redshifts in the cluster range are available, with the aim of detecting substructures and evidence for merging events. These properties are discussed in the framework of standard cold dark matter (ΛCDM) cosmology. Methods: In X-rays, we analysed the XMM-Newton data available, fit a β-model, and subtracted it to identify residuals. We used Chandra data, when available, to identify point sources. In the optical, we applied a Serna & Gerbal (SG) analysis to clusters with at least 15 spectroscopic galaxy redshifts available in the cluster range. We discuss the substructure detection efficiencies of both methods. Results: XMM-Newton data were available for 32 clusters, for which we derive the X-ray luminosity and a global X-ray temperature for 25 of them. For 23 clusters we were able to fit the X-ray emissivity with a β-model and subtract it to detect substructures in the X-ray gas. A dynamical analysis based on the SG method was applied to the clusters having at least 15 spectroscopic galaxy redshifts in the cluster range: 18 X-ray clusters and 11 clusters with no X-ray data. The choice of a minimum number of 15 redshifts implies that only major substructures will be detected. Ten substructures were detected both in X-rays and by the SG method. Most of the substructures detected both in X-rays and with the SG method are probably at their first cluster pericentre approach and are relatively recent infalls. We also find hints of a decreasing X-ray gas density profile core radius with redshift. Conclusions: The percentage of mass included in substructures was found to be roughly constant with redshift values of 5-15%, in agreement both with the general CDM framework and with the results of numerical simulations. Galaxies in substructures

  19. Integration K-Means Clustering Method and Elbow Method For Identification of The Best Customer Profile Cluster

    Science.gov (United States)

    Syakur, M. A.; Khotimah, B. K.; Rochman, E. M. S.; Satoto, B. D.

    2018-04-01

    Clustering is a data mining technique used to analyse data that has variations and the number of lots. Clustering was process of grouping data into a cluster, so they contained data that is as similar as possible and different from other cluster objects. SMEs Indonesia has a variety of customers, but SMEs do not have the mapping of these customers so they did not know which customers are loyal or otherwise. Customer mapping is a grouping of customer profiling to facilitate analysis and policy of SMEs in the production of goods, especially batik sales. Researchers will use a combination of K-Means method with elbow to improve efficient and effective k-means performance in processing large amounts of data. K-Means Clustering is a localized optimization method that is sensitive to the selection of the starting position from the midpoint of the cluster. So choosing the starting position from the midpoint of a bad cluster will result in K-Means Clustering algorithm resulting in high errors and poor cluster results. The K-means algorithm has problems in determining the best number of clusters. So Elbow looks for the best number of clusters on the K-means method. Based on the results obtained from the process in determining the best number of clusters with elbow method can produce the same number of clusters K on the amount of different data. The result of determining the best number of clusters with elbow method will be the default for characteristic process based on case study. Measurement of k-means value of k-means has resulted in the best clusters based on SSE values on 500 clusters of batik visitors. The result shows the cluster has a sharp decrease is at K = 3, so K as the cut-off point as the best cluster.

  20. The relationship between supplier networks and industrial clusters: an analysis based on the cluster mapping method

    Directory of Open Access Journals (Sweden)

    Ichiro IWASAKI

    2010-06-01

    Full Text Available Michael Porter’s concept of competitive advantages emphasizes the importance of regional cooperation of various actors in order to gain competitiveness on globalized markets. Foreign investors may play an important role in forming such cooperation networks. Their local suppliers tend to concentrate regionally. They can form, together with local institutions of education, research, financial and other services, development agencies, the nucleus of cooperative clusters. This paper deals with the relationship between supplier networks and clusters. Two main issues are discussed in more detail: the interest of multinational companies in entering regional clusters and the spillover effects that may stem from their participation. After the discussion on the theoretical background, the paper introduces a relatively new analytical method: “cluster mapping” - a method that can spot regional hot spots of specific economic activities with cluster building potential. Experience with the method was gathered in the US and in the European Union. After the discussion on the existing empirical evidence, the authors introduce their own cluster mapping results, which they obtained by using a refined version of the original methodology.

  1. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method

    DEFF Research Database (Denmark)

    Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

    2014-01-01

    The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models the probabi......The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models...... the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Previously, haplotype frequency estimation using the discrete Laplace method has been validated. In this paper we investigate how the discrete Laplace...... method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous...

  2. A THREE-STEP SPATIAL-TEMPORAL-SEMANTIC CLUSTERING METHOD FOR HUMAN ACTIVITY PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    W. Huang

    2016-06-01

    Full Text Available How people move in cities and what they do in various locations at different times form human activity patterns. Human activity pattern plays a key role in in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activities before further activity pattern analysis. In the era of Big Data, the emerging of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time to four dimensions (space, time and semantics. More specifically, not only a location and time that people stay and spend are collected, but also what people “say” for in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, where some of new methodologies should be accordingly developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to our best knowledge, few of clustering algorithms are specifically developed for handling the datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The

  3. Clustering with position-specific constraints on variance: Applying redescending M-estimators to label-free LC-MS data analysis

    Directory of Open Access Journals (Sweden)

    Mani D R

    2011-08-01

    Full Text Available Abstract Background Clustering is a widely applicable pattern recognition method for discovering groups of similar observations in data. While there are a large variety of clustering algorithms, very few of these can enforce constraints on the variation of attributes for data points included in a given cluster. In particular, a clustering algorithm that can limit variation within a cluster according to that cluster's position (centroid location can produce effective and optimal results in many important applications ranging from clustering of silicon pixels or calorimeter cells in high-energy physics to label-free liquid chromatography based mass spectrometry (LC-MS data analysis in proteomics and metabolomics. Results We present MEDEA (M-Estimator with DEterministic Annealing, an M-estimator based, new unsupervised algorithm that is designed to enforce position-specific constraints on variance during the clustering process. The utility of MEDEA is demonstrated by applying it to the problem of "peak matching"--identifying the common LC-MS peaks across multiple samples--in proteomic biomarker discovery. Using real-life datasets, we show that MEDEA not only outperforms current state-of-the-art model-based clustering methods, but also results in an implementation that is significantly more efficient, and hence applicable to much larger LC-MS data sets. Conclusions MEDEA is an effective and efficient solution to the problem of peak matching in label-free LC-MS data. The program implementing the MEDEA algorithm, including datasets, clustering results, and supplementary information is available from the author website at http://www.hephy.at/user/fru/medea/.

  4. Sensory over responsivity and obsessive compulsive symptoms: A cluster analysis.

    Science.gov (United States)

    Ben-Sasson, Ayelet; Podoly, Tamar Yonit

    2017-02-01

    Several studies have examined the sensory component in Obsesseive Compulsive Disorder (OCD) and described an OCD subtype which has a unique profile, and that Sensory Phenomena (SP) is a significant component of this subtype. SP has some commonalities with Sensory Over Responsivity (SOR) and might be in part a characteristic of this subtype. Although there are some studies that have examined SOR and its relation to Obsessive Compulsive Symptoms (OCS), literature lacks sufficient data on this interplay. First to further examine the correlations between OCS and SOR, and to explore the correlations between SOR modalities (i.e. smell, touch, etc.) and OCS subscales (i.e. washing, ordering, etc.). Second, to investigate the cluster analysis of SOR and OCS dimensions in adults, that is, to classify the sample using the sensory scores to find whether a sensory OCD subtype can be specified. Our third goal was to explore the psychometric features of a new sensory questionnaire: the Sensory Perception Quotient (SPQ). A sample of non clinical adults (n=350) was recruited via e-mail, social media and social networks. Participants completed questionnaires for measuring SOR, OCS, and anxiety. SOR and OCI-F scores were moderately significantly correlated (n=274), significant correlations between all SOR modalities and OCS subscales were found with no specific higher correlation between one modality to one OCS subscale. Cluster analysis revealed four distinct clusters: (1) No OC and SOR symptoms (NONE; n=100), (2) High OC and SOR symptoms (BOTH; n=28), (3) Moderate OC symptoms (OCS; n=63), (4) Moderate SOR symptoms (SOR; n=83). The BOTH cluster had significantly higher anxiety levels than the other clusters, and shared OC subscales scores with the OCS cluster. The BOTH cluster also reported higher SOR scores across tactile, vision, taste and olfactory modalities. The SPQ was found reliable and suitable to detect SOR, the sample SPQ scores was normally distributed (n=350). SOR is a

  5. Clinical evaluation of nonsyndromic dental anomalies in Dravidian population: A cluster sample analysis

    OpenAIRE

    Yamunadevi, Andamuthu; Selvamani, M.; Vinitha, V.; Srivandhana, R.; Balakrithiga, M.; Prabhu, S.; Ganapathy, N.

    2015-01-01

    Aim: To record the prevalence rate of dental anomalies in Dravidian population and analyze the percentage of individual anomalies in the population. Methodology: A cluster sample analysis was done, where 244 subjects studying in a dental institution were all included and analyzed for occurrence of dental anomalies by clinical examination, excluding third molars from analysis. Results: 31.55% of the study subjects had dental anomalies and shape anomalies were more prevalent (22.1%), followed b...

  6. Data Clustering

    Science.gov (United States)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained

  7. Planck intermediate results. X. Physics of the hot gas in the Coma cluster

    DEFF Research Database (Denmark)

    Planck Collaboration,; Ade, P. A. R.; Aghanim, N.

    2013-01-01

    We present an analysis of Planck satellite data on the Coma Cluster observed via the Sunyaev-Zeldovich effect. Planck is able, for the first time, to detect SZ emission up to r ~ 3 X R_500. We test previously proposed models for the pressure distribution in clusters against the azimuthally averaged...... data. We find that the Arnaud et al. universal pressure profile does not fit Coma, and that their pressure profile for merging systems provides a good fit of the data only at rR_500 than the mean pressure profile predicted by the simulations. The Planck image shows significant local steepening of the y...

  8. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    Directory of Open Access Journals (Sweden)

    Jessie J Hsu

    Full Text Available One goal of cluster analysis is to sort characteristics into groups (clusters so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes into groups of highly correlated genes that have the same effect on the outcome (recovery. We propose a random effects model where the genes within each group (cluster equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that are informed by the recovery outcome.

  9. Coping profiles, perceived stress and health-related behaviors: a cluster analysis approach.

    Science.gov (United States)

    Doron, Julie; Trouillet, Raphael; Maneveau, Anaïs; Ninot, Grégory; Neveu, Dorine

    2015-03-01

    Using cluster analytical procedure, this study aimed (i) to determine whether people could be differentiated on the basis of coping profiles (or unique combinations of coping strategies); and (ii) to examine the relationships between these profiles and perceived stress and health-related behaviors. A sample of 578 French students (345 females, 233 males; M(age)= 21.78, SD(age)= 2.21) completed the Perceived Stress Scale-14 ( Bruchon-Schweitzer, 2002), the Brief COPE ( Muller and Spitz, 2003) and a series of items measuring health-related behaviors. A two-phased cluster analytic procedure (i.e. hierarchical and non-hierarchical-k-means) was employed to derive clusters of coping strategy profiles. The results yielded four distinctive coping profiles: High Copers, Adaptive Copers, Avoidant Copers and Low Copers. The results showed that clusters differed significantly in perceived stress and health-related behaviors. High Copers and Avoidant Copers displayed higher levels of perceived stress and engaged more in unhealthy behavior, compared with Adaptive Copers and Low Copers who reported lower levels of stress and engaged more in healthy behaviors. These findings suggested that individuals' relative reliance on some strategies and de-emphasis on others may be a more advantageous way of understanding the manner in which individuals cope with stress. Therefore, cluster analysis approach may provide an advantage over more traditional statistical techniques by identifying distinct coping profiles that might best benefit from interventions. Future research should consider coping profiles to provide a deeper understanding of the relationships between coping strategies and health outcomes and to identify risk groups. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  10. Dietary habits and physical activity: Results from cluster analysis and market basket analysis.

    Science.gov (United States)

    Liew, Hui-Peng

    2018-01-01

    The prevalence of obesity remains a major public health concern and there has been a significant increase in childhood obesity in the USA. This study seeks to uncover the major patterns of dietary habits in relation to physical activity, together with students' opinions about the quality of food in the school's cafeteria and vending machines. The empirical work of this study is based on the 2011 Healthy School Program (HSP) Evaluation. HSP assesses the demographic characteristics as well as the dietary habits and exercise patterns of a representative sample of elementary, middle, and high school students in the USA. Findings suggest that students assigned to different clusters have different eating habits, exercise patterns, weight status, weight management, and opinions about the quality of food in the school's cafeteria and vending machines. There is great variation in dietary profiles and lifestyle behaviors among students who identified themselves as either overweight or unsure about their weight status. Findings from this study may inform future interventions regarding how to promote student's healthy food choices when they are still in school. Health promotion initiatives should specifically target students with persistently unhealthier dietary profiles.

  11. FLOCK cluster analysis of mast cell event clustering by high-sensitivity flow cytometry predicts systemic mastocytosis.

    Science.gov (United States)

    Dorfman, David M; LaPlante, Charlotte D; Pozdnyakova, Olga; Li, Betty

    2015-11-01

    In our high-sensitivity flow cytometric approach for systemic mastocytosis (SM), we identified mast cell event clustering as a new diagnostic criterion for the disease. To objectively characterize mast cell gated event distributions, we performed cluster analysis using FLOCK, a computational approach to identify cell subsets in multidimensional flow cytometry data in an unbiased, automated fashion. FLOCK identified discrete mast cell populations in most cases of SM (56/75 [75%]) but only a minority of non-SM cases (17/124 [14%]). FLOCK-identified mast cell populations accounted for 2.46% of total cells on average in SM cases and 0.09% of total cells on average in non-SM cases (P < .0001) and were predictive of SM, with a sensitivity of 75%, a specificity of 86%, a positive predictive value of 76%, and a negative predictive value of 85%. FLOCK analysis provides useful diagnostic information for evaluating patients with suspected SM, and may be useful for the analysis of other hematopoietic neoplasms. Copyright© by the American Society for Clinical Pathology.

  12. Cluster analysis of word frequency dynamics

    Science.gov (United States)

    Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. During one of previous studies, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  13. Cluster analysis of word frequency dynamics

    International Nuclear Information System (INIS)

    Maslennikova, Yu S; Bochkarev, V V; Belashova, I A

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. During one of previous studies, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations

  14. Degradation Assessment and Fault Diagnosis for Roller Bearing Based on AR Model and Fuzzy Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Lingli Jiang

    2011-01-01

    Full Text Available This paper proposes a new approach combining autoregressive (AR model and fuzzy cluster analysis for bearing fault diagnosis and degradation assessment. AR model is an effective approach to extract the fault feature, and is generally applied to stationary signals. However, the fault vibration signals of a roller bearing are non-stationary and non-Gaussian. Aiming at this problem, the set of parameters of the AR model is estimated based on higher-order cumulants. Consequently, the AR parameters are taken as the feature vectors, and fuzzy cluster analysis is applied to perform classification and pattern recognition. Experiments analysis results show that the proposed method can be used to identify various types and severities of fault bearings. This study is significant for non-stationary and non-Gaussian signal analysis, fault diagnosis and degradation assessment.

  15. Identification and characterization of earthquake clusters: a comparative analysis for selected sequences in Italy

    Science.gov (United States)

    Peresan, Antonella; Gentili, Stefania

    2017-04-01

    Identification and statistical characterization of seismic clusters may provide useful insights about the features of seismic energy release and their relation to physical properties of the crust within a given region. Moreover, a number of studies based on spatio-temporal analysis of main-shocks occurrence require preliminary declustering of the earthquake catalogs. Since various methods, relying on different physical/statistical assumptions, may lead to diverse classifications of earthquakes into main events and related events, we aim to investigate the classification differences among different declustering techniques. Accordingly, a formal selection and comparative analysis of earthquake clusters is carried out for the most relevant earthquakes in North-Eastern Italy, as reported in the local OGS-CRS bulletins, compiled at the National Institute of Oceanography and Experimental Geophysics since 1977. The comparison is then extended to selected earthquake sequences associated with a different seismotectonic setting, namely to events that occurred in the region struck by the recent Central Italy destructive earthquakes, making use of INGV data. Various techniques, ranging from classical space-time windows methods to ad hoc manual identification of aftershocks, are applied for detection of earthquake clusters. In particular, a statistical method based on nearest-neighbor distances of events in space-time-energy domain, is considered. Results from clusters identification by the nearest-neighbor method turn out quite robust with respect to the time span of the input catalogue, as well as to minimum magnitude cutoff. The identified clusters for the largest events reported in North-Eastern Italy since 1977 are well consistent with those reported in earlier studies, which were aimed at detailed manual aftershocks identification. The study shows that the data-driven approach, based on the nearest-neighbor distances, can be satisfactorily applied to decompose the seismic

  16. The Flemish frozen-vegetable industry as an example of cluster analysis : Flanders Vegetable Valley

    NARCIS (Netherlands)

    Vanhaverbeke, W.P.M.; Larosse, J.; Winnen, W.; Hulsink, W.; Dons, J.J.M.

    2008-01-01

    In this contribution we present a strategic analysis of the cluster dynamics in the frozen-vegetable industry in Flanders (Belgium)1. The main purpose of this case is twofold. First, we determine the added value of using data about customer and supplier relationships in cluster analysis. Second, we

  17. Automated classification of mouse pup isolation syllables: from cluster analysis to an Excel based ‘mouse pup syllable classification calculator’

    Directory of Open Access Journals (Sweden)

    Jasmine eGrimsley

    2013-01-01

    Full Text Available Mouse pups vocalize at high rates when they are cold or isolated from the nest. The proportions of each syllable type produced carry information about disease state and are being used as behavioral markers for the internal state of animals. Manual classifications of these vocalizations identified ten syllable types based on their spectro-temporal features. However, manual classification of mouse syllables is time consuming and vulnerable to experimenter bias. This study uses an automated cluster analysis to identify acoustically distinct syllable types produced by CBA/CaJ mouse pups, and then compares the results to prior manual classification methods. The cluster analysis identified two syllable types, based on their frequency bands, that have continuous frequency-time structure, and two syllable types featuring abrupt frequency transitions. Although cluster analysis computed fewer syllable types than manual classification, the clusters represented well the probability distributions of the acoustic features within syllables. These probability distributions indicate that some of the manually classified syllable types are not statistically distinct. The characteristics of the four classified clusters were used to generate a Microsoft Excel-based mouse syllable classifier that rapidly categorizes syllables, with over a 90% match, into the syllable types determined by cluster analysis.

  18. Polar cap ion beams during periods of northward IMF: Cluster statistical results

    Directory of Open Access Journals (Sweden)

    R. Maggiolo

    2011-05-01

    Full Text Available Above the polar caps and during prolonged periods of northward IMF, the Cluster satellites detect upward accelerated ion beams with energies up to a few keV. They are associated with converging electric field structures indicating that the acceleration is caused by a quasi-static field-aligned electric field that can extend to altitudes higher than 7 RE (Maggiolo et al., 2006; Teste et al., 2007. Using the AMDA science analysis service provided by the Centre de Données de la Physique des Plasmas, we have been able to extract about 200 events of accelerated upgoing ion beams above the polar caps from the Cluster database. Most of these observations are taken at altitudes lower than 7 RE and in the Northern Hemisphere. We investigate the statistical properties of these ion beams. We analyze their geometry, the properties of the plasma populations and of the electric field inside and around the beams, as well as their dependence on solar wind and IMF conditions. We show that ~40 % of the ion beams are collocated with a relatively hot and isotropic plasma population. The density and temperature of the isotropic population are highly variable but suggest that this plasma originates from the plasma sheet. The ion beam properties do not change significantly when the isotropic, hot background population is present. Furthermore, during one single polar cap crossing by Cluster it is possible to detect upgoing ion beams both with and without an accompanying isotropic component. The analysis of the variation of the IMF BZ component prior to the detection of the beams indicates that the delay between a northward/southward turning of IMF and the appearance/disappearance of the beams is respectively ~2 h and 20 min. The observed electrodynamic characteristics of high altitude polar cap ion beams suggest that they are closely connected to polar cap auroral arcs. We discuss the implications of these Cluster observations above the polar cap on the magnetospheric

  19. Cluster analysis to evaluate stable chemical elements and physical-chemical parameters behavior on uranium mining waste

    International Nuclear Information System (INIS)

    Pereira, Wagner de Souza; Py Junior, Delcy de Azevedo; Goncalves, Simone; Kelecom, Alphonse; Morais, Gustavo Ferrari de; Campelo, Emanuele Lazzaretti Cordova; Dores, Luis Augusto de Carvalho Bresser

    2011-01-01

    The Ore Treating Unit (UTM, in portuguese) is a deactivated uranium mine. A cluster analysis was used to evaluate the behavior of stable chemical elements and physical-chemical parameters in their effluents. The utilization of the cluster analysis proved itself effective in the assessment, allowing the identification of groups of chemical elements, physical-chemical parameters and their joint analysis (elements and parameters). As a result we may assert, based on data analysis, that there is a strong link between calcium and magnesium and between aluminum and rare-earth oxides on UTM's effluents. Sulphate was also identified as strongly linked to total and dissolved solids, and those to electrical conductivity. There were other associations, but not so strongly linked. Further gathering, to seasonal evaluation, are required in order to confirm those analysis. Additional statistical analysis (factor analysis) must be used to try to identify the origin of the identified groups on this analysis. (author)

  20. Cluster analysis to evaluate stable chemical elements and physical-chemical parameters behavior on uranium mining waste

    Energy Technology Data Exchange (ETDEWEB)

    Pereira, Wagner de Souza; Py Junior, Delcy de Azevedo; Goncalves, Simone, E-mail: wspereira@inb.gov.br [Unidade de Tratamento de Minerio (UTM/INB), Pocos de Caldas, MG (Brazil). Coordenacao de Protecao Radiologica. Grupo Multidisciplinar de Radioprotecao; Kelecom, Alphonse [Universidade Federal Fluminense (UFF), Niteroi, RJ (Brazil). Inst. de Biologia. Lab. de Radiobiologia e Radiometria Pedro Lopes dos Santos; Morais, Gustavo Ferrari de; Campelo, Emanuele Lazzaretti Cordova [Unidade de Tratamento de Minerio (UTM/INB), Pocos de Caldas, MG (Brazil). Coordenacao de Desenvolvimento de Processos; Dores, Luis Augusto de Carvalho Bresser [Unidade de Tratamento de Minerio (UTM/INB), Pocos de Caldas, MG (Brazil). Gerencia de Descomissionamento

    2011-07-01

    The Ore Treating Unit (UTM, in portuguese) is a deactivated uranium mine. A cluster analysis was used to evaluate the behavior of stable chemical elements and physical-chemical parameters in their effluents. The utilization of the cluster analysis proved itself effective in the assessment, allowing the identification of groups of chemical elements, physical-chemical parameters and their joint analysis (elements and parameters). As a result we may assert, based on data analysis, that there is a strong link between calcium and magnesium and between aluminum and rare-earth oxides on UTM's effluents. Sulphate was also identified as strongly linked to total and dissolved solids, and those to electrical conductivity. There were other associations, but not so strongly linked. Further gathering, to seasonal evaluation, are required in order to confirm those analysis. Additional statistical analysis (factor analysis) must be used to try to identify the origin of the identified groups on this analysis. (author)

  1. Factor-cluster analysis and enrichment study of Mangrove sediments - An example from Mengkabong, Sabah

    International Nuclear Information System (INIS)

    Praveena, S.M.; Ahmed, A.; Radojevic, M.; Mohd Harun Abdullah; Aris, A.Z.

    2007-01-01

    This paper examines the tidal effects in the sediment of Mengkabong mangrove forest, Sabah. Generally, all the studied parameters showed high value at high tide compared to low tide. Factor-cluster analyses were adopted to allow the identification of controlling factors at high and low tides. Factor analysis extracted six controlling factors at high tide and seven controlling factors at low tide. Cluster analysis extracted two district clusters at high and low tides. The study showed that factor-cluster analysis application is a useful tool to single out the controlling factors at high and low tides. this will provide a basis for describing the tidal effects in the mangrove sediment. The salinity and electrical conductivity clusters as well as component loadings at high and low tide explained the tidal process where there is high contribution of seawater to mangrove sediments that controls the sediment chemistry. The geo accumulation index (T geo ) values suggest the mangrove sediments are having background concentrations for Al, Cu, Fe and Zn and unpolluted for Pb. (author)

  2. Modeling, Stability Analysis and Active Stabilization of Multiple DC-Microgrids Clusters

    DEFF Research Database (Denmark)

    Shafiee, Qobad; Dragicevic, Tomislav; Vasquez, Juan Carlos

    2014-01-01

    ), and more especially during interconnection with other MGs, creating dc MG clusters. This paper develops a small signal model for dc MGs from the control point of view, in order to study stability analysis and investigate effects of CPLs and line impedances between the MGs on stability of these systems....... This model can be also used to synthesis and study dynamics of control loops in dc MGs and also dc MG clusters. An active stabilization method is proposed to be implemented as a dc active power filter (APF) inside the MGs in order to not only increase damping of dc MGs at the presence of CPLs but also...... to improve their stability while connecting to the other MGs. Simulation results are provided to evaluate the developed models and demonstrate the effectiveness of proposed active stabilization technique....

  3. Clustering Dycom

    KAUST Repository

    Minku, Leandro L.; Hou, Siqing

    2017-01-01

    baseline WC model is also included in the analysis. Results: Clustering Dycom with K-Means can potentially help to split the CC projects, managing to achieve similar or better predictive performance than Dycom. However, K-Means still requires the number

  4. Performance Based Clustering for Benchmarking of Container Ports: an Application of Dea and Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Jie Wu

    2010-12-01

    Full Text Available The operational performance of container ports has received more and more attentions in both academic and practitioner circles, the performance evaluation and process improvement of container ports have also been the focus of several studies. In this paper, Data Envelopment Analysis (DEA, an effective tool for relative efficiency assessment, is utilized for measuring the performances and benchmarking of the 77 world container ports in 2007. The used approaches in the current study consider four inputs (Capacity of Cargo Handling Machines, Number of Berths, Terminal Area and Storage Capacity and a single output (Container Throughput. The results for the efficiency scores are analyzed, and a unique ordering of the ports based on average cross efficiency is provided, also cluster analysis technique is used to select the more appropriate targets for poorly performing ports to use as benchmarks.

  5. Scientific Cluster Deployment and Recovery – Using puppet to simplify cluster management

    International Nuclear Information System (INIS)

    Hendrix, Val; Yao Yushu; Benjamin, Doug

    2012-01-01

    Deployment, maintenance and recovery of a scientific cluster, which has complex, specialized services, can be a time consuming task requiring the assistance of Linux system administrators, network engineers as well as domain experts. Universities and small institutions that have a part-time FTE with limited time for and knowledge of the administration of such clusters can be strained by such maintenance tasks. This current work is the result of an effort to maintain a data analysis cluster (DAC) with minimal effort by a local system administrator. The realized benefit is the scientist, who is the local system administrator, is able to focus on the data analysis instead of the intricacies of managing a cluster. Our work provides a cluster deployment and recovery process (CDRP) based on the puppet configuration engine allowing a part-time FTE to easily deploy and recover entire clusters with minimal effort. Puppet is a configuration management system (CMS) used widely in computing centers for the automatic management of resources. Domain experts use Puppet's declarative language to define reusable modules for service configuration and deployment. Our CDRP has three actors: domain experts, a cluster designer and a cluster manager. The domain experts first write the puppet modules for the cluster services. A cluster designer would then define a cluster. This includes the creation of cluster roles, mapping the services to those roles and determining the relationships between the services. Finally, a cluster manager would acquire the resources (machines, networking), enter the cluster input parameters (hostnames, IP addresses) and automatically generate deployment scripts used by puppet to configure it to act as a designated role. In the event of a machine failure, the originally generated deployment scripts along with puppet can be used to easily reconfigure a new machine. The cluster definition produced in our CDRP is an integral part of automating cluster deployment

  6. [Classification of gamblers from self-help groups using cluster analysis].

    Science.gov (United States)

    Meyer, G

    1991-01-01

    In an empirically based classification by cluster analysis of 437 gamblers from self-help groups five distinct homogeneous subgroups were determined on the basis of such characteristics as frequency of gambling of various kinds, function of gambling and sensation during gambling, symptoms of pathological gambling as well as personality characteristics. These can be characterized as: Pathological slot-machine gamblers with 1) an emotionally instable, depressive-aggressive personality structure and 2) an emotionally instable, depressive personality structure; 3) pathological gamblers on German-style slot-machines and 4) pathological gamblers on classical games of chance--both without conspicuous personality, and 5) gamblers on German-style slot-machines under a subjective strain. On the whole, the distinctions are due to psychological variables, the social data hardly differ. A comparison of the subgroups on the basis of variables regarding the course and result of treatment shows that the pathological gamblers with a conspicuous personality structure more often failed to reach, the goal of abstinence set by "Gamblers Anonymous" and instead report about an improvement of their gambling behaviour. On the other hand, the gamblers on German-style slot-machines who were under a subjective strain more often found it easier to stop gambling completely. The results of the cluster analysis are compared with clinical diagnostic classifications of gamblers who received out-patient or in-patient treatment as well as with empirical classifications of addicts, and first hypotheses of a differential therapy indication are being discussed.

  7. Behavioral Health Risk Profiles of Undergraduate University Students in England, Wales, and Northern Ireland: A Cluster Analysis.

    Science.gov (United States)

    El Ansari, Walid; Ssewanyana, Derrick; Stock, Christiane

    2018-01-01

    Limited research has explored clustering of lifestyle behavioral risk factors (BRFs) among university students. This study aimed to explore clustering of BRFs, composition of clusters, and the association of the clusters with self-rated health and perceived academic performance. We assessed (BRFs), namely tobacco smoking, physical inactivity, alcohol consumption, illicit drug use, unhealthy nutrition, and inadequate sleep, using a self-administered general Student Health Survey among 3,706 undergraduates at seven UK universities. A two-step cluster analysis generated: Cluster 1 (the high physically active and health conscious) with very high health awareness/consciousness, good nutrition, and physical activity (PA), and relatively low alcohol, tobacco, and other drug (ATOD) use. Cluster 2 (the abstinent) had very low ATOD use, high health awareness, good nutrition, and medium high PA. Cluster 3 (the moderately health conscious) included the highest regard for healthy eating, second highest fruit/vegetable consumption, and moderately high ATOD use. Cluster 4 (the risk taking) showed the highest ATOD use, were the least health conscious, least fruit consuming, and attached the least importance on eating healthy. Compared to the healthy cluster (Cluster 1), students in other clusters had lower self-rated health, and particularly, students in the risk taking cluster (Cluster 4) reported lower academic performance. These associations were stronger for men than for women. Of the four clusters, Cluster 4 had the youngest students. Our results suggested that prevention among university students should address multiple BRFs simultaneously, with particular focus on the younger students.

  8. Spermiogram and sperm head morphometry assessed by multivariate cluster analysis results during adolescence (12-18 years and the effect of varicocele

    Directory of Open Access Journals (Sweden)

    Fernando Vásquez

    2016-01-01

    Full Text Available This work evaluates sperm head morphometric characteristics in adolescents from 12 to 18 years of age, and the effect of varicocele. Volunteers between 150 and 224 months of age (mean 191, n = 87, who had reached oigarche by 12 years old, were recruited in the area of Barranquilla, Colombia. Morphometric analysis of sperm heads was performed with principal component (PC and discriminant analysis. Combining seminal fluid and sperm parameters provided five PCs: two related to sperm morphometry, one to sperm motility, and two to seminal fluid components. Discriminant analysis on the morphometric results of varicocele and nonvaricocele groups did not provide a useful classification matrix. Of the semen-related PCs, the most explanatory (40% was related to sperm motility. Two PCs, including sperm head elongation and size, were sufficient to evaluate sperm morphometric characteristics. Most of the morphometric variables were correlated with age, with an increase in size and decrease in the elongation of the sperm head. For head size, the entire sperm population could be divided into two morphometric subpopulations, SP1 and SP2, which did not change during adolescence. In general, for varicocele individuals, SP1 had larger and more elongated sperm heads than SP2, which had smaller and more elongated heads than in nonvaricocele men. In summary, sperm head morphometry assessed by CASA-Morph and multivariate cluster analysis provides a better comprehension of the ejaculate structure and possibly sperm function. Morphometric analysis provides much more information than data obtained from conventional semen analysis.

  9. Convex Clustering: An Attractive Alternative to Hierarchical Clustering

    Science.gov (United States)

    Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

    2015-01-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340

  10. Applying Clustering to Statistical Analysis of Student Reasoning about Two-Dimensional Kinematics

    Science.gov (United States)

    Springuel, R. Padraic; Wittman, Michael C.; Thompson, John R.

    2007-01-01

    We use clustering, an analysis method not presently common to the physics education research community, to group and characterize student responses to written questions about two-dimensional kinematics. Previously, clustering has been used to analyze multiple-choice data; we analyze free-response data that includes both sketches of vectors and…

  11. The CERN analysis facility-a PROOF cluster for day-one physics analysis

    International Nuclear Information System (INIS)

    Grosse-Oetringhaus, J F

    2008-01-01

    ALICE (A Large Ion Collider Experiment) at the LHC plans to use a PROOF cluster at CERN (CAF - CERN Analysis Facility) for analysis. The system is especially aimed at the prototyping phase of analyses that need a high number of development iterations and thus require a short response time. Typical examples are the tuning of cuts during the development of an analysis as well as calibration and alignment. Furthermore, the use of an interactive system with very fast response will allow ALICE to extract physics observables out of first data quickly. An additional use case is fast event simulation and reconstruction. A test setup consisting of 40 machines is used for evaluation since May 2006. The PROOF system enables the parallel processing and xrootd the access to files distributed on the test cluster. An automatic staging system for files either catalogued in the ALICE file catalog or stored in the CASTOR mass storage system has been developed. The current setup and ongoing development towards disk quotas and CPU fairshare are described. Furthermore, the integration of PROOF into ALICE's software framework (AliRoot) is discussed

  12. Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.

    Science.gov (United States)

    Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey

    2017-09-01

    We undertake this study to understand patterns of pediatric asthma-related acute care use to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. Children who met the criteria (3,170) were included in the analysis. An examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and onetime hospitalization. Cluster 3 (11% of patients) had a high number of ED visits and low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses, and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest these clusters may represent children who differ along multiple dimensions, in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians

  13. GibbsCluster: unsupervised clustering and alignment of peptide sequences

    DEFF Research Database (Denmark)

    Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

    2017-01-01

    motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large......-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0....

  14. High-dimensional cluster analysis with the Masked EM Algorithm

    Science.gov (United States)

    Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

    2014-01-01

    Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster member-ship of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

  15. Phenotypes of asthma in low-income children and adolescents: cluster analysis.

    Science.gov (United States)

    Cabral, Anna Lucia Barros; Sousa, Andrey Wirgues; Mendes, Felipe Augusto Rodrigues; Carvalho, Celso Ricardo Fernandes de

    2017-01-01

    Studies characterizing asthma phenotypes have predominantly included adults or have involved children and adolescents in developed countries. Therefore, their applicability in other populations, such as those of developing countries, remains indeterminate. Our objective was to determine how low-income children and adolescents with asthma in Brazil are distributed across a cluster analysis. We included 306 children and adolescents (6-18 years of age) with a clinical diagnosis of asthma and under medical treatment for at least one year of follow-up. At enrollment, all the patients were clinically stable. For the cluster analysis, we selected 20 variables commonly measured in clinical practice and considered important in defining asthma phenotypes. Variables with high multicollinearity were excluded. A cluster analysis was applied using a twostep agglomerative test and log-likelihood distance measure. Three clusters were defined for our population. Cluster 1 (n = 94) included subjects with normal pulmonary function, mild eosinophil inflammation, few exacerbations, later age at asthma onset, and mild atopy. Cluster 2 (n = 87) included those with normal pulmonary function, a moderate number of exacerbations, early age at asthma onset, more severe eosinophil inflammation, and moderate atopy. Cluster 3 (n = 108) included those with poor pulmonary function, frequent exacerbations, severe eosinophil inflammation, and severe atopy. Asthma was characterized by the presence of atopy, number of exacerbations, and lung function in low-income children and adolescents in Brazil. The many similarities with previous cluster analyses of phenotypes indicate that this approach shows good generalizability. Estudos que caracterizam fenótipos de asma predominantemente incluem adultos ou foram realizados em crianças e adolescentes de países desenvolvidos; portanto, sua aplicabilidade em outras populações, tais como as de países em desenvolvimento, permanece indeterminada. Nosso

  16. Visualizing Confidence in Cluster-Based Ensemble Weather Forecast Analyses.

    Science.gov (United States)

    Kumpf, Alexander; Tost, Bianca; Baumgart, Marlene; Riemer, Michael; Westermann, Rudiger; Rautenhaus, Marc

    2018-01-01

    In meteorology, cluster analysis is frequently used to determine representative trends in ensemble weather predictions in a selected spatio-temporal region, e.g., to reduce a set of ensemble members to simplify and improve their analysis. Identified clusters (i.e., groups of similar members), however, can be very sensitive to small changes of the selected region, so that clustering results can be misleading and bias subsequent analyses. In this article, we - a team of visualization scientists and meteorologists-deliver visual analytics solutions to analyze the sensitivity of clustering results with respect to changes of a selected region. We propose an interactive visual interface that enables simultaneous visualization of a) the variation in composition of identified clusters (i.e., their robustness), b) the variability in cluster membership for individual ensemble members, and c) the uncertainty in the spatial locations of identified trends. We demonstrate that our solution shows meteorologists how representative a clustering result is, and with respect to which changes in the selected region it becomes unstable. Furthermore, our solution helps to identify those ensemble members which stably belong to a given cluster and can thus be considered similar. In a real-world application case we show how our approach is used to analyze the clustering behavior of different regions in a forecast of "Tropical Cyclone Karl", guiding the user towards the cluster robustness information required for subsequent ensemble analysis.

  17. Language Learner Motivational Types: A Cluster Analysis Study

    Science.gov (United States)

    Papi, Mostafa; Teimouri, Yasser

    2014-01-01

    The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…

  18. Co-clustering Analysis of Weblogs Using Bipartite Spectral Projection Approach

    DEFF Research Database (Denmark)

    Xu, Guandong; Zong, Yu; Dolog, Peter

    2010-01-01

    Web clustering is an approach for aggregating Web objects into various groups according to underlying relationships among them. Finding co-clusters of Web objects is an interesting topic in the context of Web usage mining, which is able to capture the underlying user navigational interest...... and content preference simultaneously. In this paper we will present an algorithm using bipartite spectral clustering to co-cluster Web users and pages. The usage data of users visiting Web sites is modeled as a bipartite graph and the spectral clustering is then applied to the graph representation of usage...... data. The proposed approach is evaluated by experiments performed on real datasets, and the impact of using various clustering algorithms is also investigated. Experimental results have demonstrated the employed method can effectively reveal the subset aggregates of Web users and pages which...

  19. An approach based on genetic algorithms and DFT for studying clusters: (H2O) n (2 ≤ n ≤ 13) cluster analysis

    International Nuclear Information System (INIS)

    Sabato de Abreu e Silva, Elcio; Anderson Duarte, Helio; Belchior, Jadson Claudio

    2006-01-01

    The present work proposes the application of a genetic algorithm (GA) for determining global minima to be used as seeds for a higher level ab initio method analysis such as density function theory (DFT). Water clusters ((H 2 O) n (2 ≤ n ≤ 13)) are used as a test case and for the initial guesses four empirical potentials (TIP3P, TIP4P, TIP5P and ST2) were considered for the GA calculations. Two types of analysis were performed namely rigid (DFT R M) and non rigid (DFT N RM) molecules for the corresponding structures and energies. For the DFT analysis, the PBE exchange correlation functional and the large basis set A-PVTZ have been used. All structures and their respective energies calculated through the GA method, DFT R M and DFT N RM are compared and discussed. The proposed methodology showed to be very efficient in order to have quasi accurate global minima on the level of ab initio calculations and the data are discussed in the light of previously published results with particular attention to ((H 2 O) n (2 ≤ n ≤ 13)) clusters. The results suggest that the stabilization energy error for the empirical potentials used are additive with respect to the cluster size, roughly 0.5 kcal mol -1 per water molecule after ZPE correction. Finally, the approach of using GA/empirical potential structures as starting point for ab initio optimization methods showed to be a computationally manageable strategy to explore the potential energy surface of large systems at quantum level. In conclusion, this work proposes an alternative approach to accurately study properties of larger systems in a very efficient manner

  20. An approach based on genetic algorithms and DFT for studying clusters: (H{sub 2}O) {sub n} (2 {<=} n {<=} 13) cluster analysis

    Energy Technology Data Exchange (ETDEWEB)

    Sabato de Abreu e Silva, Elcio [Departamento de Quimica - ICEx, Universidade Federal de Minas Gerais, Av. Antonio Carlos 6627, Pampulha (31.270-901) Belo Horizonte, Minas Gerias (Brazil); Anderson Duarte, Helio [Departamento de Quimica - ICEx, Universidade Federal de Minas Gerais, Av. Antonio Carlos 6627, Pampulha (31.270-901) Belo Horizonte, Minas Gerias (Brazil); Belchior, Jadson Claudio [Departamento de Quimica - ICEx, Universidade Federal de Minas Gerais, Av. Antonio Carlos 6627, Pampulha (31.270-901) Belo Horizonte, Minas Gerias (Brazil)], E-mail: jadson@ufmg.br

    2006-04-21

    The present work proposes the application of a genetic algorithm (GA) for determining global minima to be used as seeds for a higher level ab initio method analysis such as density function theory (DFT). Water clusters ((H{sub 2}O) {sub n} (2 {<=} n {<=} 13)) are used as a test case and for the initial guesses four empirical potentials (TIP3P, TIP4P, TIP5P and ST2) were considered for the GA calculations. Two types of analysis were performed namely rigid (DFT{sub R}M) and non rigid (DFT{sub N}RM) molecules for the corresponding structures and energies. For the DFT analysis, the PBE exchange correlation functional and the large basis set A-PVTZ have been used. All structures and their respective energies calculated through the GA method, DFT{sub R}M and DFT{sub N}RM are compared and discussed. The proposed methodology showed to be very efficient in order to have quasi accurate global minima on the level of ab initio calculations and the data are discussed in the light of previously published results with particular attention to ((H{sub 2}O) {sub n} (2 {<=} n {<=} 13)) clusters. The results suggest that the stabilization energy error for the empirical potentials used are additive with respect to the cluster size, roughly 0.5 kcal mol{sup -1} per water molecule after ZPE correction. Finally, the approach of using GA/empirical potential structures as starting point for ab initio optimization methods showed to be a computationally manageable strategy to explore the potential energy surface of large systems at quantum level. In conclusion, this work proposes an alternative approach to accurately study properties of larger systems in a very efficient manner.

  1. Herd Clustering: A synergistic data clustering approach using collective intelligence

    KAUST Repository

    Wong, Kachun

    2014-10-01

    Traditional data mining methods emphasize on analytical abilities to decipher data, assuming that data are static during a mining process. We challenge this assumption, arguing that we can improve the analysis by vitalizing data. In this paper, this principle is used to develop a new clustering algorithm. Inspired by herd behavior, the clustering method is a synergistic approach using collective intelligence called Herd Clustering (HC). The novel part is laid in its first stage where data instances are represented by moving particles. Particles attract each other locally and form clusters by themselves as shown in the case studies reported. To demonstrate its effectiveness, the performance of HC is compared to other state-of-the art clustering methods on more than thirty datasets using four performance metrics. An application for DNA motif discovery is also conducted. The results support the effectiveness of HC and thus the underlying philosophy. © 2014 Elsevier B.V.

  2. Measuring age differences among globular clusters having similar metallicities - A new method and first results

    International Nuclear Information System (INIS)

    Vandenberg, D.A.; Bolte, M.; Stetson, P.B.

    1990-01-01

    A color-difference technique for estimating the relative ages of globular clusters with similar chemical compositions on the basis of their CM diagrams is described and demonstrated. The theoretical basis and implementation of the procedure are explained, and results for groups of globular clusters with m/H = about -2, -1.6, and -1.3, and for two special cases (Palomar 12 and NGC 5139) are presented in extensive tables and graphs and discussed in detail. It is found that the more metal-deficient globular clusters are nearly coeval (differences less than 0.5 Gyr), whereas the most metal-rich globular clusters exhibit significant age differences (about 2 Gyr). This result is shown to contradict Galactic evolution models postulating halo collapse in less than a few times 100 Myr. 77 refs

  3. The use of a cluster analysis in across herd genetic evaluation for ...

    African Journals Online (AJOL)

    To investigate the possibility of a genotype x environment interaction in Bonsmara cattle, a cluster analysis was performed on weaning weight records of 72 811 Bonsmara calves, the progeny of 1 434 sires and 24 186 dams in 35 herds. The following environmental factors were used to classify herds into clusters: solution ...

  4. Patterns of Brucellosis Infection Symptoms in Azerbaijan: A Latent Class Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Rita Ismayilova

    2014-01-01

    Full Text Available Brucellosis infection is a multisystem disease, with a broad spectrum of symptoms. We investigated the existence of clusters of infected patients according to their clinical presentation. Using national surveillance data from the Electronic-Integrated Disease Surveillance System, we applied a latent class cluster (LCC analysis on symptoms to determine clusters of brucellosis cases. A total of 454 cases reported between July 2011 and July 2013 were analyzed. LCC identified a two-cluster model and the Vuong-Lo-Mendell-Rubin likelihood ratio supported the cluster model. Brucellosis cases in the second cluster (19% reported higher percentages of poly-lymphadenopathy, hepatomegaly, arthritis, myositis, and neuritis and changes in liver function tests compared to cases of the first cluster. Patients in the second cluster had a severe brucellosis disease course and were associated with longer delay in seeking medical attention. Moreover, most of them were from Beylagan, a region focused on sheep and goat livestock production in south-central Azerbaijan. Patients in cluster 2 accounted for one-quarter of brucellosis cases and had a more severe clinical presentation. Delay in seeking medical care may explain severe illness. Future work needs to determine the factors that influence brucellosis case seeking and identify brucellosis species, particularly among cases from Beylagan.

  5. Cluster analysis of Helicobacter pylori genomic DNA fingerprints suggests gastroduodenal disease-specific associations.

    Science.gov (United States)

    Go, M F; Chan, K Y; Versalovic, J; Koeuth, T; Graham, D Y; Lupski, J R

    1995-07-01

    Helicobacter pylori infection is now accepted as the most common cause of chronic active gastritis and peptic ulcer disease. The etiologies of many infectious diseases have been attributed to specific or clonal strains of bacterial pathogens. Polymerase chain reaction (PCR) amplification of DNA between repetitive DNA sequences, REP elements (REP-PCR), has been utilized to generate DNA fingerprints to examine similarity among strains within a bacterial species. Genomic DNA from H. pylori isolates obtained from 70 individuals (39 duodenal ulcers and 31 simple gastritis) was PCR-amplified using consensus probes to repetitive DNA elements. The H. pylori DNA fingerprints were analyzed for similarity and correlated with disease presentation using the NTSYS-pc computer program. Each H. pylori strain had a distinct DNA fingerprint except for two pairs. Single-colony DNA fingerprints of H. pylori from the same patient were identical, suggesting that each patient harbors a single strain. Computer-assisted cluster analysis of the REP-PCR DNA fingerprints showed two large clusters of isolates, one associated with simple gastritis and the other with duodenal ulcer disease. Cluster analysis of REP-PCR DNA fingerprints of H. pylori strains suggests that duodenal ulcer isolates, as a group, are more similar to one another and different from gastritis isolates. These results suggest that disease-specific strains may exist.

  6. Hyperplane distance neighbor clustering based on local discriminant analysis for complex chemical processes monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Chunhong; Xiao, Shaoqing; Gu, Xiaofeng [Jiangnan University, Wuxi (China)

    2014-11-15

    The collected training data often include both normal and faulty samples for complex chemical processes. However, some monitoring methods, such as partial least squares (PLS), principal component analysis (PCA), independent component analysis (ICA) and Fisher discriminant analysis (FDA), require fault-free data to build the normal operation model. These techniques are applicable after the preliminary step of data clustering is applied. We here propose a novel hyperplane distance neighbor clustering (HDNC) based on the local discriminant analysis (LDA) for chemical process monitoring. First, faulty samples are separated from normal ones using the HDNC method. Then, the optimal subspace for fault detection and classification can be obtained using the LDA approach. The proposed method takes the multimodality within the faulty data into account, and thus improves the capability of process monitoring significantly. The HDNC-LDA monitoring approach is applied to two simulation processes and then compared with the conventional FDA based on the K-nearest neighbor (KNN-FDA) method. The results obtained in two different scenarios demonstrate the superiority of the HDNC-LDA approach in terms of fault detection and classification accuracy.

  7. Hyperplane distance neighbor clustering based on local discriminant analysis for complex chemical processes monitoring

    International Nuclear Information System (INIS)

    Lu, Chunhong; Xiao, Shaoqing; Gu, Xiaofeng

    2014-01-01

    The collected training data often include both normal and faulty samples for complex chemical processes. However, some monitoring methods, such as partial least squares (PLS), principal component analysis (PCA), independent component analysis (ICA) and Fisher discriminant analysis (FDA), require fault-free data to build the normal operation model. These techniques are applicable after the preliminary step of data clustering is applied. We here propose a novel hyperplane distance neighbor clustering (HDNC) based on the local discriminant analysis (LDA) for chemical process monitoring. First, faulty samples are separated from normal ones using the HDNC method. Then, the optimal subspace for fault detection and classification can be obtained using the LDA approach. The proposed method takes the multimodality within the faulty data into account, and thus improves the capability of process monitoring significantly. The HDNC-LDA monitoring approach is applied to two simulation processes and then compared with the conventional FDA based on the K-nearest neighbor (KNN-FDA) method. The results obtained in two different scenarios demonstrate the superiority of the HDNC-LDA approach in terms of fault detection and classification accuracy

  8. Analyzing patients' values by applying cluster analysis and LRFM model in a pediatric dental clinic in Taiwan.

    Science.gov (United States)

    Wu, Hsin-Hung; Lin, Shih-Yen; Liu, Chih-Wei

    2014-01-01

    This study combines cluster analysis and LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients' needs.

  9. Analyzing Patients' Values by Applying Cluster Analysis and LRFM Model in a Pediatric Dental Clinic in Taiwan

    Science.gov (United States)

    Lin, Shih-Yen; Liu, Chih-Wei

    2014-01-01

    This study combines cluster analysis and LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients' needs. PMID:25045741

  10. Analyzing Patients’ Values by Applying Cluster Analysis and LRFM Model in a Pediatric Dental Clinic in Taiwan

    Directory of Open Access Journals (Sweden)

    Hsin-Hung Wu

    2014-01-01

    Full Text Available This study combines cluster analysis and LRFM (length, recency, frequency, and monetary model in a pediatric dental clinic in Taiwan to analyze patients’ values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients’ needs.

  11. Clustering analysis of line indices for LAMOST spectra with AstroStat

    Science.gov (United States)

    Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi

    2018-06-01

    The application of data mining in astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyze a large amount of complex survey data. Unsupervised clustering could help astronomers find the associations and outliers in a big data set. In this paper, we employ the k-means method to perform clustering for the line index of LAMOST spectra with the powerful software AstroStat. Implementing the line index approach for analyzing astronomical spectra is an effective way to extract spectral features for low resolution spectra, which can represent the main spectral characteristics of stars. A total of 144 340 line indices for A type stars is analyzed through calculating their intra and inter distances between pairs of stars. For intra distance, we use the definition of Mahalanobis distance to explore the degree of clustering for each class, while for outlier detection, we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers, a spectrum with abnormal continuumand a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.

  12. Semiparametric Bayesian analysis of accelerated failure time models with cluster structures.

    Science.gov (United States)

    Li, Zhaonan; Xu, Xinyi; Shen, Junshan

    2017-11-10

    In this paper, we develop a Bayesian semiparametric accelerated failure time model for survival data with cluster structures. Our model allows distributional heterogeneity across clusters and accommodates their relationships through a density ratio approach. Moreover, a nonparametric mixture of Dirichlet processes prior is placed on the baseline distribution to yield full distributional flexibility. We illustrate through simulations that our model can greatly improve estimation accuracy by effectively pooling information from multiple clusters, while taking into account the heterogeneity in their random error distributions. We also demonstrate the implementation of our method using analysis of Mayo Clinic Trial in Primary Biliary Cirrhosis. Copyright © 2017 John Wiley & Sons, Ltd.

  13. The analysis for energy distribution and biological effects of the clusters from electrons in the tissue equivalent material

    International Nuclear Information System (INIS)

    Zhang Wenzhong; Guo Yong; Luo Yisheng; Wang Yong

    2004-01-01

    Objective: To study energy distribution of the clusters from electrons in the tissue equivalent material, and discuss the important aspects of these clusters on inducing biological effects. Methods: Based on the physical mechanism for electrons interacting with tissue equivalent material, the Monte Carlo (MC) method was used. The electron tracks were lively simulated on an event-by-event (ionization, excitation, elastic scattering, Auger electron emission) basis in the material. The relevant conclusions were drawn from the statistic analysis of these events. Results: The electrons will deposit their energy in the form (30%) of cluster in passing through tissue equivalent material, and most clusters (80%) have the energy amount of more than 50 eV. The cluster density depends on its diameter and energy of electrons, and the deposited energy in the cluster depends on the type and energy of radiation. Conclusion: The deposited energy in cluster is the most important factor in inducing all sort of lesions on DNA molecules in tissue cells

  14. Early results from the Whisper instrument on Cluster: An overview

    DEFF Research Database (Denmark)

    Decreau, P.M.E.; Fergeau, P.; Krasnoselskikh, V.

    2001-01-01

    The Whisper instrument yields two data sets: (i) the electron density determined via the relaxation sounder, and (ii) the spectrum of natural plasma emissions in the frequency band 2-80 kHz. Both data sets allow for the three-dimensional exploration of the magnetosphere by the Cluster mission...... the drift velocity of density structures. Wave observations are also of crucial interest for studying small-scale structures, as demonstrated in an example in the fore-shock region. Early results from the Whisper instrument are very encouraging, and demonstrate that the four-point Cluster measurements...... largely overcomes the limited telemetry allocation. The natural emissions are usually related to the plasma frequency, as identified by the sounder, and the combination of an active sounding operation and a passive survey operation provides a time resolution for the total density determination of 2.2 s...

  15. Female cluster headache in the United States of America: what are the gender differences? Results from the United States Cluster Headache Survey.

    Science.gov (United States)

    Rozen, Todd D; Fishman, Royce S

    2012-06-15

    To present results from the United States Cluster Headache Survey regarding gender differences in cluster headache demographics, clinical characteristics, diagnostic delay, triggers, treatment response and personal burden. Very few studies have looked at the gender differences in cluster headache presentation. The United States Cluster Headache Survey is the largest study of cluster headache sufferers ever completed in the United States and it is also the largest study of female cluster headache patients ever presented. The total survey consisted of 187 multiple choice questions which dealt with various issues related to cluster headache including: demographics, clinical characteristics, concomitant medical conditions, family history, triggers, smoking history, diagnosis, treatment response and personal burden. A group of questions were specifically targeted to female cluster headache patients. The survey was placed on a website from October to December 2008. For all survey responders the diagnosis of cluster headache needed to be made by a neurologist but there was no validation of the headache diagnosis by the authors. 1134 individuals completed the survey (816 male, 318 female). Key Points that define the differences between female and male cluster headache include: a. Age of onset: women develop cluster headache at an earlier age than men and are more likely to develop a second peak of cluster headache onset after 50 years of age. b. Family history: woman cluster headache sufferers are more likely to have a family history of both cluster headache and migraine and have an increased familial risk of Parkinson's disease. c. Comorbid conditions: female cluster headaches sufferers are significantly more likely to experience depression and have asthma than males. d. Aura issues: aura with cluster headache is equally common in both sexes, but aura duration is shorter in women. Women are much more likely to experience sensory, language and brainstem auras. e. Pain

  16. bcl::Cluster : A method for clustering biological molecules coupled with visualization in the Pymol Molecular Graphics System.

    Science.gov (United States)

    Alexander, Nathan; Woetzel, Nils; Meiler, Jens

    2011-02-01

    Clustering algorithms are used as data analysis tools in a wide variety of applications in Biology. Clustering has become especially important in protein structure prediction and virtual high throughput screening methods. In protein structure prediction, clustering is used to structure the conformational space of thousands of protein models. In virtual high throughput screening, databases with millions of drug-like molecules are organized by structural similarity, e.g. common scaffolds. The tree-like dendrogram structure obtained from hierarchical clustering can provide a qualitative overview of the results, which is important for focusing detailed analysis. However, in practice it is difficult to relate specific components of the dendrogram directly back to the objects of which it is comprised and to display all desired information within the two dimensions of the dendrogram. The current work presents a hierarchical agglomerative clustering method termed bcl::Cluster. bcl::Cluster utilizes the Pymol Molecular Graphics System to graphically depict dendrograms in three dimensions. This allows simultaneous display of relevant biological molecules as well as additional information about the clusters and the members comprising them.

  17. Analysis of the dynamical cluster approximation for the Hubbard model

    OpenAIRE

    Aryanpour, K.; Hettler, M. H.; Jarrell, M.

    2002-01-01

    We examine a central approximation of the recently introduced Dynamical Cluster Approximation (DCA) by example of the Hubbard model. By both analytical and numerical means we study non-compact and compact contributions to the thermodynamic potential. We show that approximating non-compact diagrams by their cluster analogs results in a larger systematic error as compared to the compact diagrams. Consequently, only the compact contributions should be taken from the cluster, whereas non-compact ...

  18. Input frequency and lexical variability in phonological development: a survival analysis of word-initial cluster production.

    Science.gov (United States)

    Ota, Mitsuhiko; Green, Sam J

    2013-06-01

    Although it has been often hypothesized that children learn to produce new sound patterns first in frequently heard words, the available evidence in support of this claim is inconclusive. To re-examine this question, we conducted a survival analysis of word-initial consonant clusters produced by three children in the Providence Corpus (0 ; 11-4 ; 0). The analysis took account of several lexical factors in addition to lexical input frequency, including the age of first production, production frequency, neighborhood density and number of phonemes. The results showed that lexical input frequency was a significant predictor of the age at which the accuracy level of cluster production in each word first reached 80%. The magnitude of the frequency effect differed across cluster types. Our findings indicate that some of the between-word variance found in the development of sound production can indeed be attributed to the frequency of words in the child's ambient language.

  19. Cosmological analysis of galaxy clusters surveys in X-rays

    International Nuclear Information System (INIS)

    Clerc, N.

    2012-01-01

    Clusters of galaxies are the most massive objects in equilibrium in our Universe. Their study allows to test cosmological scenarios of structure formation with precision, bringing constraints complementary to those stemming from the cosmological background radiation, supernovae or galaxies. They are identified through the X-ray emission of their heated gas, thus facilitating their mapping at different epochs of the Universe. This report presents two surveys of galaxy clusters detected in X-rays and puts forward a method for their cosmological interpretation. Thanks to its multi-wavelength coverage extending over 10 sq. deg. and after one decade of expertise, the XMM-LSS allows a systematic census of clusters in a large volume of the Universe. In the framework of this survey, the first part of this report describes the techniques developed to the purpose of characterizing the detected objects. A particular emphasis is placed on the most distant ones (z ≥ 1) through the complementarity of observations in X-ray, optical and infrared bands. Then the X-CLASS survey is fully described. Based on XMM archival data, it provides a new catalogue of 800 clusters detected in X-rays. A cosmological analysis of this survey is performed thanks to 'CR-HR' diagrams. This new method self-consistently includes selection effects and scaling relations and provides a means to bypass the computation of individual cluster masses. Propositions are made for applying this method to future surveys as XMM-XXL and eRosita. (author) [fr

  20. Analysis of plasmaspheric plumes: CLUSTER and IMAGE observations

    Directory of Open Access Journals (Sweden)

    F. Darrouzet

    2006-07-01

    Full Text Available Plasmaspheric plumes have been routinely observed by CLUSTER and IMAGE. The CLUSTER mission provides high time resolution four-point measurements of the plasmasphere near perigee. Total electron density profiles have been derived from the electron plasma frequency identified by the WHISPER sounder supplemented, in-between soundings, by relative variations of the spacecraft potential measured by the electric field instrument EFW; ion velocity is also measured onboard these satellites. The EUV imager onboard the IMAGE spacecraft provides global images of the plasmasphere with a spatial resolution of 0.1 RE every 10 min; such images acquired near apogee from high above the pole show the geometry of plasmaspheric plumes, their evolution and motion. We present coordinated observations of three plume events and compare CLUSTER in-situ data with global images of the plasmasphere obtained by IMAGE. In particular, we study the geometry and the orientation of plasmaspheric plumes by using four-point analysis methods. We compare several aspects of plume motion as determined by different methods: (i inner and outer plume boundary velocity calculated from time delays of this boundary as observed by the wave experiment WHISPER on the four spacecraft, (ii drift velocity measured by the electron drift instrument EDI onboard CLUSTER and (iii global velocity determined from successive EUV images. These different techniques consistently indicate that plasmaspheric plumes rotate around the Earth, with their foot fully co-rotating, but with their tip rotating slower and moving farther out.

  1. Confidence in Government and Attitudes toward Bribery: A Country-Cluster Analysis of Demographic and Religiosity Perspectives

    Directory of Open Access Journals (Sweden)

    Serkan Benk

    2017-01-01

    Full Text Available In this study, we try to classify the countries by the levels of confidence in government and attitudes toward accepting bribery by using the data of the sixth wave (2010–2014 of the World Values Survey (WVS. We are also interested in which demographic, attitudinal, and religiosity variables affect each class of countries. For these purposes cluster analysis, linear regression analysis, and ordered logistic regression analysis were used. The study found that countries could be grouped into two clusters which had varying levels of opposition to bribe taking and confidence in government. Another finding was that certain demographic, attitudinal, and religiosity variables that were significant in one cluster might not be significant in another cluster.

  2. Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation

    Directory of Open Access Journals (Sweden)

    Tushar H Jaware

    2013-10-01

    Full Text Available Medical image processing is the most challenging and emerging field of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis & treatment of disease. This paper focuses on methods to detect & extract brain tumour from brain MR images. MATLAB is used to design, software tool for locating brain tumor, based on unsupervised clustering methods. K-Means clustering algorithm is implemented & tested on data base of 30 images. Performance evolution of unsupervised clusteringmethods is presented.

  3. Identifying clinical course patterns in SMS data using cluster analysis

    DEFF Research Database (Denmark)

    Kent, Peter; Kongsted, Alice

    2012-01-01

    ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important...... showed that clinical course patterns can be identified by cluster analysis using all SMS time points as cluster variables. This method is simple, intuitive and does not require a high level of statistical skill. However, there are alternative ways of managing SMS data and many different methods...

  4. Cluster fusion algorithm: application to Lennard-Jones clusters

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2006-01-01

    paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence......We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing...... for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...

  5. Cluster fusion algorithm: application to Lennard-Jones clusters

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2008-01-01

    paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence......We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing...... for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...

  6. Functional Interference Clusters in Cancer Patients With Bone Metastases: A Secondary Analysis of RTOG 9714

    International Nuclear Information System (INIS)

    Chow, Edward; James, Jennifer; Barsevick, Andrea; Hartsell, William; Ratcliffe, Sarah; Scarantino, Charles; Ivker, Robert; Roach, Mack; Suh, John; Petersen, Ivy; Konski, Andre; Demas, William; Bruner, Deborah

    2010-01-01

    Purpose: To explore the relationships (clusters) among the functional interference items in the Brief Pain Inventory (BPI) in patients with bone metastases. Methods: Patients enrolled in the Radiation Therapy Oncology Group (RTOG) 9714 bone metastases study were eligible. Patients were assessed at baseline and 4, 8, and 12 weeks after randomization for the palliative radiotherapy with the BPI, which consists of seven functional items: general activity, mood, walking ability, normal work, relations with others, sleep, and enjoyment of life. Principal component analysis with varimax rotation was used to determine the clusters between the functional items at baseline and the follow-up. Cronbach's alpha was used to determine the consistency and reliability of each cluster at baseline and follow-up. Results: There were 448 male and 461 female patients, with a median age of 67 years. There were two functional interference clusters at baseline, which accounted for 71% of the total variance. The first cluster (physical interference) included normal work and walking ability, which accounted for 58% of the total variance. The second cluster (psychosocial interference) included relations with others and sleep, which accounted for 13% of the total variance. The Cronbach's alpha statistics were 0.83 and 0.80, respectively. The functional clusters changed at week 12 in responders but persisted through week 12 in nonresponders. Conclusion: Palliative radiotherapy is effective in reducing bone pain. Functional interference component clusters exist in patients treated for bone metastases. These clusters changed over time in this study, possibly attributable to treatment. Further research is needed to examine these effects.

  7. Weighted Clustering

    DEFF Research Database (Denmark)

    Ackerman, Margareta; Ben-David, Shai; Branzei, Simina

    2012-01-01

    We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights.We conduct the first extensive theoretical analysis on the influence of weighted data on standard clustering algorithms in both...... the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify...

  8. Challenges And Results of the Applications of Fuzzy Logic in the Classification of Rich Galaxy Clusters

    Science.gov (United States)

    Girola Schneider, R.

    2017-07-01

    The fuzzy logic is a branch of the artificial intelligence founded on the concept that everything is a matter of degree. It intends to create mathematical approximations on the resolution of certain types of problems. In addition, it aims to produce exact results obtained from imprecise data, for which it is particularly useful for electronic and computer applications. This enables it to handle vague or unspecific information when certain parts of a system are unknown or ambiguous and, therefore, they cannot be measured in a reliable manner. Also, when the variation of a variable can produce an alteration on the others The main focus of this paper is to prove the importance of these techniques formulated from a theoretical analysis on its application on ambiguous situations in the field of the rich clusters of galaxies. The purpose is to show its applicability in the several classification systems proposed for the rich clusters, which are based on criteria such as the level of richness of the cluster, the distribution of the brightest galaxies, whether there are signs of type-cD galaxies or not or the existence of sub-clusters. Fuzzy logic enables the researcher to work with "imprecise" information implementing fuzzy sets and combining rules to define actions. The control systems based on fuzzy logic join input variables that are defined in terms of fuzzy sets through rule groups that produce one or several output values of the system under study. From this context, the application of the fuzzy logic's techniques approximates the solution of the mathematical models in abstractions about the rich galaxy cluster classification of physical properties in order to solve the obscurities that must be confronted by an investigation group in order to make a decision.

  9. Using Cluster Analysis to Compartmentalize a Large Managed Wetland Based on Physical, Biological, and Climatic Geospatial Attributes.

    Science.gov (United States)

    Hahus, Ian; Migliaccio, Kati; Douglas-Mankin, Kyle; Klarenberg, Geraldine; Muñoz-Carpena, Rafael

    2018-04-27

    Hierarchical and partitional cluster analyses were used to compartmentalize Water Conservation Area 1, a managed wetland within the Arthur R. Marshall Loxahatchee National Wildlife Refuge in southeast Florida, USA, based on physical, biological, and climatic geospatial attributes. Single, complete, average, and Ward's linkages were tested during the hierarchical cluster analyses, with average linkage providing the best results. In general, the partitional method, partitioning around medoids, found clusters that were more evenly sized and more spatially aggregated than those resulting from the hierarchical analyses. However, hierarchical analysis appeared to be better suited to identify outlier regions that were significantly different from other areas. The clusters identified by geospatial attributes were similar to clusters developed for the interior marsh in a separate study using water quality attributes, suggesting that similar factors have influenced variations in both the set of physical, biological, and climatic attributes selected in this study and water quality parameters. However, geospatial data allowed further subdivision of several interior marsh clusters identified from the water quality data, potentially indicating zones with important differences in function. Identification of these zones can be useful to managers and modelers by informing the distribution of monitoring equipment and personnel as well as delineating regions that may respond similarly to future changes in management or climate.

  10. The Multimorbidity Cluster Analysis Tool: Identifying Combinations and Permutations of Multiple Chronic Diseases Using a Record-Level Computational Analysis

    Directory of Open Access Journals (Sweden)

    Kathryn Nicholson

    2017-12-01

    Full Text Available Introduction: Multimorbidity, or the co-occurrence of multiple chronic health conditions within an individual, is an increasingly dominant presence and burden in modern health care systems.  To fully capture its complexity, further research is needed to uncover the patterns and consequences of these co-occurring health states.  As such, the Multimorbidity Cluster Analysis Tool and the accompanying Multimorbidity Cluster Analysis Toolkit have been created to allow researchers to identify distinct clusters that exist within a sample of participants or patients living with multimorbidity.  Development: The Tool and Toolkit were developed at Western University in London, Ontario, Canada.  This open-access computational program (JAVA code and executable file was developed and tested to support an analysis of thousands of individual records and up to 100 disease diagnoses or categories.  Application: The computational program can be adapted to the methodological elements of a research project, including type of data, type of chronic disease reporting, measurement of multimorbidity, sample size and research setting.  The computational program will identify all existing, and mutually exclusive, combinations and permutations within the dataset.  An application of this computational program is provided as an example, in which more than 75,000 individual records and 20 chronic disease categories resulted in the detection of 10,411 unique combinations and 24,647 unique permutations among female and male patients.  Discussion: The Tool and Toolkit are now available for use by researchers interested in exploring the complexities of multimorbidity.  Its careful use, and the comparison between results, will be valuable additions to the nuanced understanding of multimorbidity.

  11. Electricity Consumption Clustering Using Smart Meter Data

    Directory of Open Access Journals (Sweden)

    Alexander Tureczek

    2018-04-01

    Full Text Available Electricity smart meter consumption data is enabling utilities to analyze consumption information at unprecedented granularity. Much focus has been directed towards consumption clustering for diversifying tariffs; through modern clustering methods, cluster analyses have been performed. However, the clusters developed exhibit a large variation with resulting shadow clusters, making it impossible to truly identify the individual clusters. Using clearly defined dwelling types, this paper will present methods to improve clustering by harvesting inherent structure from the smart meter data. This paper clusters domestic electricity consumption using smart meter data from the Danish city of Esbjerg. Methods from time series analysis and wavelets are applied to enable the K-Means clustering method to account for autocorrelation in data and thereby improve the clustering performance. The results show the importance of data knowledge and we identify sub-clusters of consumption within the dwelling types and enable K-Means to produce satisfactory clustering by accounting for a temporal component. Furthermore our study shows that careful preprocessing of the data to account for intrinsic structure enables better clustering performance by the K-Means method.

  12. Going beyond clustering in MD trajectory analysis: an application to villin headpiece folding.

    Directory of Open Access Journals (Sweden)

    Aruna Rajan

    2010-04-01

    Full Text Available Recent advances in computing technology have enabled microsecond long all-atom molecular dynamics (MD simulations of biological systems. Methods that can distill the salient features of such large trajectories are now urgently needed. Conventional clustering methods used to analyze MD trajectories suffer from various setbacks, namely (i they are not data driven, (ii they are unstable to noise and changes in cut-off parameters such as cluster radius and cluster number, and (iii they do not reduce the dimensionality of the trajectories, and hence are unsuitable for finding collective coordinates. We advocate the application of principal component analysis (PCA and a non-metric multidimensional scaling (nMDS method to reduce MD trajectories and overcome the drawbacks of clustering. To illustrate the superiority of nMDS over other methods in reducing data and reproducing salient features, we analyze three complete villin headpiece folding trajectories. Our analysis suggests that the folding process of the villin headpiece is structurally heterogeneous.

  13. Marketing Mix Formulation for Higher Education: An Integrated Analysis Employing Analytic Hierarchy Process, Cluster Analysis and Correspondence Analysis

    Science.gov (United States)

    Ho, Hsuan-Fu; Hung, Chia-Chi

    2008-01-01

    Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…

  14. The Extended Northern ROSAT Galaxy Cluster Survey (NORAS II). I. Survey Construction and First Results

    International Nuclear Information System (INIS)

    Böhringer, Hans; Chon, Gayoung; Trümper, Joachim; Retzlaff, Jörg; Meisenheimer, Klaus; Schartel, Norbert

    2017-01-01

    As the largest, clearly defined building blocks of our universe, galaxy clusters are interesting astrophysical laboratories and important probes for cosmology. X-ray surveys for galaxy clusters provide one of the best ways to characterize the population of galaxy clusters. We provide a description of the construction of the NORAS II galaxy cluster survey based on X-ray data from the northern part of the ROSAT All-Sky Survey. NORAS II extends the NORAS survey down to a flux limit of 1.8 × 10 −12 erg s −1 cm −2 (0.1–2.4 keV), increasing the sample size by about a factor of two. The NORAS II cluster survey now reaches the same quality and depth as its counterpart, the southern REFLEX II survey, allowing us to combine the two complementary surveys. The paper provides information on the determination of the cluster X-ray parameters, the identification process of the X-ray sources, the statistics of the survey, and the construction of the survey selection function, which we provide in numerical format. Currently NORAS II contains 860 clusters with a median redshift of z  = 0.102. We provide a number of statistical functions, including the log N –log S and the X-ray luminosity function and compare these to the results from the complementary REFLEX II survey. Using the NORAS II sample to constrain the cosmological parameters, σ 8 and Ω m , yields results perfectly consistent with those of REFLEX II. Overall, the results show that the two hemisphere samples, NORAS II and REFLEX II, can be combined without problems into an all-sky sample, just excluding the zone of avoidance.

  15. The Extended Northern ROSAT Galaxy Cluster Survey (NORAS II). I. Survey Construction and First Results

    Energy Technology Data Exchange (ETDEWEB)

    Böhringer, Hans; Chon, Gayoung; Trümper, Joachim [Max-Planck-Institut für Extraterrestrische Physik, D-85748 Garching (Germany); Retzlaff, Jörg [ESO, D-85748 Garching (Germany); Meisenheimer, Klaus [Max-Planck-Institut für Astronomy, Königstuhl 17, D-69117 Heidelberg (Germany); Schartel, Norbert [ESAC, Camino Bajo del Castillo, Villanueva de la Cañada, E-28692 Madrid (Spain)

    2017-05-01

    As the largest, clearly defined building blocks of our universe, galaxy clusters are interesting astrophysical laboratories and important probes for cosmology. X-ray surveys for galaxy clusters provide one of the best ways to characterize the population of galaxy clusters. We provide a description of the construction of the NORAS II galaxy cluster survey based on X-ray data from the northern part of the ROSAT All-Sky Survey. NORAS II extends the NORAS survey down to a flux limit of 1.8 × 10{sup −12} erg s{sup −1} cm{sup −2} (0.1–2.4 keV), increasing the sample size by about a factor of two. The NORAS II cluster survey now reaches the same quality and depth as its counterpart, the southern REFLEX II survey, allowing us to combine the two complementary surveys. The paper provides information on the determination of the cluster X-ray parameters, the identification process of the X-ray sources, the statistics of the survey, and the construction of the survey selection function, which we provide in numerical format. Currently NORAS II contains 860 clusters with a median redshift of z  = 0.102. We provide a number of statistical functions, including the log N –log S and the X-ray luminosity function and compare these to the results from the complementary REFLEX II survey. Using the NORAS II sample to constrain the cosmological parameters, σ {sub 8} and Ω{sub m}, yields results perfectly consistent with those of REFLEX II. Overall, the results show that the two hemisphere samples, NORAS II and REFLEX II, can be combined without problems into an all-sky sample, just excluding the zone of avoidance.

  16. Performance Analysis of a Cluster-Based MAC Protocol for Wireless Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Jesús Alonso-Zárate

    2010-01-01

    Full Text Available An analytical model to evaluate the non-saturated performance of the Distributed Queuing Medium Access Control Protocol for Ad Hoc Networks (DQMANs in single-hop networks is presented in this paper. DQMAN is comprised of a spontaneous, temporary, and dynamic clustering mechanism integrated with a near-optimum distributed queuing Medium Access Control (MAC protocol. Clustering is executed in a distributed manner using a mechanism inspired by the Distributed Coordination Function (DCF of the IEEE 802.11. Once a station seizes the channel, it becomes the temporary clusterhead of a spontaneous cluster and it coordinates the peer-to-peer communications between the clustermembers. Within each cluster, a near-optimum distributed queuing MAC protocol is executed. The theoretical performance analysis of DQMAN in single-hop networks under non-saturation conditions is presented in this paper. The approach integrates the analysis of the clustering mechanism into the MAC layer model. Up to the knowledge of the authors, this approach is novel in the literature. In addition, the performance of an ad hoc network using DQMAN is compared to that obtained when using the DCF of the IEEE 802.11, as a benchmark reference.

  17. Nature and Determinants of the Course of Chronic Low Back Pain Over a 12-Month Period: A Cluster Analysis

    Science.gov (United States)

    Maher, Christopher G.; Latimer, Jane; McAuley, James H.; Hodges, Paul W.; Rogers, W. Todd

    2014-01-01

    Background It has been suggested that low back pain (LBP) is a condition with an unpredictable pattern of exacerbation, remission, and recurrence. However, there is an incomplete understanding of the course of LBP and the determinants of the course. Objective The purposes of this study were: (1) to identify clusters of LBP patients with similar fluctuating pain patterns over time and (2) to investigate whether demographic and clinical characteristics can distinguish these clusters. Design This study was a secondary analysis of data extracted from a randomized controlled trial. Methods Pain scores were collected from 155 participants with chronic nonspecific LBP. Pain intensity was measured monthly over a 1-year period by mobile phone short message service. Cluster analysis was used to identify participants with similar fluctuating patterns of pain based on the pain measures collected over a year, and t tests were used to evaluate if the clusters differed in terms of baseline characteristics. Results The cluster analysis revealed the presence of 3 main clusters. Pain was of fluctuating nature within 2 of the clusters. Out of the 155 participants, 21 (13.5%) had fluctuating pain. Baseline disability (measured with the Roland-Morris Disability Questionnaire) and treatment groups (from the initial randomized controlled trial) were significantly different in the clusters of patients with fluctuating pain when compared with the cluster of patients without fluctuating pain. Limitations A limitation of this study was the fact that participants were undergoing treatment that may have been responsible for the rather positive prognosis observed. Conclusions A small number of patients with fluctuating patterns of pain over time were identified. This number could increase if individuals with episodic pain are included in this fluctuating group. PMID:24072729

  18. Preliminary Cluster Size and Efficiencies results of CMS RPC at GIF++

    CERN Document Server

    Gonzalez Blanco Gonzalez, Genoveva

    2016-01-01

    A brief description and first preliminary results of the Efficiencies and Cluster Size measurements of the CMS Resistive Plate Chambers, will be presented inside the Gamma Irradiation Facility GIF++ at CERN. Preliminary studies that sets the base performance measurements of CMS RPC for starting aging studies.

  19. Differences Between Ward's and UPGMA Methods of Cluster Analysis: Implications for School Psychology.

    Science.gov (United States)

    Hale, Robert L.; Dougherty, Donna

    1988-01-01

    Compared the efficacy of two methods of cluster analysis, the unweighted pair-groups method using arithmetic averages (UPGMA) and Ward's method, for students grouped on intelligence, achievement, and social adjustment by both clustering methods. Found UPGMA more efficacious based on output, on cophenetic correlation coefficients generated by each…

  20. Identifying At-Risk Students in General Chemistry via Cluster Analysis of Affective Characteristics

    Science.gov (United States)

    Chan, Julia Y. K.; Bauer, Christopher F.

    2014-01-01

    The purpose of this study is to identify academically at-risk students in first-semester general chemistry using affective characteristics via cluster analysis. Through the clustering of six preselected affective variables, three distinct affective groups were identified: low (at-risk), medium, and high. Students in the low affective group…

  1. Analysis of risk factors for cluster behavior of dental implant failures.

    Science.gov (United States)

    Chrcanovic, Bruno Ramos; Kisch, Jenö; Albrektsson, Tomas; Wennerberg, Ann

    2017-08-01

    Some studies indicated that implant failures are commonly concentrated in few patients. To identify and analyze cluster behavior of dental implant failures among subjects of a retrospective study. This retrospective study included patients receiving at least three implants only. Patients presenting at least three implant failures were classified as presenting a cluster behavior. Univariate and multivariate logistic regression models and generalized estimating equations analysis evaluated the effect of explanatory variables on the cluster behavior. There were 1406 patients with three or more implants (8337 implants, 592 failures). Sixty-seven (4.77%) patients presented cluster behavior, with 56.8% of all implant failures. The intake of antidepressants and bruxism were identified as potential negative factors exerting a statistically significant influence on a cluster behavior at the patient-level. The negative factors at the implant-level were turned implants, short implants, poor bone quality, age of the patient, the intake of medicaments to reduce the acid gastric production, smoking, and bruxism. A cluster pattern among patients with implant failure is highly probable. Factors of interest as predictors for implant failures could be a number of systemic and local factors, although a direct causal relationship cannot be ascertained. © 2017 Wiley Periodicals, Inc.

  2. A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.

    Science.gov (United States)

    Ferrari, Alberto; Comelli, Mario

    2016-12-01

    In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. This clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods is available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; plus, we describe results from the application of these methods on data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    Science.gov (United States)

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiment. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.

  4. Improving clustering with metabolic pathway data.

    Science.gov (United States)

    Milone, Diego H; Stegmayer, Georgina; López, Mariana; Kamenetzky, Laura; Carrari, Fernando

    2014-04-10

    It is a common practice in bioinformatics to validate each group returned by a clustering algorithm through manual analysis, according to a-priori biological knowledge. This procedure helps finding functionally related patterns to propose hypotheses for their behavior and the biological processes involved. Therefore, this knowledge is used only as a second step, after data are just clustered according to their expression patterns. Thus, it could be very useful to be able to improve the clustering of biological data by incorporating prior knowledge into the cluster formation itself, in order to enhance the biological value of the clusters. A novel training algorithm for clustering is presented, which evaluates the biological internal connections of the data points while the clusters are being formed. Within this training algorithm, the calculation of distances among data points and neurons centroids includes a new term based on information from well-known metabolic pathways. The standard self-organizing map (SOM) training versus the biologically-inspired SOM (bSOM) training were tested with two real data sets of transcripts and metabolites from Solanum lycopersicum and Arabidopsis thaliana species. Classical data mining validation measures were used to evaluate the clustering solutions obtained by both algorithms. Moreover, a new measure that takes into account the biological connectivity of the clusters was applied. The results of bSOM show important improvements in the convergence and performance for the proposed clustering method in comparison to standard SOM training, in particular, from the application point of view. Analyses of the clusters obtained with bSOM indicate that including biological information during training can certainly increase the biological value of the clusters found with the proposed method. It is worth to highlight that this fact has effectively improved the results, which can simplify their further analysis.The algorithm is available as a

  5. Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

    Science.gov (United States)

    Brown, S. J.; White, S.; Power, N.

    2015-01-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…

  6. Cluster analysis as a tool of guests segmentation by the degree of their demand

    Directory of Open Access Journals (Sweden)

    Damijan Mumel

    2002-01-01

    Full Text Available Authors demonstrate the use of cluster analysis in finding out (ascertaining the homogenity/heterogenity of guests as to the degree of their demand. The degree of guests’ demand is defined according to the importance of perceived service quality components measured by SERVQUAL, which was adopted and adapted, according to the specifics of health spa industry in Slovenia. Goals of the article are: (a the identification of the profile of importance of general health spa service quality components, and (b the identification of groups of guests (segments according to the degree of their demand in the research in 1991 compared with 1999. Cluster analysis serves as useful tool for guest segmentation since it reveals the existence of important differences in the structure of guests in the year 1991 compared with the year 1999. The results serve as a useful database for management in health spas.

  7. Estimating multi-phase pore-scale characteristics from X-ray tomographic data using cluster analysis-based segmentation

    DEFF Research Database (Denmark)

    Wildenschild, D.; Culligan, K.A.; Christensen, Britt Stenhøj Baun

    2006-01-01

    present in grey-scale X-ray tomographic images. The approach is based on a cluster analysis technique, used in combination with various other filtering and skeletonization schemes. We apply this segmentation algorithm to analyze multiphase pore-scale flow subjects such as hysteresis and interfacial...... characterization. The results clearly illustrate the advantage of using X-ray tomography together with cluster analysis-based image processing techniques. We were able to obtain detailed information on pore scale distribution of air and water phases, as well as quantitative measures of air bubble size and air...... of individual pores and interfaces. However, separation of the various phases (fluids and solids) in the grey-scale tomographic images has posed a major problem to quantitative analysis of the data. We present an image processing technique that facilitates identification and separation of the various phases...

  8. Automatic Clustering Using FSDE-Forced Strategy Differential Evolution

    Science.gov (United States)

    Yasid, A.

    2018-01-01

    Clustering analysis is important in datamining for unsupervised data, cause no adequate prior knowledge. One of the important tasks is defining the number of clusters without user involvement that is known as automatic clustering. This study intends on acquiring cluster number automatically utilizing forced strategy differential evolution (AC-FSDE). Two mutation parameters, namely: constant parameter and variable parameter are employed to boost differential evolution performance. Four well-known benchmark datasets were used to evaluate the algorithm. Moreover, the result is compared with other state of the art automatic clustering methods. The experiment results evidence that AC-FSDE is better or competitive with other existing automatic clustering algorithm.

  9. CLUSTERING ANALYSIS OF OFFICER'S BEHAVIOURS IN LONDON POLICE FOOT PATROL ACTIVITIES

    Directory of Open Access Journals (Sweden)

    J. Shen

    2015-07-01

    Full Text Available In this small paper we aim at presenting a framework of conceptual representation and clustering analysis of police officers’ patrol pattern obtained from mining their raw movement trajectory data. This have been achieved by a model developed to accounts for the spatio-temporal dynamics human movements by incorporating both the behaviour features of the travellers and the semantic meaning of the environment they are moving in. Hence, the similarity metric of traveller behaviours is jointly defined according to the stay time allocation in each Spatio-temporal region of interests (ST-ROI to support clustering analysis of patrol behaviours. The proposed framework enables the analysis of behaviour and preferences on higher level based on raw moment trajectories. The model is firstly applied to police patrol data provided by the Metropolitan Police and will be tested by other type of dataset afterwards.

  10. Cluster cosmological analysis with X ray instrumental observables: introduction and testing of AsPIX method

    International Nuclear Information System (INIS)

    Valotti, Andrea

    2016-01-01

    Cosmology is one of the fundamental pillars of astrophysics, as such it contains many unsolved puzzles. To investigate some of those puzzles, we analyze X-ray surveys of galaxy clusters. These surveys are possible thanks to the bremsstrahlung emission of the intra-cluster medium. The simultaneous fit of cluster counts as a function of mass and distance provides an independent measure of cosmological parameters such as Ω m , σ s , and the dark energy equation of state w0. A novel approach to cosmological analysis using galaxy cluster data, called top-down, was developed in N. Clerc et al. (2012). This top-down approach is based purely on instrumental observables that are considered in a two-dimensional X-ray color-magnitude diagram. The method self-consistently includes selection effects and scaling relationships. It also provides a means of bypassing the computation of individual cluster masses. My work presents an extension of the top-down method by introducing the apparent size of the cluster, creating a three-dimensional X-ray cluster diagram. The size of a cluster is sensitive to both the cluster mass and its angular diameter, so it must also be included in the assessment of selection effects. The performance of this new method is investigated using a Fisher analysis. In parallel, I have studied the effects of the intrinsic scatter in the cluster size scaling relation on the sample selection as well as on the obtained cosmological parameters. To validate the method, I estimate uncertainties of cosmological parameters with MCMC method Amoeba minimization routine and using two simulated XMM surveys that have an increasing level of complexity. The first simulated survey is a set of toy catalogues of 100 and 10000 deg 2 , whereas the second is a 1000 deg 2 catalogue that was generated using an Aardvark semi-analytical N-body simulation. This comparison corroborates the conclusions of the Fisher analysis. In conclusion, I find that a cluster diagram that accounts

  11. HORIZONTAL BRANCH MORPHOLOGY OF GLOBULAR CLUSTERS: A MULTIVARIATE STATISTICAL ANALYSIS

    International Nuclear Information System (INIS)

    Jogesh Babu, G.; Chattopadhyay, Tanuka; Chattopadhyay, Asis Kumar; Mondal, Saptarshi

    2009-01-01

    The proper interpretation of horizontal branch (HB) morphology is crucial to the understanding of the formation history of stellar populations. In the present study a multivariate analysis is used (principal component analysis) for the selection of appropriate HB morphology parameter, which, in our case, is the logarithm of effective temperature extent of the HB (log T effHB ). Then this parameter is expressed in terms of the most significant observed independent parameters of Galactic globular clusters (GGCs) separately for coherent groups, obtained in a previous work, through a stepwise multiple regression technique. It is found that, metallicity ([Fe/H]), central surface brightness (μ v ), and core radius (r c ) are the significant parameters to explain most of the variations in HB morphology (multiple R 2 ∼ 0.86) for GGC elonging to the bulge/disk while metallicity ([Fe/H]) and absolute magnitude (M v ) are responsible for GGC belonging to the inner halo (multiple R 2 ∼ 0.52). The robustness is tested by taking 1000 bootstrap samples. A cluster analysis is performed for the red giant branch (RGB) stars of the GGC belonging to Galactic inner halo (Cluster 2). A multi-episodic star formation is preferred for RGB stars of GGC belonging to this group. It supports the asymptotic giant branch (AGB) model in three episodes instead of two as suggested by Carretta et al. for halo GGC while AGB model is suggested to be revisited for bulge/disk GGC.

  12. Sirenomelia in Argentina: Prevalence, geographic clusters and temporal trends analysis.

    Science.gov (United States)

    Groisman, Boris; Liascovich, Rosa; Gili, Juan Antonio; Barbero, Pablo; Bidondo, María Paz

    2016-07-01

    Sirenomelia is a severe malformation of the lower body characterized by a single medial lower limb and a variable combination of visceral abnormalities. Given that Sirenomelia is a very rare birth defect, epidemiological studies are scarce. The aim of this study is to evaluate prevalence, geographic clusters and time trends of sirenomelia in Argentina, using data from the National Network of Congenital Anomalies of Argentina (RENAC) from November 2009 until December 2014. This is a descriptive study using data from the RENAC, a hospital-based surveillance system for newborns affected with major morphological congenital anomalies. We calculated sirenomelia prevalence throughout the period, searched for geographical clusters, and evaluated time trends. The prevalence of confirmed cases of sirenomelia throughout the period was 2.35 per 100,000 births. Cluster analysis showed no statistically significant geographical aggregates. Time-trends analysis showed that the prevalence was higher in years 2009 to 2010. The observed prevalence was higher than the observed in previous epidemiological studies in other geographic regions. We observed a likely real increase in the initial period of our study. We used strict diagnostic criteria, excluding cases that only had clinical diagnosis of sirenomelia. Therefore, real prevalence could be even higher. This study did not show any geographic clusters. Because etiology of sirenomelia has not yet been established, studies of epidemiological features of this defect may contribute to define its causes. Birth Defects Research (Part A) 106:604-611, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  13. Social Media Use and Depression and Anxiety Symptoms: A Cluster Analysis.

    Science.gov (United States)

    Shensa, Ariel; Sidani, Jaime E; Dew, Mary Amanda; Escobar-Viera, César G; Primack, Brian A

    2018-03-01

    Individuals use social media with varying quantity, emotional, and behavioral at- tachment that may have differential associations with mental health outcomes. In this study, we sought to identify distinct patterns of social media use (SMU) and to assess associations between those patterns and depression and anxiety symptoms. In October 2014, a nationally-representative sample of 1730 US adults ages 19 to 32 completed an online survey. Cluster analysis was used to identify patterns of SMU. Depression and anxiety were measured using respective 4-item Patient-Reported Outcome Measurement Information System (PROMIS) scales. Multivariable logistic regression models were used to assess associations between clus- ter membership and depression and anxiety. Cluster analysis yielded a 5-cluster solu- tion. Participants were characterized as "Wired," "Connected," "Diffuse Dabblers," "Concentrated Dabblers," and "Unplugged." Membership in 2 clusters - "Wired" and "Connected" - increased the odds of elevated depression and anxiety symptoms (AOR = 2.7, 95% CI = 1.5-4.7; AOR = 3.7, 95% CI = 2.1-6.5, respectively, and AOR = 2.0, 95% CI = 1.3-3.2; AOR = 2.0, 95% CI = 1.3-3.1, respectively). SMU pattern characterization of a large population suggests 2 pat- terns are associated with risk for depression and anxiety. Developing educational interventions that address use patterns rather than single aspects of SMU (eg, quantity) would likely be useful.

  14. Cluster Oriented Spatio Temporal Multidimensional Data Visualization of Earthquakes in Indonesia

    Directory of Open Access Journals (Sweden)

    Mohammad Nur Shodiq

    2016-03-01

    Full Text Available Spatio temporal data clustering is challenge task. The result of clustering data are utilized to investigate the seismic parameters. Seismic parameters are used to describe the characteristics of earthquake behavior. One of the effective technique to study multidimensional spatio temporal data is visualization. But, visualization of multidimensional data is complicated problem. Because, this analysis consists of observed data cluster and seismic parameters. In this paper, we propose a visualization system, called as IES (Indonesia Earthquake System, for cluster analysis, spatio temporal analysis, and visualize the multidimensional data of seismic parameters. We analyze the cluster analysis by using automatic clustering, that consists of get optimal number of cluster and Hierarchical K-means clustering. We explore the visual cluster and multidimensional data in low dimensional space visualization. We made experiment with observed data, that consists of seismic data around Indonesian archipelago during 2004 to 2014. Keywords: Clustering, visualization, multidimensional data, seismic parameters.

  15. Clustering Dycom

    KAUST Repository

    Minku, Leandro L.

    2017-10-06

    Background: Software Effort Estimation (SEE) can be formulated as an online learning problem, where new projects are completed over time and may become available for training. In this scenario, a Cross-Company (CC) SEE approach called Dycom can drastically reduce the number of Within-Company (WC) projects needed for training, saving the high cost of collecting such training projects. However, Dycom relies on splitting CC projects into different subsets in order to create its CC models. Such splitting can have a significant impact on Dycom\\'s predictive performance. Aims: This paper investigates whether clustering methods can be used to help finding good CC splits for Dycom. Method: Dycom is extended to use clustering methods for creating the CC subsets. Three different clustering methods are investigated, namely Hierarchical Clustering, K-Means, and Expectation-Maximisation. Clustering Dycom is compared against the original Dycom with CC subsets of different sizes, based on four SEE databases. A baseline WC model is also included in the analysis. Results: Clustering Dycom with K-Means can potentially help to split the CC projects, managing to achieve similar or better predictive performance than Dycom. However, K-Means still requires the number of CC subsets to be pre-defined, and a poor choice can negatively affect predictive performance. EM enables Dycom to automatically set the number of CC subsets while still maintaining or improving predictive performance with respect to the baseline WC model. Clustering Dycom with Hierarchical Clustering did not offer significant advantage in terms of predictive performance. Conclusion: Clustering methods can be an effective way to automatically generate Dycom\\'s CC subsets.

  16. Galaxy Clusters in the Swift/BAT era II: 10 more Clusters detected above 15 keV

    Energy Technology Data Exchange (ETDEWEB)

    Ajello, M.; /SLAC /KIPAC, Menlo Park; Rebusco, P.; /KIPAC, Menlo Park; Cappelluti, N.; /Garching, Max Planck Inst., MPE /Maryland U., Baltimore County; Reimer, O.; /SLAC /Palermo Observ.; Boehringer, H.; /Garching, Max Planck Inst., MPE; La Parola, V.; Cusumano, G.; /Palermo Observ.

    2010-10-27

    We report on the discovery of 10 additional galaxy clusters detected in the ongoing Swift/BAT all-sky survey. Among the newly BAT-discovered clusters there are: Bullet, Abell 85, Norma, and PKS 0745-19. Norma is the only cluster, among those presented here, which is resolved by BAT. For all the clusters we perform a detailed spectral analysis using XMM-Newton and Swift/BAT data to investigate the presence of a hard (non-thermal) X-ray excess. We find that in most cases the clusters emission in the 0.3-200 keV band can be explained by a multi-temperature thermal model confirming our previous results. For two clusters (Bullet and Abell 3667) we find evidence for the presence of a hard X-ray excess. In the case of the Bullet cluster, our analysis confirms the presence of a non-thermal, power-law like, component with a 20-100 keV flux of 3.4 x 10{sup -12} erg cm{sup -2} s{sup -1} as detected in previous studies. For Abell 3667 the excess emission can be successfully modeled as a hot component (kT = {approx}13 keV). We thus conclude that the hard X-ray emission from galaxy clusters (except the Bullet) has most likely thermal origin.

  17. GALAXY CLUSTERS IN THE SWIFT/BAT ERA. II. 10 MORE CLUSTERS DETECTED ABOVE 15 keV

    International Nuclear Information System (INIS)

    Ajello, M.; Reimer, O.; Rebusco, P.; Cappelluti, N.; Boehringer, H.; La Parola, V.; Cusumano, G.

    2010-01-01

    We report on the discovery of 10 additional galaxy clusters detected in the ongoing Swift/Burst Alert Telescope (BAT) all-sky survey. Among the newly BAT-discovered clusters there are Bullet, A85, Norma, and PKS 0745-19. Norma is the only cluster, among those presented here, which is resolved by BAT. For all the clusters, we perform a detailed spectral analysis using XMM-Newton and Swift/BAT data to investigate the presence of a hard (non-thermal) X-ray excess. We find that in most cases the clusters' emission in the 0.3-200 keV band can be explained by a multi-temperature thermal model confirming our previous results. For two clusters (Bullet and A3667), we find evidence for the presence of a hard X-ray excess. In the case of the Bullet cluster, our analysis confirms the presence of a non-thermal, power-law-like, component with a 20-100 keV flux of 3.4 x 10 -12 erg cm -2 s -1 as detected in previous studies. For A3667, the excess emission can be successfully modeled as a hot component (kT ∼ 13 keV). We thus conclude that the hard X-ray emission from galaxy clusters (except the Bullet) has most likely a thermal origin.

  18. A weak lensing analysis of the PLCK G100.2-30.4 cluster

    Science.gov (United States)

    Radovich, M.; Formicola, I.; Meneghetti, M.; Bartalucci, I.; Bourdin, H.; Mazzotta, P.; Moscardini, L.; Ettori, S.; Arnaud, M.; Pratt, G. W.; Aghanim, N.; Dahle, H.; Douspis, M.; Pointecouteau, E.; Grado, A.

    2015-07-01

    We present a mass estimate of the Planck-discovered cluster PLCK G100.2-30.4, derived from a weak lensing analysis of deep Subaru griz images. We perform a careful selection of the background galaxies using the multi-band imaging data, and undertake the weak lensing analysis on the deep (1 h) r -band image. The shape measurement is based on the Kaiser-Squires-Broadhurst algorithm; we adopt the PSFex software to model the point spread function (PSF) across the field and correct for this in the shape measurement. The weak lensing analysis is validated through extensive image simulations. We compare the resulting weak lensing mass profile and total mass estimate to those obtained from our re-analysis of XMM-Newton observations, derived under the hypothesis of hydrostatic equilibrium. The total integrated mass profiles agree remarkably well, within 1σ across their common radial range. A mass M500 ~ 7 × 1014M⊙ is derived for the cluster from our weak lensing analysis. Comparing this value to that obtained from our reanalysis of XMM-Newton data, we obtain a bias factor of (1-b) = 0.8 ± 0.1. This is compatible within 1σ with the value of (1-b) obtained in Planck 2015 from the calibration of the bias factor using newly available weak lensing reconstructed masses. Based on data collected at Subaru Telescope (University of Tokyo).

  19. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2004-01-01

    Full Text Available An unsupervised data clustering method, called the local maximum clustering (LMC method, is proposed for identifying clusters in experiment data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering such as reduced projection in similarities, noises, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchic clustering method, the -mean clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999.

  20. Integrating PROOF Analysis in Cloud and Batch Clusters

    International Nuclear Information System (INIS)

    Rodríguez-Marrero, Ana Y; Fernández-del-Castillo, Enol; López García, Álvaro; Marco de Lucas, Jesús; Matorras Weinig, Francisco; González Caballero, Isidro; Cuesta Noriega, Alberto

    2012-01-01

    High Energy Physics (HEP) analysis are becoming more complex and demanding due to the large amount of data collected by the current experiments. The Parallel ROOT Facility (PROOF) provides researchers with an interactive tool to speed up the analysis of huge volumes of data by exploiting parallel processing on both multicore machines and computing clusters. The typical PROOF deployment scenario is a permanent set of cores configured to run the PROOF daemons. However, this approach is incapable of adapting to the dynamic nature of interactive usage. Several initiatives seek to improve the use of computing resources by integrating PROOF with a batch system, such as Proof on Demand (PoD) or PROOF Cluster. These solutions are currently in production at Universidad de Oviedo and IFCA and are positively evaluated by users. Although they are able to adapt to the computing needs of users, they must comply with the specific configuration, OS and software installed at the batch nodes. Furthermore, they share the machines with other workloads, which may cause disruptions in the interactive service for users. These limitations make PROOF a typical use-case for cloud computing. In this work we take profit from Cloud Infrastructure at IFCA in order to provide a dynamic PROOF environment where users can control the software configuration of the machines. The Proof Analysis Framework (PAF) facilitates the development of new analysis and offers a transparent access to PROOF resources. Several performance measurements are presented for the different scenarios (PoD, SGE and Cloud), showing a speed improvement closely correlated with the number of cores used.

  1. Comparing 3 dietary pattern methods--cluster analysis, factor analysis, and index analysis--With colorectal cancer risk: The NIH-AARP Diet and Health Study.

    Science.gov (United States)

    Reedy, Jill; Wirfält, Elisabet; Flood, Andrew; Mitrou, Panagiota N; Krebs-Smith, Susan M; Kipnis, Victor; Midthune, Douglas; Leitzmann, Michael; Hollenbeck, Albert; Schatzkin, Arthur; Subar, Amy F

    2010-02-15

    The authors compared dietary pattern methods-cluster analysis, factor analysis, and index analysis-with colorectal cancer risk in the National Institutes of Health (NIH)-AARP Diet and Health Study (n = 492,306). Data from a 124-item food frequency questionnaire (1995-1996) were used to identify 4 clusters for men (3 clusters for women), 3 factors, and 4 indexes. Comparisons were made with adjusted relative risks and 95% confidence intervals, distributions of individuals in clusters by quintile of factor and index scores, and health behavior characteristics. During 5 years of follow-up through 2000, 3,110 colorectal cancer cases were ascertained. In men, the vegetables and fruits cluster, the fruits and vegetables factor, the fat-reduced/diet foods factor, and all indexes were associated with reduced risk; the meat and potatoes factor was associated with increased risk. In women, reduced risk was found with the Healthy Eating Index-2005 and increased risk with the meat and potatoes factor. For men, beneficial health characteristics were seen with all fruit/vegetable patterns, diet foods patterns, and indexes, while poorer health characteristics were found with meat patterns. For women, findings were similar except that poorer health characteristics were seen with diet foods patterns. Similarities were found across methods, suggesting basic qualities of healthy diets. Nonetheless, findings vary because each method answers a different question.

  2. OMERACT-based fibromyalgia symptom subgroups: an exploratory cluster analysis.

    Science.gov (United States)

    Vincent, Ann; Hoskin, Tanya L; Whipple, Mary O; Clauw, Daniel J; Barton, Debra L; Benzo, Roberto P; Williams, David A

    2014-10-16

    The aim of this study was to identify subsets of patients with fibromyalgia with similar symptom profiles using the Outcome Measures in Rheumatology (OMERACT) core symptom domains. Female patients with a diagnosis of fibromyalgia and currently meeting fibromyalgia research survey criteria completed the Brief Pain Inventory, the 30-item Profile of Mood States, the Medical Outcomes Sleep Scale, the Multidimensional Fatigue Inventory, the Multiple Ability Self-Report Questionnaire, the Fibromyalgia Impact Questionnaire-Revised (FIQ-R) and the Short Form-36 between 1 June 2011 and 31 October 2011. Hierarchical agglomerative clustering was used to identify subgroups of patients with similar symptom profiles. To validate the results from this sample, hierarchical agglomerative clustering was repeated in an external sample of female patients with fibromyalgia with similar inclusion criteria. A total of 581 females with a mean age of 55.1 (range, 20.1 to 90.2) years were included. A four-cluster solution best fit the data, and each clustering variable differed significantly (P FIQ-R total scores (P = 0.0004)). In our study, we incorporated core OMERACT symptom domains, which allowed for clustering based on a comprehensive symptom profile. Although our exploratory cluster solution needs confirmation in a longitudinal study, this approach could provide a rationale to support the study of individualized clinical evaluation and intervention.

  3. Identification and functional analysis of gene cluster involvement in biosynthesis of the cyclic lipopeptide antibiotic pelgipeptin produced by Paenibacillus elgii

    Directory of Open Access Journals (Sweden)

    Qian Chao-Dong

    2012-09-01

    Full Text Available Abstract Background Pelgipeptin, a potent antibacterial and antifungal agent, is a non-ribosomally synthesised lipopeptide antibiotic. This compound consists of a β-hydroxy fatty acid and nine amino acids. To date, there is no information about its biosynthetic pathway. Results A potential pelgipeptin synthetase gene cluster (plp was identified from Paenibacillus elgii B69 through genome analysis. The gene cluster spans 40.8 kb with eight open reading frames. Among the genes in this cluster, three large genes, plpD, plpE, and plpF, were shown to encode non-ribosomal peptide synthetases (NRPSs, with one, seven, and one module(s, respectively. Bioinformatic analysis of the substrate specificity of all nine adenylation domains indicated that the sequence of the NRPS modules is well collinear with the order of amino acids in pelgipeptin. Additional biochemical analysis of four recombinant adenylation domains (PlpD A1, PlpE A1, PlpE A3, and PlpF A1 provided further evidence that the plp gene cluster involved in pelgipeptin biosynthesis. Conclusions In this study, a gene cluster (plp responsible for the biosynthesis of pelgipeptin was identified from the genome sequence of Paenibacillus elgii B69. The identification of the plp gene cluster provides an opportunity to develop novel lipopeptide antibiotics by genetic engineering.

  4. HICOSMO - cosmology with a complete sample of galaxy clusters - I. Data analysis, sample selection and luminosity-mass scaling relation

    Science.gov (United States)

    Schellenberger, G.; Reiprich, T. H.

    2017-08-01

    The X-ray regime, where the most massive visible component of galaxy clusters, the intracluster medium, is visible, offers directly measured quantities, like the luminosity, and derived quantities, like the total mass, to characterize these objects. The aim of this project is to analyse a complete sample of galaxy clusters in detail and constrain cosmological parameters, like the matter density, Ωm, or the amplitude of initial density fluctuations, σ8. The purely X-ray flux-limited sample (HIFLUGCS) consists of the 64 X-ray brightest galaxy clusters, which are excellent targets to study the systematic effects, that can bias results. We analysed in total 196 Chandra observations of the 64 HIFLUGCS clusters, with a total exposure time of 7.7 Ms. Here, we present our data analysis procedure (including an automated substructure detection and an energy band optimization for surface brightness profile analysis) that gives individually determined, robust total mass estimates. These masses are tested against dynamical and Planck Sunyaev-Zeldovich (SZ) derived masses of the same clusters, where good overall agreement is found with the dynamical masses. The Planck SZ masses seem to show a mass-dependent bias to our hydrostatic masses; possible biases in this mass-mass comparison are discussed including the Planck selection function. Furthermore, we show the results for the (0.1-2.4) keV luminosity versus mass scaling relation. The overall slope of the sample (1.34) is in agreement with expectations and values from literature. Splitting the sample into galaxy groups and clusters reveals, even after a selection bias correction, that galaxy groups exhibit a significantly steeper slope (1.88) compared to clusters (1.06).

  5. Distribution of shallow very low frequency earthquakes in the eastern Nankai trough influenced by a subducted oceanic ridge: Results from cluster analysis applied to ocean bottom seismographs

    Science.gov (United States)

    To, A.; Obana, K.; Araki, E.

    2016-12-01

    The activity of very low frequency earthquakes (VLFEs) in the shallow accretionary prism of the eastern Nankai trough has been observed frequently in the past. In this study, we investigated the distribution of VLFEs that occurred in October 2015, which were recorded by an array of broadband ocean bottom seismometers (BBOBSs) of DONET1 network. The size of the network is much wider (>80 km) compared to previous BBOBS networks that were used for close-in observations of VLFEs; therefore the new dataset provides a broader overview of the VLFE distribution of this region. We first located the detected events using conventional methods such as the envelope correlation method. However, the results seemed to be largely scattered due to noise and the effect of 3D structures that could not be properly handled. Then, we introduced hierarchal clustering analysis, based on measured travel time patterns among stations obtained for each event. The analyses enabled the assessment of relative locations among events. Finally, the locations of event-clusters were estimated, instead of individual events, so that the obtained locations seemed less scattered. The obtained results indicate that the VLFE distribution is strongly influenced by a subducted ridge (Park et al., 2003) that exists beneath the northeastern side of the DONET1 network. Though the VLFEs are distributed from an area near the outer ridge toward the trench axis in the region with a smooth plate boundary, they are clustered at a shallow depth near the outer ridge in the region of the rough plate boundary. The VLFEs are clustered on the landward side of the peak of the subducted ridge; this could be explained by an elevated pore pressure in the region caused by the low-permeability oceanic ridge that may clog the up-dip pathway of the fluid along the decollement zone. The along-strike variation of the stress state, inferred from the VLFE distribution, should be an important factor in assessing the strain release

  6. Grouping of Cities In Terms Of Primary Health Indicators in Turkey: An Application of Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Bilgehan TEKİN

    2015-12-01

    Full Text Available It is thought that to determine the differences between cities that locate in Turkey is important in the context of primary health care indicators. The subject of this study is the classification of cities in Turkey in terms of health indicators. The cluster analysis method which is the one of the data mining and multivariate statistical methods is used for classification method. The main objective of the study is to examine the point of results of movement transformation in health in terms of basic health indicators on the basis of cities.. In this context, 81 cities, in Turkey are grouped with sixteen health indicators which is assumed to demonstrate the effectiveness of health care services, by the years of 2013. And also compared with the health and socio-economic development ranking in the previous studies. Providences are gathered in 21, 13, 11, 7 and 5 clusters. 11’s, 7’s and 5’s clusters are determined as the most significant clusters. As a result of the study the development gap between eastern and western provinces emerges in terms of the health variables.

  7. Clustering performance comparison using K-means and expectation maximization algorithms.

    Science.gov (United States)

    Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

    2014-11-14

    Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.

  8. Clustering stocks using partial correlation coefficients

    Science.gov (United States)

    Jung, Sean S.; Chang, Woojin

    2016-11-01

    A partial correlation analysis is performed on the Korean stock market (KOSPI). The difference between Pearson correlation and the partial correlation is analyzed and it is found that when conditioned on the market return, Pearson correlation coefficients are generally greater than those of the partial correlation, which implies that the market return tends to drive up the correlation between stock returns. A clustering analysis is then performed to study the market structure given by the partial correlation analysis and the members of the clusters are compared with the Global Industry Classification Standard (GICS). The initial hypothesis is that the firms in the same GICS sector are clustered together since they are in a similar business and environment. However, the result is inconsistent with the hypothesis and most clusters are a mix of multiple sectors suggesting that the traditional approach of using sectors to determine the proximity between stocks may not be sufficient enough to diversify a portfolio.

  9. Major cluster mergers and the location of the brightest cluster galaxy

    International Nuclear Information System (INIS)

    Martel, Hugo; Robichaud, Fidèle; Barai, Paramita

    2014-01-01

    Using a large N-body cosmological simulation combined with a subgrid treatment of galaxy formation, merging, and tidal destruction, we study the formation and evolution of the galaxy and cluster population in a comoving volume (100 Mpc) 3 in a ΛCDM universe. At z = 0, our computational volume contains 1788 clusters with mass M cl > 1.1 × 10 12 M ☉ , including 18 massive clusters with M cl > 10 14 M ☉ . It also contains 1, 088, 797 galaxies with mass M gal ≥ 2 × 10 9 M ☉ and luminosity L > 9.5 × 10 5 L ☉ . For each cluster, we identified the brightest cluster galaxy (BCG). We then computed two separate statistics: the fraction f BNC of clusters in which the BCG is not the closest galaxy to the center of the cluster in projection, and the ratio Δv/σ, where Δv is the difference in radial velocity between the BCG and the whole cluster and σ is the radial velocity dispersion of the cluster. We found that f BNC increases from 0.05 for low-mass clusters (M cl ∼ 10 12 M ☉ ) to 0.5 for high-mass clusters (M cl > 10 14 M ☉ ) with very little dependence on cluster redshift. Most of this result turns out to be a projection effect and when we consider three-dimensional distances instead of projected distances, f BNC increases only to 0.2 at high-cluster mass. The values of Δv/σ vary from 0 to 1.8, with median values in the range 0.03-0.15 when considering all clusters, and 0.12-0.31 when considering only massive clusters. These results are consistent with previous observational studies and indicate that the central galaxy paradigm, which states that the BCG should be at rest at the center of the cluster, is usually valid, but exceptions are too common to be ignored. We built merger trees for the 18 most massive clusters in the simulation. Analysis of these trees reveal that 16 of these clusters have experienced 1 or several major or semi-major mergers in the past. These mergers leave each cluster in a non-equilibrium state, but eventually the cluster

  10. Cluster analysis and quality assessment of logged water at an irrigation project, eastern Saudi Arabia.

    Science.gov (United States)

    Hussain, Mahbub; Ahmed, Syed Munaf; Abderrahman, Walid

    2008-01-01

    A multivariate statistical technique, cluster analysis, was used to assess the logged surface water quality at an irrigation project at Al-Fadhley, Eastern Province, Saudi Arabia. The principal idea behind using the technique was to utilize all available hydrochemical variables in the quality assessment including trace elements and other ions which are not considered in conventional techniques for water quality assessments like Stiff and Piper diagrams. Furthermore, the area belongs to an irrigation project where water contamination associated with the use of fertilizers, insecticides and pesticides is expected. This quality assessment study was carried out on a total of 34 surface/logged water samples. To gain a greater insight in terms of the seasonal variation of water quality, 17 samples were collected from both summer and winter seasons. The collected samples were analyzed for a total of 23 water quality parameters including pH, TDS, conductivity, alkalinity, sulfate, chloride, bicarbonate, nitrate, phosphate, bromide, fluoride, calcium, magnesium, sodium, potassium, arsenic, boron, copper, cobalt, iron, lithium, manganese, molybdenum, nickel, selenium, mercury and zinc. Cluster analysis in both Q and R modes was used. Q-mode analysis resulted in three distinct water types for both the summer and winter seasons. Q-mode analysis also showed the spatial as well as temporal variation in water quality. R-mode cluster analysis led to the conclusion that there are two major sources of contamination for the surface/shallow groundwater in the area: fertilizers, micronutrients, pesticides, and insecticides used in agricultural activities, and non-point natural sources.

  11. K-means cluster analysis of tourist destination in special region of Yogyakarta using spatial approach and social network analysis (a case study: post of @explorejogja instagram account in 2016)

    Science.gov (United States)

    Iswandhani, N.; Muhajir, M.

    2018-03-01

    This research was conducted in Department of Statistics Islamic University of Indonesia. The data used are primary data obtained by post @explorejogja instagram account from January until December 2016. In the @explorejogja instagram account found many tourist destinations that can be visited by tourists both in the country and abroad, Therefore it is necessary to form a cluster of existing tourist destinations based on the number of likes from user instagram assumed as the most popular. The purpose of this research is to know the most popular distribution of tourist spot, the cluster formation of tourist destinations, and central popularity of tourist destinations based on @explorejogja instagram account in 2016. Statistical analysis used is descriptive statistics, k-means clustering, and social network analysis. The results of this research were obtained the top 10 most popular destinations in Yogyakarta, map of html-based tourist destination distribution consisting of 121 tourist destination points, formed 3 clusters each consisting of cluster 1 with 52 destinations, cluster 2 with 9 destinations and cluster 3 with 60 destinations, and Central popularity of tourist destinations in the special region of Yogyakarta by district.

  12. Patterns of comorbidity in community-dwelling older people hospitalised for fall-related injury: A cluster analysis

    Directory of Open Access Journals (Sweden)

    Finch Caroline F

    2011-08-01

    Full Text Available Abstract Background Community-dwelling older people aged 65+ years sustain falls frequently; these can result in physical injuries necessitating medical attention including emergency department care and hospitalisation. Certain health conditions and impairments have been shown to contribute independently to the risk of falling or experiencing a fall injury, suggesting that individuals with these conditions or impairments should be the focus of falls prevention. Since older people commonly have multiple conditions/impairments, knowledge about which conditions/impairments coexist in at-risk individuals would be valuable in the implementation of a targeted prevention approach. The objective of this study was therefore to examine the prevalence and patterns of comorbidity in this population group. Methods We analysed hospitalisation data from Victoria, Australia's second most populous state, to estimate the prevalence of comorbidity in patients hospitalised at least once between 2005-6 and 2007-8 for treatment of acute fall-related injuries. In patients with two or more comorbid conditions (multicomorbidity we used an agglomerative hierarchical clustering method to cluster comorbidity variables and identify constellations of conditions. Results More than one in four patients had at least one comorbid condition and among patients with comorbidity one in three had multicomorbidity (range 2-7. The prevalence of comorbidity varied by gender, age group, ethnicity and injury type; it was also associated with a significant increase in the average cumulative length of stay per patient. The cluster analysis identified five distinct, biologically plausible clusters of comorbidity: cardiopulmonary/metabolic, neurological, sensory, stroke and cancer. The cardiopulmonary/metabolic cluster was the largest cluster among the clusters identified. Conclusions The consequences of comorbidity clustering in terms of falls and/or injury outcomes of hospitalised patients

  13. GLOBULAR CLUSTER POPULATIONS: FIRST RESULTS FROM S{sup 4}G EARLY-TYPE GALAXIES

    Energy Technology Data Exchange (ETDEWEB)

    Zaritsky, Dennis [Steward Observatory, University of Arizona, 933 North Cherry Avenue, Tucson, AZ 85721 (United States); Aravena, Manuel [Núcleo de Astronomía, Facultad de Ingeniería, Universidad Diego Portales, Avenida Ejército 441, Santiago (Chile); Athanassoula, E.; Bosma, Albert [Aix Marseille Université, CNRS, LAM (Laboratoire d' Astrophysique de Marseille) UMR 7326, F-13388 Marseille (France); Comerón, Sébastien; Laine, Jarkko; Laurikainen, Eija; Salo, Heikki [Astronomy Division, Department of Physics, P.O. Box 3000, FI-90014 University of Oulu (Finland); Elmegreen, Bruce G. [IBM T. J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598 (United States); Erroz-Ferrer, Santiago; Knapen, Johan H. [Instituto de Astrofísica de Canarias, Vía Lácteas, E-38205 La Laguna (Spain); Gadotti, Dimitri A.; Muñoz-Mateos, Juan Carlos [European Southern Observatory, Casilla 19001, Santiago 19 (Chile); Hinz, Joannah L. [MMT Observatory, P.O. Box 210065, Tucson, AZ 85721 (United States); Ho, Luis C. [Kavli Institute for Astronomy and Astrophysics, Peking University, Beijing 100871 (China); Holwerda, Benne [Leiden Observatory, University of Leiden, Niels Bohrweg 4, NL-2333 Leiden (Netherlands); Sheth, Kartik, E-mail: dennis.zaritsky@gmail.com [National Radio Astronomy Observatory/NAASC, 520 Edgemont Road, Charlottesville, VA 22903 (United States)

    2015-02-01

    Using 3.6 μm images of 97 early-type galaxies, we develop and verify methodology to measure globular cluster populations from the S{sup 4}G survey images. We find that (1) the ratio, T {sub N}, of the number of clusters, N {sub CL}, to parent galaxy stellar mass, M {sub *}, rises weakly with M {sub *} for early-type galaxies with M {sub *} > 10{sup 10} M {sub ☉} when we calculate galaxy masses using a universal stellar initial mass function (IMF) but that the dependence of T {sub N} on M {sub *} is removed entirely once we correct for the recently uncovered systematic variation of IMF with M {sub *}; and (2) for M {sub *} < 10{sup 10} M {sub ☉}, there is no trend between N {sub CL} and M {sub *}, the scatter in T {sub N} is significantly larger (approaching two orders of magnitude), and there is evidence to support a previous, independent suggestion of two families of galaxies. The behavior of N {sub CL} in the lower-mass systems is more difficult to measure because these systems are inherently cluster-poor, but our results may add to previous evidence that large variations in cluster formation and destruction efficiencies are to be found among low-mass galaxies. The average fraction of stellar mass in clusters is ∼0.0014 for M {sub *} > 10{sup 10} M {sub ☉} and can be as large as ∼0.02 for less massive galaxies. These are the first results from the S{sup 4}G sample of galaxies and will be enhanced by the sample of early-type galaxies now being added to S{sup 4}G and complemented by the study of later-type galaxies within S{sup 4}G.

  14. piRNA analysis framework from small RNA-Seq data by a novel cluster prediction tool - PILFER.

    Science.gov (United States)

    Ray, Rishav; Pandey, Priyanka

    2017-12-19

    With the increasing number of studies focusing on PIWI-interacting RNA (piRNAs), it is now pertinent to develop efficient tools dedicated towards piRNA analysis. We have developed a novel cluster prediction tool called PILFER (PIrna cLuster FindER), which can accurately predict piRNA clusters from small RNA sequencing data. PILFER is an open source, easy to use tool, and can be executed even on a personal computer with minimum resources. It uses a sliding-window mechanism by integrating the expression of the reads along with the spatial information to predict the piRNA clusters. We have additionally defined a piRNA analysis pipeline incorporating PILFER to detect and annotate piRNAs and their clusters from raw small RNA sequencing data and implemented it on publicly available data from healthy germline and somatic tissues. We compared PILFER with other existing piRNA cluster prediction tools and found it to be statistically more accurate and superior in many aspects such as the robustness of PILFER clusters is higher and memory efficiency is more. Overall, PILFER provides a fast and accurate solution to piRNA cluster prediction. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Network clustering coefficient approach to DNA sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gerhardt, Guenther J.L. [Universidade Federal do Rio Grande do Sul-Hospital de Clinicas de Porto Alegre, Rua Ramiro Barcelos 2350/sala 2040/90035-003 Porto Alegre (Brazil); Departamento de Fisica e Quimica da Universidade de Caxias do Sul, Rua Francisco Getulio Vargas 1130, 95001-970 Caxias do Sul (Brazil); Lemke, Ney [Programa Interdisciplinar em Computacao Aplicada, Unisinos, Av. Unisinos, 950, 93022-000 Sao Leopoldo, RS (Brazil); Corso, Gilberto [Departamento de Biofisica e Farmacologia, Centro de Biociencias, Universidade Federal do Rio Grande do Norte, Campus Universitario, 59072 970 Natal, RN (Brazil)]. E-mail: corso@dfte.ufrn.br

    2006-05-15

    In this work we propose an alternative DNA sequence analysis tool based on graph theoretical concepts. The methodology investigates the path topology of an organism genome through a triplet network. In this network, triplets in DNA sequence are vertices and two vertices are connected if they occur juxtaposed on the genome. We characterize this network topology by measuring the clustering coefficient. We test our methodology against two main bias: the guanine-cytosine (GC) content and 3-bp (base pairs) periodicity of DNA sequence. We perform the test constructing random networks with variable GC content and imposed 3-bp periodicity. A test group of some organisms is constructed and we investigate the methodology in the light of the constructed random networks. We conclude that the clustering coefficient is a valuable tool since it gives information that is not trivially contained in 3-bp periodicity neither in the variable GC content.

  16. Robustness in cluster analysis in the presence of anomalous observations

    NARCIS (Netherlands)

    Zhuk, EE

    Cluster analysis of multivariate observations in the presence of "outliers" (anomalous observations) in a sample is studied. The expected (mean) fraction of erroneous decisions for the decision rule is computed analytically by minimizing the intraclass scatter. A robust decision rule (stable to

  17. Assessment of repeatability of composition of perfumed waters by high-performance liquid chromatography combined with numerical data analysis based on cluster analysis (HPLC UV/VIS - CA).

    Science.gov (United States)

    Ruzik, L; Obarski, N; Papierz, A; Mojski, M

    2015-06-01

    High-performance liquid chromatography (HPLC) with UV/VIS spectrophotometric detection combined with the chemometric method of cluster analysis (CA) was used for the assessment of repeatability of composition of nine types of perfumed waters. In addition, the chromatographic method of separating components of the perfume waters under analysis was subjected to an optimization procedure. The chromatograms thus obtained were used as sources of data for the chemometric method of cluster analysis (CA). The result was a classification of a set comprising 39 perfumed water samples with a similar composition at a specified level of probability (level of agglomeration). A comparison of the classification with the manufacturer's declarations reveals a good degree of consistency and demonstrates similarity between samples in different classes. A combination of the chromatographic method with cluster analysis (HPLC UV/VIS - CA) makes it possible to quickly assess the repeatability of composition of perfumed waters at selected levels of probability. © 2014 Society of Cosmetic Scientists and the Société Française de Cosmétologie.

  18. A Link-Based Cluster Ensemble Approach For Improved Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    P.Balaji

    2015-01-01

    Full Text Available Abstract It is difficult from possibilities to select a most suitable effective way of clustering algorithm and its dataset for a defined set of gene expression data because we have a huge number of ways and huge number of gene expressions. At present many researchers are preferring to use hierarchical clustering in different forms this is no more totally optimal. Cluster ensemble research can solve this type of problem by automatically merging multiple data partitions from a wide range of different clusterings of any dimensions to improve both the quality and robustness of the clustering result. But we have many existing ensemble approaches using an association matrix to condense sample-cluster and co-occurrence statistics and relations within the ensemble are encapsulated only at raw level while the existing among clusters are totally discriminated. Finding these missing associations can greatly expand the capability of those ensemble methodologies for microarray data clustering. We propose general K-means cluster ensemble approach for the clustering of general categorical data into required number of partitions.

  19. Comparing the performance of biomedical clustering methods

    DEFF Research Database (Denmark)

    Wiwie, Christian; Baumbach, Jan; Röttger, Richard

    2015-01-01

    expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future......Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene....... This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide...

  20. Planck early results. XII. Cluster Sunyaev-Zeldovich optical scaling relations

    DEFF Research Database (Denmark)

    Poutanen, T.; Natoli, P.; Polenta, G.

    2011-01-01

    We present the Sunyaev-Zeldovich (SZ) signal-to-richness scaling relation (Y500 - N200) for the MaxBCG cluster catalogue. Employing a multi-frequency matched filter on the Planck sky maps, we measure the SZ signal for each cluster by adapting the filter according to weak-lensing calibrated mass-r...

  1. Planck 2015 results: XXIV. Cosmology from Sunyaev-Zeldovich cluster counts

    DEFF Research Database (Denmark)

    Ade, P. A R; Aghanim, N.; Arnaud, M.

    2016-01-01

    We present cluster counts and corresponding cosmological constraints from the Planck full mission data set. Our catalogue consists of 439 clusters detected via their Sunyaev-Zeldovich (SZ) signal down to a signal-to-noise ratio of 6, and is more than a factor of 2 larger than the 2013 Planck clus...

  2. Smartness and Italian Cities. A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Flavio Boscacci

    2014-05-01

    Full Text Available Smart cities have been recently recognized as the most pleasing and attractive places to live in; due to this, both scholars and policy-makers pay close attention to this topic. Specifically, urban “smartness” has been identified by plenty of characteristics that can be grouped into six dimensions (Giffinger et al. 2007: smart Economy (competitiveness, smart People (social and human capital, smart Governance (participation, smart Mobility (both ICTs and transport, smart Environment (natural resources, and smart Living (quality of life. According to this analytical framework, in the present paper the relation between urban attractiveness and the “smart” characteristics has been investigated in the 103 Italian NUTS3 province capitals in the year 2011. To this aim, a descriptive statistics has been followed by a regression analysis (OLS, where the dependent variable measuring the urban attractiveness has been proxied by housing market prices. Besides, a Cluster Analysis (CA has been developed in order to find differences and commonalities among the province capitals.The OLS results indicate that living, people and economy are the key drivers for achieving a better urban attractiveness. Environment, instead, keeps on playing a minor role. Besides, the CA groups the province capitals a

  3. Detecting space-time disease clusters with arbitrary shapes and sizes using a co-clustering approach

    Directory of Open Access Journals (Sweden)

    Sami Ullah

    2017-11-01

    Full Text Available Ability to detect potential space-time clusters in spatio-temporal data on disease occurrences is necessary for conducting surveillance and implementing disease prevention policies. Most existing techniques use geometrically shaped (circular, elliptical or square scanning windows to discover disease clusters. In certain situations, where the disease occurrences tend to cluster in very irregularly shaped areas, these algorithms are not feasible in practise for the detection of space-time clusters. To address this problem, a new algorithm is proposed, which uses a co-clustering strategy to detect prospective and retrospective space-time disease clusters with no restriction on shape and size. The proposed method detects space-time disease clusters by tracking the changes in space–time occurrence structure instead of an in-depth search over space. This method was utilised to detect potential clusters in the annual and monthly malaria data in Khyber Pakhtunkhwa Province, Pakistan from 2012 to 2016 visualising the results on a heat map. The results of the annual data analysis showed that the most likely hotspot emerged in three sub-regions in the years 2013-2014. The most likely hotspots in monthly data appeared in the month of July to October in each year and showed a strong periodic trend.

  4. Dark Energy Survey Year 1 Results: Methodology and Projections for Joint Analysis of Galaxy Clustering, Galaxy Lensing, and CMB Lensing Two-point Functions

    Energy Technology Data Exchange (ETDEWEB)

    Giannantonio, T.; et al.

    2018-02-14

    Optical imaging surveys measure both the galaxy density and the gravitational lensing-induced shear fields across the sky. Recently, the Dark Energy Survey (DES) collaboration used a joint fit to two-point correlations between these observables to place tight constraints on cosmology (DES Collaboration et al. 2017). In this work, we develop the methodology to extend the DES Collaboration et al. (2017) analysis to include cross-correlations of the optical survey observables with gravitational lensing of the cosmic microwave background (CMB) as measured by the South Pole Telescope (SPT) and Planck. Using simulated analyses, we show how the resulting set of five two-point functions increases the robustness of the cosmological constraints to systematic errors in galaxy lensing shear calibration. Additionally, we show that contamination of the SPT+Planck CMB lensing map by the thermal Sunyaev-Zel'dovich effect is a potentially large source of systematic error for two-point function analyses, but show that it can be reduced to acceptable levels in our analysis by masking clusters of galaxies and imposing angular scale cuts on the two-point functions. The methodology developed here will be applied to the analysis of data from the DES, the SPT, and Planck in a companion work.

  5. Unequal cluster sizes in stepped-wedge cluster randomised trials: a systematic review.

    Science.gov (United States)

    Kristunas, Caroline; Morris, Tom; Gray, Laura

    2017-11-15

    To investigate the extent to which cluster sizes vary in stepped-wedge cluster randomised trials (SW-CRT) and whether any variability is accounted for during the sample size calculation and analysis of these trials. Any, not limited to healthcare settings. Any taking part in an SW-CRT published up to March 2016. The primary outcome is the variability in cluster sizes, measured by the coefficient of variation (CV) in cluster size. Secondary outcomes include the difference between the cluster sizes assumed during the sample size calculation and those observed during the trial, any reported variability in cluster sizes and whether the methods of sample size calculation and methods of analysis accounted for any variability in cluster sizes. Of the 101 included SW-CRTs, 48% mentioned that the included clusters were known to vary in size, yet only 13% of these accounted for this during the calculation of the sample size. However, 69% of the trials did use a method of analysis appropriate for when clusters vary in size. Full trial reports were available for 53 trials. The CV was calculated for 23 of these: the median CV was 0.41 (IQR: 0.22-0.52). Actual cluster sizes could be compared with those assumed during the sample size calculation for 14 (26%) of the trial reports; the cluster sizes were between 29% and 480% of that which had been assumed. Cluster sizes often vary in SW-CRTs. Reporting of SW-CRTs also remains suboptimal. The effect of unequal cluster sizes on the statistical power of SW-CRTs needs further exploration and methods appropriate to studies with unequal cluster sizes need to be employed. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  6. THE SWIFT AGN AND CLUSTER SURVEY. II. CLUSTER CONFIRMATION WITH SDSS DATA

    International Nuclear Information System (INIS)

    Griffin, Rhiannon D.; Dai, Xinyu; Kochanek, Christopher S.; Bregman, Joel N.

    2016-01-01

    We study 203 (of 442) Swift AGN and Cluster Survey extended X-ray sources located in the SDSS DR8 footprint to search for galaxy over-densities in three-dimensional space using SDSS galaxy photometric redshifts and positions near the Swift cluster candidates. We find 104 Swift clusters with a >3σ galaxy over-density. The remaining targets are potentially located at higher redshifts and require deeper optical follow-up observations for confirmation as galaxy clusters. We present a series of cluster properties including the redshift, brightest cluster galaxy (BCG) magnitude, BCG-to-X-ray center offset, optical richness, and X-ray luminosity. We also detect red sequences in ∼85% of the 104 confirmed clusters. The X-ray luminosity and optical richness for the SDSS confirmed Swift clusters are correlated and follow previously established relations. The distribution of the separations between the X-ray centroids and the most likely BCG is also consistent with expectation. We compare the observed redshift distribution of the sample with a theoretical model, and find that our sample is complete for z ≲ 0.3 and is still 80% complete up to z ≃ 0.4, consistent with the SDSS survey depth. These analysis results suggest that our Swift cluster selection algorithm has yielded a statistically well-defined cluster sample for further study of cluster evolution and cosmology. We also match our SDSS confirmed Swift clusters to existing cluster catalogs, and find 42, 23, and 1 matches in optical, X-ray, and Sunyaev–Zel’dovich catalogs, respectively, and so the majority of these clusters are new detections

  7. Employing post-DEA cross-evaluation and cluster analysis in a sample of Greek NHS hospitals.

    Science.gov (United States)

    Flokou, Angeliki; Kontodimopoulos, Nick; Niakas, Dimitris

    2011-10-01

    To increase Data Envelopment Analysis (DEA) discrimination of efficient Decision Making Units (DMUs), by complementing "self-evaluated" efficiencies with "peer-evaluated" cross-efficiencies and, based on these results, to classify the DMUs using cluster analysis. Healthcare, which is deprived of such studies, was chosen as the study area. The sample consisted of 27 small- to medium-sized (70-500 beds) NHS general hospitals distributed throughout Greece, in areas where they are the sole NHS representatives. DEA was performed on 2005 data collected from the Ministry of Health and the General Secretariat of the National Statistical Service. Three inputs -hospital beds, physicians and other health professionals- and three outputs -case-mix adjusted hospitalized cases, surgeries and outpatient visits- were included in input-oriented, constant-returns-to-scale (CRS) and variable-returns-to-scale (VRS) models. In a second stage (post-DEA), aggressive and benevolent cross-efficiency formulations and clustering were employed, to validate (or not) the initial DEA scores. The "maverick index" was used to sort the peer-appraised hospitals. All analyses were performed using custom-made software. Ten benchmark hospitals were identified by DEA, but using the aggressive and benevolent formulations showed that two and four of them respectively were at the lower end of the maverick index list. On the other hand, only one 100% efficient (self-appraised) hospital was at the higher end of the list, using either formulation. Cluster analysis produced a hierarchical "tree" structure which dichotomized the hospitals in accordance to the cross-evaluation results, and provided insight on the two-dimensional path to improving efficiency. This is, to our awareness, the first study in the healthcare domain to employ both of these post-DEA techniques (cross efficiency and clustering) at the hospital (i.e. micro) level. The potential benefit for decision-makers is the capability to examine high

  8. Genesis of cluster associations of enterprises

    Directory of Open Access Journals (Sweden)

    Pulina Tetyana V.

    2013-03-01

    Full Text Available The goal of the article is the study of genesis of creation of cluster associations of enterprises. It considers genesis of cluster definitions. It shows and analyses components that define the “cluster” concept. Researchers from many countries offer a significant number of definitions of the “cluster” term specifically in the economic direction, but there is no single generally accepted definition as of today. This fact is the result of a significant diversity of cluster structures. The article conducts a comparative analysis of classifications of cluster associations of enterprises. It identifies advantages and shortcomings of the cluster approach both from the position of an enterprise and from the position of a regional economy administration. The article marks out specific features of the life cycle of cluster associations of enterprises, which consists of the preparatory stage and stage of commercialisation. Majority of studies consider the preparatory stage and the stage of commercialisation, which consists of the following stages: entering market with a common brand, growth, maturity and crisis – is, practically, not considered. Taking into account the fact that the main result of cluster activity is the synergetic effect from mutually beneficial co-operation and activity results facilitate ensuring competitiveness of cluster enterprises, regional and national economies, the author gives own definition of a cluster.

  9. Comparison of Outputs for Variable Combinations Used in Cluster Analysis on Polarmetric Imagery

    National Research Council Canada - National Science Library

    Petre, Melinda

    2008-01-01

    .... More specifically, two techniques, Cluster Analysis (CA) and Principle Component Analysis (PCA) can be combined to process Stoke s imagery by distinguishing between pixels, and producing groups of pixels with similar characteristics...

  10. Examining the effectiveness of discriminant function analysis and cluster analysis in species identification of male field crickets based on their calling songs.

    Directory of Open Access Journals (Sweden)

    Ranjana Jaiswara

    Full Text Available Traditional taxonomy based on morphology has often failed in accurate species identification owing to the occurrence of cryptic species, which are reproductively isolated but morphologically identical. Molecular data have thus been used to complement morphology in species identification. The sexual advertisement calls in several groups of acoustically communicating animals are species-specific and can thus complement molecular data as non-invasive tools for identification. Several statistical tools and automated identifier algorithms have been used to investigate the efficiency of acoustic signals in species identification. Despite a plethora of such methods, there is a general lack of knowledge regarding the appropriate usage of these methods in specific taxa. In this study, we investigated the performance of two commonly used statistical methods, discriminant function analysis (DFA and cluster analysis, in identification and classification based on acoustic signals of field cricket species belonging to the subfamily Gryllinae. Using a comparative approach we evaluated the optimal number of species and calling song characteristics for both the methods that lead to most accurate classification and identification. The accuracy of classification using DFA was high and was not affected by the number of taxa used. However, a constraint in using discriminant function analysis is the need for a priori classification of songs. Accuracy of classification using cluster analysis, which does not require a priori knowledge, was maximum for 6-7 taxa and decreased significantly when more than ten taxa were analysed together. We also investigated the efficacy of two novel derived acoustic features in improving the accuracy of identification. Our results show that DFA is a reliable statistical tool for species identification using acoustic signals. Our results also show that cluster analysis of acoustic signals in crickets works effectively for species

  11. Calibrating the Planck cluster mass scale with CLASH

    Science.gov (United States)

    Penna-Lima, M.; Bartlett, J. G.; Rozo, E.; Melin, J.-B.; Merten, J.; Evrard, A. E.; Postman, M.; Rykoff, E.

    2017-08-01

    We determine the mass scale of Planck galaxy clusters using gravitational lensing mass measurements from the Cluster Lensing And Supernova survey with Hubble (CLASH). We have compared the lensing masses to the Planck Sunyaev-Zeldovich (SZ) mass proxy for 21 clusters in common, employing a Bayesian analysis to simultaneously fit an idealized CLASH selection function and the distribution between the measured observables and true cluster mass. We used a tiered analysis strategy to explicitly demonstrate the importance of priors on weak lensing mass accuracy. In the case of an assumed constant bias, bSZ, between true cluster mass, M500, and the Planck mass proxy, MPL, our analysis constrains 1-bSZ = 0.73 ± 0.10 when moderate priors on weak lensing accuracy are used, including a zero-mean Gaussian with standard deviation of 8% to account for possible bias in lensing mass estimations. Our analysis explicitly accounts for possible selection bias effects in this calibration sourced by the CLASH selection function. Our constraint on the cluster mass scale is consistent with recent results from the Weighing the Giants program and the Canadian Cluster Comparison Project. It is also consistent, at 1.34σ, with the value needed to reconcile the Planck SZ cluster counts with Planck's base ΛCDM model fit to the primary cosmic microwave background anisotropies.

  12. Parkinson's Disease Subtypes Identified from Cluster Analysis of Motor and Non-motor Symptoms.

    Science.gov (United States)

    Mu, Jesse; Chaudhuri, Kallol R; Bielza, Concha; de Pedro-Cuesta, Jesus; Larrañaga, Pedro; Martinez-Martin, Pablo

    2017-01-01

    Parkinson's disease is now considered a complex, multi-peptide, central, and peripheral nervous system disorder with considerable clinical heterogeneity. Non-motor symptoms play a key role in the trajectory of Parkinson's disease, from prodromal premotor to end stages. To understand the clinical heterogeneity of Parkinson's disease, this study used cluster analysis to search for subtypes from a large, multi-center, international, and well-characterized cohort of Parkinson's disease patients across all motor stages, using a combination of cardinal motor features (bradykinesia, rigidity, tremor, axial signs) and, for the first time, specific validated rater-based non-motor symptom scales. Two independent international cohort studies were used: (a) the validation study of the Non-Motor Symptoms Scale ( n = 411) and (b) baseline data from the global Non-Motor International Longitudinal Study ( n = 540). k -means cluster analyses were performed on the non-motor and motor domains (domains clustering) and the 30 individual non-motor symptoms alone (symptoms clustering), and hierarchical agglomerative clustering was performed to group symptoms together. Four clusters are identified from the domains clustering supporting previous studies: mild, non-motor dominant, motor-dominant, and severe. In addition, six new smaller clusters are identified from the symptoms clustering, each characterized by clinically-relevant non-motor symptoms. The clusters identified in this study present statistical confirmation of the increasingly important role of non-motor symptoms (NMS) in Parkinson's disease heterogeneity and take steps toward subtype-specific treatment packages.

  13. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    Science.gov (United States)

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis, utilizing Wards' method was used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table ranking emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries especially low-income countries that share many similarities with Uganda can learn from these experiences. © The Author 2015

  14. Local wavelet correlation: applicationto timing analysis of multi-satellite CLUSTER data

    Directory of Open Access Journals (Sweden)

    J. Soucek

    2004-12-01

    Full Text Available Multi-spacecraft space observations, such as those of CLUSTER, can be used to infer information about local plasma structures by exploiting the timing differences between subsequent encounters of these structures by individual satellites. We introduce a novel wavelet-based technique, the Local Wavelet Correlation (LWC, which allows one to match the corresponding signatures of large-scale structures in the data from multiple spacecraft and determine the relative time shifts between the crossings. The LWC is especially suitable for analysis of strongly non-stationary time series, where it enables one to estimate the time lags in a more robust and systematic way than ordinary cross-correlation techniques. The technique, together with its properties and some examples of its application to timing analysis of bow shock and magnetopause crossing observed by CLUSTER, are presented. We also compare the performance and reliability of the technique with classical discontinuity analysis methods. Key words. Radio science (signal processing – Space plasma physics (discontinuities; instruments and techniques

  15. Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

    DEFF Research Database (Denmark)

    Grotkjær, Thomas; Winther, Ole; Regenberg, Birgitte

    2006-01-01

    Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods...... analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset...

  16. a Web-Based Interactive Platform for Co-Clustering Spatio-Temporal Data

    Science.gov (United States)

    Wu, X.; Poorthuis, A.; Zurita-Milla, R.; Kraak, M.-J.

    2017-09-01

    Since current studies on clustering analysis mainly focus on exploring spatial or temporal patterns separately, a co-clustering algorithm is utilized in this study to enable the concurrent analysis of spatio-temporal patterns. To allow users to adopt and adapt the algorithm for their own analysis, it is integrated within the server side of an interactive web-based platform. The client side of the platform, running within any modern browser, is a graphical user interface (GUI) with multiple linked visualizations that facilitates the understanding, exploration and interpretation of the raw dataset and co-clustering results. Users can also upload their own datasets and adjust clustering parameters within the platform. To illustrate the use of this platform, an annual temperature dataset from 28 weather stations over 20 years in the Netherlands is used. After the dataset is loaded, it is visualized in a set of linked visualizations: a geographical map, a timeline and a heatmap. This aids the user in understanding the nature of their dataset and the appropriate selection of co-clustering parameters. Once the dataset is processed by the co-clustering algorithm, the results are visualized in the small multiples, a heatmap and a timeline to provide various views for better understanding and also further interpretation. Since the visualization and analysis are integrated in a seamless platform, the user can explore different sets of co-clustering parameters and instantly view the results in order to do iterative, exploratory data analysis. As such, this interactive web-based platform allows users to analyze spatio-temporal data using the co-clustering method and also helps the understanding of the results using multiple linked visualizations.

  17. Behavioural graded activity results in better exercise adherence and more physical activity than usual care in people with osteoarthritis: a cluster-randomised trial

    NARCIS (Netherlands)

    Pisters, M.F.; Veenhof, C.; de Bakker, D.H.; Schellevis, F.G.; Dekker, J.

    2010-01-01

    Question: Does behavioural graded activity result in better exercise adherence and more physical activity than usual care in people with osteoarthritis of the hip or knee? Design: Analysis of secondary outcomes of a cluster-randomised trial with concealed allocation, assessor blinding, and

  18. Cluster analysis for validated climatology stations using precipitation in Mexico

    NARCIS (Netherlands)

    Bravo Cabrera, J. L.; Azpra-Romero, E.; Zarraluqui-Such, V.; Gay-García, C.; Estrada Porrúa, F.

    2012-01-01

    Annual average of daily precipitation was used to group climatological stations into clusters using the k-means procedure and principal component analysis with varimax rotation. After a careful selection of the stations deployed in Mexico since 1950, we selected 349 characterized by having 35 to 40

  19. IDENTIFYING REGIONAL CLUSTER MANAGEMENT POTENTIALS EMPIRICAL RESULTS FROM THREE NORTH RHINEWESTPHALIAN REGIONS

    OpenAIRE

    Rudiger Hamm; Christiane Goebel

    2010-01-01

    The development and support of clusters is an issue that became quite popular by players dealing with regional economic policy. But before a regional development agency can start to implement a cluster-oriented strategy there a two question that have to be answered: 1. What are the regional fields of competence (cluster potentials) that fulfill the requirements for a cluster-oriented regional development policy? 2. If you find such regional fields of competence, are the enterprises willing to...

  20. Pharmacokinetic analysis and k-means clustering of DCEMR images for radiotherapy outcome prediction of advanced cervical cancers

    International Nuclear Information System (INIS)

    Andersen, Erlend K. F.; Kristensen, Gunnar B.; Lyng, Heidi; Malinen, Eirik

    2011-01-01

    Introduction. Pharmacokinetic analysis of dynamic contrast enhanced magnetic resonance images (DCEMRI) allows for quantitative characterization of vascular properties of tumors. The aim of this study is twofold, first to determine if tumor regions with similar vascularization could be labeled by clustering methods, second to determine if the identified regions can be associated with local cancer relapse. Materials and methods. Eighty-one patients with locally advanced cervical cancer treated with chemoradiotherapy underwent DCEMRI with Gd-DTPA prior to external beam radiotherapy. The median follow-up time after treatment was four years, in which nine patients had primary tumor relapse. By fitting a pharmacokinetic two-compartment model function to the temporal contrast enhancement in the tumor, two pharmacokinetic parameters, K trans and u e , were estimated voxel by voxel from the DCEMR-images. Intratumoral regions with similar vascularization were identified by k-means clustering of the two pharmacokinetic parameter estimates over all patients. The volume fraction of each cluster was used to evaluate the prognostic value of the clusters. Results. Three clusters provided a sufficient reduction of the cluster variance to label different vascular properties within the tumors. The corresponding median volume fraction of each cluster was 38%, 46% and 10%. The second cluster was significantly associated with primary tumor control in a log-rank survival test (p-value: 0.042), showing a decreased risk of treatment failure for patients with high volume fraction of voxels. Conclusions. Intratumoral regions showing similar vascular properties could successfully be labeled in three distinct clusters and the volume fraction of one cluster region was associated with primary tumor control

  1. Pharmacokinetic analysis and k-means clustering of DCEMR images for radiotherapy outcome prediction of advanced cervical cancers

    Energy Technology Data Exchange (ETDEWEB)

    Andersen, Erlend K. F. (Dept. of Medical Physics, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway)), e-mail: eirik.malinen@fys.uio.no; Kristensen, Gunnar B. (Section for Gynaecological Oncology, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway)); Lyng, Heidi (Dept. of Radiation Biology, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway)); Malinen, Eirik (Dept. of Medical Physics, The Norwegian Radium Hospital, Oslo Univ. Hospital, Oslo (Norway); Dept. of Physics, Univ. of Oslo, Oslo (Norway))

    2011-08-15

    Introduction. Pharmacokinetic analysis of dynamic contrast enhanced magnetic resonance images (DCEMRI) allows for quantitative characterization of vascular properties of tumors. The aim of this study is twofold, first to determine if tumor regions with similar vascularization could be labeled by clustering methods, second to determine if the identified regions can be associated with local cancer relapse. Materials and methods. Eighty-one patients with locally advanced cervical cancer treated with chemoradiotherapy underwent DCEMRI with Gd-DTPA prior to external beam radiotherapy. The median follow-up time after treatment was four years, in which nine patients had primary tumor relapse. By fitting a pharmacokinetic two-compartment model function to the temporal contrast enhancement in the tumor, two pharmacokinetic parameters, Ktrans and u{sub e}, were estimated voxel by voxel from the DCEMR-images. Intratumoral regions with similar vascularization were identified by k-means clustering of the two pharmacokinetic parameter estimates over all patients. The volume fraction of each cluster was used to evaluate the prognostic value of the clusters. Results. Three clusters provided a sufficient reduction of the cluster variance to label different vascular properties within the tumors. The corresponding median volume fraction of each cluster was 38%, 46% and 10%. The second cluster was significantly associated with primary tumor control in a log-rank survival test (p-value: 0.042), showing a decreased risk of treatment failure for patients with high volume fraction of voxels. Conclusions. Intratumoral regions showing similar vascular properties could successfully be labeled in three distinct clusters and the volume fraction of one cluster region was associated with primary tumor control

  2. A novel model for Time-Series Data Clustering Based on piecewise SVD and BIRCH for Stock Data Analysis on Hadoop Platform

    Directory of Open Access Journals (Sweden)

    Ibgtc Bowala

    2017-06-01

    Full Text Available With the rapid growth of financial markets, analyzers are paying more attention on predictions. Stock data are time series data, with huge amounts. Feasible solution for handling the increasing amount of data is to use a cluster for parallel processing, and Hadoop parallel computing platform is a typical representative. There are various statistical models for forecasting time series data, but accurate clusters are a pre-requirement. Clustering analysis for time series data is one of the main methods for mining time series data for many other analysis processes. However, general clustering algorithms cannot perform clustering for time series data because series data has a special structure and a high dimensionality has highly co-related values due to high noise level. A novel model for time series clustering is presented using BIRCH, based on piecewise SVD, leading to a novel dimension reduction approach. Highly co-related features are handled using SVD with a novel approach for dimensionality reduction in order to keep co-related behavior optimal and then use BIRCH for clustering. The algorithm is a novel model that can handle massive time series data. Finally, this new model is successfully applied to real stock time series data of Yahoo finance with satisfactory results.

  3. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data.

    Science.gov (United States)

    Mo, Qianxing; Shen, Ronglai; Guo, Cui; Vannucci, Marina; Chan, Keith S; Hilsenbeck, Susan G

    2018-01-01

    Identification of clinically relevant tumor subtypes and omics signatures is an important task in cancer translational research for precision medicine. Large-scale genomic profiling studies such as The Cancer Genome Atlas (TCGA) Research Network have generated vast amounts of genomic, transcriptomic, epigenomic, and proteomic data. While these studies have provided great resources for researchers to discover clinically relevant tumor subtypes and driver molecular alterations, there are few computationally efficient methods and tools for integrative clustering analysis of these multi-type omics data. Therefore, the aim of this article is to develop a fully Bayesian latent variable method (called iClusterBayes) that can jointly model omics data of continuous and discrete data types for identification of tumor subtypes and relevant omics features. Specifically, the proposed method uses a few latent variables to capture the inherent structure of multiple omics data sets to achieve joint dimension reduction. As a result, the tumor samples can be clustered in the latent variable space and relevant omics features that drive the sample clustering are identified through Bayesian variable selection. This method significantly improve on the existing integrative clustering method iClusterPlus in terms of statistical inference and computational speed. By analyzing TCGA and simulated data sets, we demonstrate the excellent performance of the proposed method in revealing clinically meaningful tumor subtypes and driver omics features. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  4. Clustering analysis of water distribution systems: identifying critical components and community impacts.

    Science.gov (United States)

    Diao, K; Farmani, R; Fu, G; Astaraie-Imani, M; Ward, S; Butler, D

    2014-01-01

    Large water distribution systems (WDSs) are networks with both topological and behavioural complexity. Thereby, it is usually difficult to identify the key features of the properties of the system, and subsequently all the critical components within the system for a given purpose of design or control. One way is, however, to more explicitly visualize the network structure and interactions between components by dividing a WDS into a number of clusters (subsystems). Accordingly, this paper introduces a clustering strategy that decomposes WDSs into clusters with stronger internal connections than external connections. The detected cluster layout is very similar to the community structure of the served urban area. As WDSs may expand along with urban development in a community-by-community manner, the correspondingly formed distribution clusters may reveal some crucial configurations of WDSs. For verification, the method is applied to identify all the critical links during firefighting for the vulnerability analysis of a real-world WDS. Moreover, both the most critical pipes and clusters are addressed, given the consequences of pipe failure. Compared with the enumeration method, the method used in this study identifies the same group of the most critical components, and provides similar criticality prioritizations of them in a more computationally efficient time.

  5. Clustering analysis of malware behavior using Self Organizing Map

    DEFF Research Database (Denmark)

    Pirscoveanu, Radu-Stefan; Stevanovic, Matija; Pedersen, Jens Myrup

    2016-01-01

    For the time being, malware behavioral classification is performed by means of Anti-Virus (AV) generated labels. The paper investigates the inconsistencies associated with current practices by evaluating the identified differences between current vendors. In this paper we rely on Self Organizing...... Map, an unsupervised machine learning algorithm, for generating clusters that capture the similarities between malware behavior. A data set of approximately 270,000 samples was used to generate the behavioral profile of malicious types in order to compare the outcome of the proposed clustering...... approach with the labels collected from 57 Antivirus vendors using VirusTotal. Upon evaluating the results, the paper concludes on shortcomings of relying on AV vendors for labeling malware samples. In order to solve the problem, a cluster-based classification is proposed, which should provide more...

  6. Statistical Significance for Hierarchical Clustering

    Science.gov (United States)

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990

  7. Formation of stable products from cluster-cluster collisions

    International Nuclear Information System (INIS)

    Alamanova, Denitsa; Grigoryan, Valeri G; Springborg, Michael

    2007-01-01

    The formation of stable products from copper cluster-cluster collisions is investigated by using classical molecular-dynamics simulations in combination with an embedded-atom potential. The dependence of the product clusters on impact energy, relative orientation of the clusters, and size of the clusters is studied. The structures and total energies of the product clusters are analysed and compared with those of the colliding clusters before impact. These results, together with the internal temperature, are used in obtaining an increased understanding of cluster fusion processes

  8. Clustering Game Behavior Data

    DEFF Research Database (Denmark)

    Bauckhage, C.; Drachen, Anders; Sifa, Rafet

    2015-01-01

    of the causes, the proliferation of behavioral data poses the problem of how to derive insights therefrom. Behavioral data sets can be large, time-dependent and high-dimensional. Clustering offers a way to explore such data and to discover patterns that can reduce the overall complexity of the data. Clustering...... and other techniques for player profiling and play style analysis have, therefore, become popular in the nascent field of game analytics. However, the proper use of clustering techniques requires expertise and an understanding of games is essential to evaluate results. With this paper, we address game data...... scientists and present a review and tutorial focusing on the application of clustering techniques to mine behavioral game data. Several algorithms are reviewed and examples of their application shown. Key topics such as feature normalization are discussed and open problems in the context of game analytics...

  9. Parallelization and scheduling of data intensive particle physics analysis jobs on clusters of PCs

    CERN Document Server

    Ponce, S

    2004-01-01

    Summary form only given. Scheduling policies are proposed for parallelizing data intensive particle physics analysis applications on computer clusters. Particle physics analysis jobs require the analysis of tens of thousands of particle collision events, each event requiring typically 200ms processing time and 600KB of data. Many jobs are launched concurrently by a large number of physicists. At a first view, particle physics jobs seem to be easy to parallelize, since particle collision events can be processed independently one from another. However, since large amounts of data need to be accessed, the real challenge resides in making an efficient use of the underlying computing resources. We propose several job parallelization and scheduling policies aiming at reducing job processing times and at increasing the sustainable load of a cluster server. Since particle collision events are usually reused by several jobs, cache based job splitting strategies considerably increase cluster utilization and reduce job ...

  10. K-Line Patterns’ Predictive Power Analysis Using the Methods of Similarity Match and Clustering

    Directory of Open Access Journals (Sweden)

    Lv Tao

    2017-01-01

    Full Text Available Stock price prediction based on K-line patterns is the essence of candlestick technical analysis. However, there are some disputes on whether the K-line patterns have predictive power in academia. To help resolve the debate, this paper uses the data mining methods of pattern recognition, pattern clustering, and pattern knowledge mining to research the predictive power of K-line patterns. The similarity match model and nearest neighbor-clustering algorithm are proposed for solving the problem of similarity match and clustering of K-line series, respectively. The experiment includes testing the predictive power of the Three Inside Up pattern and Three Inside Down pattern with the testing dataset of the K-line series data of Shanghai 180 index component stocks over the latest 10 years. Experimental results show that (1 the predictive power of a pattern varies a great deal for different shapes and (2 each of the existing K-line patterns requires further classification based on the shape feature for improving the prediction performance.

  11. HDclassif : An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Laurent Berge

    2012-01-01

    Full Text Available This paper presents the R package HDclassif which is devoted to the clustering and the discriminant analysis of high-dimensional data. The classification methods proposed in the package result from a new parametrization of the Gaussian mixture model which combines the idea of dimension reduction and model constraints on the covariance matrices. The supervised classification method using this parametrization is called high dimensional discriminant analysis (HDDA. In a similar manner, the associated clustering method iscalled high dimensional data clustering (HDDC and uses the expectation-maximization algorithm for inference. In order to correctly t the data, both methods estimate the specific subspace and the intrinsic dimension of the groups. Due to the constraints on the covariance matrices, the number of parameters to estimate is significantly lower than other model-based methods and this allows the methods to be stable and efficient in high dimensions. Two introductory examples illustrated with R codes allow the user to discover the hdda and hddc functions. Experiments on simulated and real datasets also compare HDDC and HDDA with existing classification methods on high-dimensional datasets. HDclassif is a free software and distributed under the general public license, as part of the R software project.

  12. X-ray and optical substructures of the DAFT/FADA survey clusters

    Science.gov (United States)

    Guennou, L.; Durret, F.; Adami, C.; Lima Neto, G. B.

    2013-04-01

    We have undertaken the DAFT/FADA survey with the double aim of setting constraints on dark energy based on weak lensing tomography and of obtaining homogeneous and high quality data for a sample of 91 massive clusters in the redshift range 0.4-0.9 for which there were HST archive data. We have analysed the XMM-Newton data available for 42 of these clusters to derive their X-ray temperatures and luminosities and search for substructures. Out of these, a spatial analysis was possible for 30 clusters, but only 23 had deep enough X-ray data for a really robust analysis. This study was coupled with a dynamical analysis for the 26 clusters having at least 30 spectroscopic galaxy redshifts in the cluster range. Altogether, the X-ray sample of 23 clusters and the optical sample of 26 clusters have 14 clusters in common. We present preliminary results on the coupled X-ray and dynamical analyses of these 14 clusters.

  13. Cluster Analysis of Flow Cytometric List Mode Data on a Personal Computer

    NARCIS (Netherlands)

    Bakker Schut, Tom C.; Bakker schut, T.C.; de Grooth, B.G.; Greve, Jan

    1993-01-01

    A cluster analysis algorithm, dedicated to analysis of flow cytometric data is described. The algorithm is written in Pascal and implemented on an MS-DOS personal computer. It uses k-means, initialized with a large number of seed points, followed by a modified nearest neighbor technique to reduce

  14. Function analysis of 5'-UTR of the cellulosomal xyl-doc cluster in Clostridium papyrosolvens.

    Science.gov (United States)

    Zou, Xia; Ren, Zhenxing; Wang, Na; Cheng, Yin; Jiang, Yuanyuan; Wang, Yan; Xu, Chenggang

    2018-01-01

    Anaerobic, mesophilic, and cellulolytic Clostridium papyrosolvens produces an efficient cellulolytic extracellular complex named cellulosome that hydrolyzes plant cell wall polysaccharides into simple sugars. Its genome harbors two long cellulosomal clusters: cip - cel operon encoding major cellulosome components (including scaffolding) and xyl - doc gene cluster encoding hemicellulases. Compared with works on cip - cel operon, there are much fewer studies on xyl - doc mainly due to its rare location in cellulolytic clostridia. Sequence analysis of xyl - doc revealed that it harbors a 5' untranslated region (5'-UTR) which potentially plays a role in the regulation of downstream gene expression. Here, we analyzed the function of 5'-UTR of xyl - doc cluster in C. papyrosolvens in vivo via transformation technology developed in this study. In this study, we firstly developed an electrotransformation method for C. papyrosolvens DSM 2782 before the analysis of 5'-UTR of xyl - doc cluster. In the optimized condition, a field with an intensity of 7.5-9.0 kV/cm was applied to a cuvette (0.2 cm gap) containing a mixture of plasmid and late cell suspended in exponential phase to form a 5 ms pulse in a sucrose-containing buffer. Afterwards, the putative promoter and the 5'-UTR of xyl - doc cluster were determined by sequence alignment. It is indicated that xyl - doc possesses a long conservative 5'-UTR with a complex secondary structure encompassing at least two perfect stem-loops which are potential candidates for controlling the transcriptional termination. In the last step, we employed an oxygen-independent flavin-based fluorescent protein (FbFP) as a quantitative reporter to analyze promoter activity and 5'-UTR function in vivo. It revealed that 5'-UTR significantly blocked transcription of downstream genes, but corn stover can relieve its suppression. In the present study, our results demonstrated that 5'-UTR of the cellulosomal xyl - doc cluster blocks the

  15. Cluster observations of the high-latitude magnetopause and cusp: initial results from the CIS ion instruments

    Directory of Open Access Journals (Sweden)

    J. M. Bosqued

    2001-09-01

    Full Text Available Launched on an elliptical high inclination orbit (apogee: 19.6 RE since January 2001 the Cluster satellites have been conducting the first detailed three-dimensional studies of the high-latitude dayside magnetosphere, including the exterior cusp, neighbouring boundary layers and magnetopause regions. Cluster satellites carry the CIS ion spectrometers that provide high-precision, 3D distributions of low-energy (<35 keV/e ions every 4 s. This paper presents the first two observations of the cusp and/or magnetopause behaviour made under different interplanetary magnetic field (IMF conditions. Flow directions, 3D distribution functions, density profiles and ion composition profiles are analyzed to demonstrate the high variability of high-latitude regions. In the first crossing analyzed (26 January 2001, dusk side, IMF-BZ < 0, multiple, isolated boundary layer, magnetopause and magnetosheath encounters clearly occurred on a quasi-steady basis for ~ 2 hours. CIS ion instruments show systematic accelerated flows in the current layer and adjacent boundary layers on the Earthward side of the magnetopause. Multi-point analysis of the magnetopause, combining magnetic and plasma data from the four Cluster spacecraft, demonstrates that oscillatory outward-inward motions occur with a normal speed of the order of ± 40 km/s; the thickness of the high-latitude current layer is evaluated to be of the order of 900–1000 km. Alfvénic accelerated flows and D-shaped distributions are convincing signatures of a magnetic reconnection occurring equatorward of the Cluster satellites. Moreover, the internal magnetic and plasma structure of a flux transfer event (FTE is analyzed in detail; its size along the magnetopause surface is ~ 12 000 km and it convects with a velocity of ~ 200 km/s. The second event analyzed (2 February 2001 corresponds to the first Cluster pass within the cusp when the IMF-BZ component was northward directed. The analysis of relevant CIS plasma

  16. Cluster analysis in systems of magnetic spheres and cubes

    Energy Technology Data Exchange (ETDEWEB)

    Pyanzina, E.S., E-mail: elena.pyanzina@urfu.ru [Ural Federal University, Lenin Av. 51, Ekaterinburg (Russian Federation); Gudkova, A.V. [Ural Federal University, Lenin Av. 51, Ekaterinburg (Russian Federation); Donaldson, J.G. [University of Vienna, Sensengasse 8, Vienna (Austria); Kantorovich, S.S. [Ural Federal University, Lenin Av. 51, Ekaterinburg (Russian Federation); University of Vienna, Sensengasse 8, Vienna (Austria)

    2017-06-01

    In the present work we use molecular dynamics simulations and graph-theory based cluster analysis to compare self-assembly in systems of magnetic spheres, and cubes where the dipole moment is oriented along the side of the cube in the [001] crystallographic direction. We show that under the same conditions cubes aggregate far less than their spherical counterparts. This difference can be explained in terms of the volume of phase space in which the formation of the bond is thermodynamically advantageous. It follows that this volume is much larger for a dipolar sphere than for a dipolar cube. - Highlights: • A comparison of the degree of self-assembly in systems of magnetic spheres and cubes. • Spheres are more likely to form larger clusters than cubes. • Differences in microstructure will manifest in the magnetic response of each system.

  17. Crouch gait patterns defined using k-means cluster analysis are related to underlying clinical pathology.

    Science.gov (United States)

    Rozumalski, Adam; Schwartz, Michael H

    2009-08-01

    In this study a gait classification method was developed and applied to subjects with Cerebral palsy who walk with excessive knee flexion at initial contact. Sagittal plane gait data, simplified using the gait features method, is used as input into a k-means cluster analysis to determine homogeneous groups. Several clinical domains were explored to determine if the clusters are related to underlying pathology. These domains included age, joint range-of-motion, strength, selective motor control, and spasticity. Principal component analysis is used to determine one overall score for each of the multi-joint domains (strength, selective motor control, and spasticity). The current study shows that there are five clusters among children with excessive knee flexion at initial contact. These clusters were labeled, in order of increasing gait pathology: (1) mild crouch with mild equinus, (2) moderate crouch, (3) moderate crouch with anterior pelvic tilt, (4) moderate crouch with equinus, and (5) severe crouch. Further analysis showed that age, range-of-motion, strength, selective motor control, and spasticity were significantly different between the clusters (p<0.001). The general tendency was for the clinical domains to worsen as gait pathology increased. This new classification tool can be used to define homogeneous groups of subjects in crouch gait, which can help guide treatment decisions and outcomes assessment.

  18. Use of the cluster analysis for assessment of economic situation of an enterprise

    Directory of Open Access Journals (Sweden)

    Martin Maršík

    2013-01-01

    Full Text Available The aim of the paper was to discuss the disadvantages of enterprises in dry areas compared to enterprises farming in similar production area outside a rain shadow. The analysis was based on the sample of 45 enterprises; twelve of which farming in the area of a rain shadow. In the first step, enterprises were sorted by the cluster analysis into groups farming in the same area, at a similar altitude, with the same structure in a similar manner, and under comparable financial conditions – (such as debt ratio, liquidity and activity ratio. The results of this step showed a different method of farming within enterprises in disadvantaged areas. Such enterprises have created two distinct, separate clusters differing from the average in the use of fixed assets, technical equipment of labour, labour productivity and income structure. In the second step, the way how the return on assets of such enterprises is different from the average profitability of the enterprise was assessed. Testing differences in mean values of profitability was performed using the Student t–test. Due to the high variability of the Return on assets (ROA, the difference between standard and disadvantaged enterprises has not been proved.

  19. Spatial cluster analysis of nanoscopically mapped serotonin receptors for classification of fixed brain tissue

    Science.gov (United States)

    Sams, Michael; Silye, Rene; Göhring, Janett; Muresan, Leila; Schilcher, Kurt; Jacak, Jaroslaw

    2014-01-01

    We present a cluster spatial analysis method using nanoscopic dSTORM images to determine changes in protein cluster distributions within brain tissue. Such methods are suitable to investigate human brain tissue and will help to achieve a deeper understanding of brain disease along with aiding drug development. Human brain tissue samples are usually treated postmortem via standard fixation protocols, which are established in clinical laboratories. Therefore, our localization microscopy-based method was adapted to characterize protein density and protein cluster localization in samples fixed using different protocols followed by common fluorescent immunohistochemistry techniques. The localization microscopy allows nanoscopic mapping of serotonin 5-HT1A receptor groups within a two-dimensional image of a brain tissue slice. These nanoscopically mapped proteins can be confined to clusters by applying the proposed statistical spatial analysis. Selected features of such clusters were subsequently used to characterize and classify the tissue. Samples were obtained from different types of patients, fixed with different preparation methods, and finally stored in a human tissue bank. To verify the proposed method, samples of a cryopreserved healthy brain have been compared with epitope-retrieved and paraffin-fixed tissues. Furthermore, samples of healthy brain tissues were compared with data obtained from patients suffering from mental illnesses (e.g., major depressive disorder). Our work demonstrates the applicability of localization microscopy and image analysis methods for comparison and classification of human brain tissues at a nanoscopic level. Furthermore, the presented workflow marks a unique technological advance in the characterization of protein distributions in brain tissue sections.

  20. Relevant Subspace Clustering

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Günnemann, Stephan

    2009-01-01

    Subspace clustering aims at detecting clusters in any subspace projection of a high dimensional space. As the number of possible subspace projections is exponential in the number of dimensions, the result is often tremendously large. Recent approaches fail to reduce results to relevant subspace...... clusters. Their results are typically highly redundant, i.e. many clusters are detected multiple times in several projections. In this work, we propose a novel model for relevant subspace clustering (RESCU). We present a global optimization which detects the most interesting non-redundant subspace clusters...... achieves top clustering quality while competing approaches show greatly varying performance....

  1. Locating irregularly shaped clusters of infection intensity

    DEFF Research Database (Denmark)

    Yiannakoulias, Niko; Wilson, Shona; Kariuki, H. Curtis

    2010-01-01

    of infection intensity identifies two small areas within the study region in which infection intensity is elevated, possibly due to local features of the physical or social environment. Collectively, our results show that the "greedy growth scan" is a suitable method for exploratory geographical analysis...... for cluster detection. Real data are based on samples of hookworm and S. mansoni from Kitengei, Makueni district, Kenya. Our analysis of simulated data shows how methods able to find irregular shapes are more likely to identify clusters along rivers than methods constrained to fixed geometries. Our analysis...

  2. Global survey of star clusters in the Milky Way. VI. Age distribution and cluster formation history

    Science.gov (United States)

    Piskunov, A. E.; Just, A.; Kharchenko, N. V.; Berczik, P.; Scholz, R.-D.; Reffert, S.; Yen, S. X.

    2018-06-01

    Context. The all-sky Milky Way Star Clusters (MWSC) survey provides uniform and precise ages, along with other relevant parameters, for a wide variety of clusters in the extended solar neighbourhood. Aims: In this study we aim to construct the cluster age distribution, investigate its spatial variations, and discuss constraints on cluster formation scenarios of the Galactic disk during the last 5 Gyrs. Methods: Due to the spatial extent of the MWSC, we have considered spatial variations of the age distribution along galactocentric radius RG, and along Z-axis. For the analysis of the age distribution we used 2242 clusters, which all lie within roughly 2.5 kpc of the Sun. To connect the observed age distribution to the cluster formation history we built an analytical model based on simple assumptions on the cluster initial mass function and on the cluster mass-lifetime relation, fit it to the observations, and determined the parameters of the cluster formation law. Results: Comparison with the literature shows that earlier results strongly underestimated the number of evolved clusters with ages t ≳ 100 Myr. Recent studies based on all-sky catalogues agree better with our data, but still lack the oldest clusters with ages t ≳ 1 Gyr. We do not observe a strong variation in the age distribution along RG, though we find an enhanced fraction of older clusters (t > 1 Gyr) in the inner disk. In contrast, the distribution strongly varies along Z. The high altitude distribution practically does not contain clusters with t < 1 Gyr. With simple assumptions on the cluster formation history, the cluster initial mass function and the cluster lifetime we can reproduce the observations. The cluster formation rate and the cluster lifetime are strongly degenerate, which does not allow us to disentangle different formation scenarios. In all cases the cluster formation rate is strongly declining with time, and the cluster initial mass function is very shallow at the high mass end.

  3. Improved Ant Colony Clustering Algorithm and Its Performance Study

    Science.gov (United States)

    Gao, Wei

    2016-01-01

    Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533

  4. Making Sense of Cluster Analysis: Revelations from Pakistani Science Classes

    Science.gov (United States)

    Pell, Tony; Hargreaves, Linda

    2011-01-01

    Cluster analysis has been applied to quantitative data in educational research over several decades and has been a feature of the Maurice Galton's research in primary and secondary classrooms. It has offered potentially useful insights for teaching yet its implications for practice are rarely implemented. It has been subject also to negative…

  5. ENTREPRENEURIAL ACTIVITY IN ROMANIA – A TIME SERIES CLUSTERING ANALYSIS AT THE NUTS3 LEVEL

    Directory of Open Access Journals (Sweden)

    Sipos-Gug Sebastian

    2013-07-01

    Full Text Available Entrepreneurship is an active field of research, having known a major increase in interest and publication levels in the last years (Landström et al., 2012. Within this field recently there has been an increasing interest in understanding why some regions seem to have a significantly higher entrepreneurship activity compared to others. In line with this research field, we would like to investigate the differences in entrepreneurial activity among the Romanian counties (NUTS 3 regions. While the classical research paradigm in this field is to conduct a temporally stationary analysis, we choose to use a time series clustering analysis to better understanding the dynamics of entrepreneurial activity between counties. Our analysis showed that if we use the total number of new privately owned companies that are founded each year in the last decade (2002-2012 we can distinguish between 5 clusters, one with high total entrepreneurial activity (18 counties, one with above average activity (8 counties, two clusters with average and slightly below average activity (total of 18 counties and one cluster with low and declining activity (2 counties. If we are interested in the entrepreneurial activity rate, that is the number of new privately owned companies founded each year adjusted by the population of the respective county, we obtain 4 clusters, one with a very high entrepreneurial rate (1 county, one with average rate (10 counties, and two clusters with below average entrepreneurial rate (total of 31 counties. In conclusion, our research shows that Romania is far from being a homogeneous geographical area in respect to entrepreneurial activity. Depending on what we are interested in, it can be divided in 5 or 4 clusters of counties, which behave differently as a function of time. Further research should be focused on explaining these regional differences, on studying the high performance clusters and trying to improve the low performing ones.

  6. A hybrid sales forecasting scheme by combining independent component analysis with K-means clustering and support vector regression.

    Science.gov (United States)

    Lu, Chi-Jie; Chang, Chi-Chang

    2014-01-01

    Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue of operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses the ICA to extract hidden information from the observed sales data. The extracted features are then applied to K-means algorithm for clustering the sales data into several disjoined clusters. Finally, the SVR forecasting models are applied to each group to generate final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting.

  7. Gene cluster statistics with gene families.

    Science.gov (United States)

    Raghupathy, Narayanan; Durand, Dannie

    2009-05-01

    Identifying genomic regions that descended from a common ancestor is important for understanding the function and evolution of genomes. In distantly related genomes, clusters of homologous gene pairs are evidence of candidate homologous regions. Demonstrating the statistical significance of such "gene clusters" is an essential component of comparative genomic analyses. However, currently there are no practical statistical tests for gene clusters that model the influence of the number of homologs in each gene family on cluster significance. In this work, we demonstrate empirically that failure to incorporate gene family size in gene cluster statistics results in overestimation of significance, leading to incorrect conclusions. We further present novel analytical methods for estimating gene cluster significance that take gene family size into account. Our methods do not require complete genome data and are suitable for testing individual clusters found in local regions, such as contigs in an unfinished assembly. We consider pairs of regions drawn from the same genome (paralogous clusters), as well as regions drawn from two different genomes (orthologous clusters). Determining cluster significance under general models of gene family size is computationally intractable. By assuming that all gene families are of equal size, we obtain analytical expressions that allow fast approximation of cluster probabilities. We evaluate the accuracy of this approximation by comparing the resulting gene cluster probabilities with cluster probabilities obtained by simulating a realistic, power-law distributed model of gene family size, with parameters inferred from genomic data. Surprisingly, despite the simplicity of the underlying assumption, our method accurately approximates the true cluster probabilities. It slightly overestimates these probabilities, yielding a conservative test. We present additional simulation results indicating the best choice of parameter values for data

  8. Validating clustering of molecular dynamics simulations using polymer models

    Directory of Open Access Journals (Sweden)

    Phillips Joshua L

    2011-11-01

    Full Text Available Abstract Background Molecular dynamics (MD simulation is a powerful technique for sampling the meta-stable and transitional conformations of proteins and other biomolecules. Computational data clustering has emerged as a useful, automated technique for extracting conformational states from MD simulation data. Despite extensive application, relatively little work has been done to determine if the clustering algorithms are actually extracting useful information. A primary goal of this paper therefore is to provide such an understanding through a detailed analysis of data clustering applied to a series of increasingly complex biopolymer models. Results We develop a novel series of models using basic polymer theory that have intuitive, clearly-defined dynamics and exhibit the essential properties that we are seeking to identify in MD simulations of real biomolecules. We then apply spectral clustering, an algorithm particularly well-suited for clustering polymer structures, to our models and MD simulations of several intrinsically disordered proteins. Clustering results for the polymer models provide clear evidence that the meta-stable and transitional conformations are detected by the algorithm. The results for the polymer models also help guide the analysis of the disordered protein simulations by comparing and contrasting the statistical properties of the extracted clusters. Conclusions We have developed a framework for validating the performance and utility of clustering algorithms for studying molecular biopolymer simulations that utilizes several analytic and dynamic polymer models which exhibit well-behaved dynamics including: meta-stable states, transition states, helical structures, and stochastic dynamics. We show that spectral clustering is robust to anomalies introduced by structural alignment and that different structural classes of intrinsically disordered proteins can be reliably discriminated from the clustering results. To our

  9. Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles.

    Directory of Open Access Journals (Sweden)

    Tariq Ahmad

    Full Text Available Classification of acute decompensated heart failure (ADHF is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies.To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside.We performed a cluster analysis on baseline clinical variables and PAC measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry. We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data.We identified four advanced HF clusters: 1 male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP levels; 2 females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3 young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4 older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70. For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not.By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernable relationship to hemodynamic profiles, but distinct associations with adverse outcomes. Our analysis suggests that ADHF classification using

  10. Exploring syndrome differentiation using non-negative matrix factorization and cluster analysis in patients with atopic dermatitis.

    Science.gov (United States)

    Yun, Younghee; Jung, Wonmo; Kim, Hyunho; Jang, Bo-Hyoung; Kim, Min-Hee; Noh, Jiseong; Ko, Seong-Gyu; Choi, Inhwa

    2017-08-01

    Syndrome differentiation (SD) results in a diagnostic conclusion based on a cluster of concurrent symptoms and signs, including pulse form and tongue color. In Korea, there is a strong interest in the standardization of Traditional Medicine (TM). In order to standardize TM treatment, standardization of SD should be given priority. The aim of this study was to explore the SD, or symptom clusters, of patients with atopic dermatitis (AD) using non-negative factorization methods and k-means clustering analysis. We screened 80 patients and enrolled 73 eligible patients. One TM dermatologist evaluated the symptoms/signs using an existing clinical dataset from patients with AD. This dataset was designed to collect 15 dermatologic and 18 systemic symptoms/signs associated with AD. Non-negative matrix factorization was used to decompose the original data into a matrix with three features and a weight matrix. The point of intersection of the three coordinates from each patient was placed in three-dimensional space. With five clusters, the silhouette score reached 0.484, and this was the best silhouette score obtained from two to nine clusters. Patients were clustered according to the varying severity of concurrent symptoms/signs. Through the distribution of the null hypothesis generated by 10,000 permutation tests, we found significant cluster-specific symptoms/signs from the confidence intervals in the upper and lower 2.5% of the distribution. Patients in each cluster showed differences in symptoms/signs and severity. In a clinical situation, SD and treatment are based on the practitioners' observations and clinical experience. SD, identified through informatics, can contribute to development of standardized, objective, and consistent SD for each disease. Copyright © 2017. Published by Elsevier Ltd.

  11. Improving local clustering based top-L link prediction methods via asymmetric link clustering information

    Science.gov (United States)

    Wu, Zhihao; Lin, Youfang; Zhao, Yiji; Yan, Hongyan

    2018-02-01

    Networks can represent a wide range of complex systems, such as social, biological and technological systems. Link prediction is one of the most important problems in network analysis, and has attracted much research interest recently. Many link prediction methods have been proposed to solve this problem with various techniques. We can note that clustering information plays an important role in solving the link prediction problem. In previous literatures, we find node clustering coefficient appears frequently in many link prediction methods. However, node clustering coefficient is limited to describe the role of a common-neighbor in different local networks, because it cannot distinguish different clustering abilities of a node to different node pairs. In this paper, we shift our focus from nodes to links, and propose the concept of asymmetric link clustering (ALC) coefficient. Further, we improve three node clustering based link prediction methods via the concept of ALC. The experimental results demonstrate that ALC-based methods outperform node clustering based methods, especially achieving remarkable improvements on food web, hamster friendship and Internet networks. Besides, comparing with other methods, the performance of ALC-based methods are very stable in both globalized and personalized top-L link prediction tasks.

  12. Multichannel response analysis on 2D projection views for detection of clustered microcalcifications in digital breast tomosynthesis

    International Nuclear Information System (INIS)

    Wei, Jun; Chan, Heang-Ping; Hadjiiski, Lubomir M.; Helvie, Mark A.; Lu, Yao; Zhou, Chuan; Samala, Ravi

    2014-01-01

    Purpose: To investigate the feasibility of a new two-dimensional (2D) multichannel response (MCR) analysis approach for the detection of clustered microcalcifications (MCs) in digital breast tomosynthesis (DBT). Methods: With IRB approval and informed consent, a data set of two-view DBTs from 42 breasts containing biopsy-proven MC clusters was collected in this study. The authors developed a 2D approach for MC detection using projection view (PV) images rather than the reconstructed three-dimensional (3D) DBT volume. Signal-to-noise ratio (SNR) enhancement processing was first applied to each PV to enhance the potential MCs. The locations of MC candidates were then identified with iterative thresholding. The individual MCs were decomposed with Hermite–Gaussian (HG) and Laguerre–Gaussian (LG) basis functions and the channelized Hotelling model was trained to produce the MCRs for each MC on the 2D images. The MCRs from the PVs were fused in 3D by a coincidence counting method that backprojects the MC candidates on the PVs and traces the coincidence of their ray paths in 3D. The 3D MCR was used to differentiate the true MCs from false positives (FPs). Finally a dynamic clustering method was used to identify the potential MC clusters in the DBT volume based on the fact that true MCs of clinical significance appear in clusters. Using two-fold cross validation, the performance of the 3D MCR for classification of true and false MCs was estimated by the area under the receiver operating characteristic (ROC) curve and the overall performance of the MCR approach for detection of clustered MCs was assessed by free response receiver operating characteristic (FROC) analysis. Results: When the HG basis function was used for MCR analysis, the detection of MC cluster achieved case-based test sensitivities of 80% and 90% at the average FP rates of 0.65 and 1.55 FPs per DBT volume, respectively. With LG basis function, the average FP rates were 0.62 and 1.57 per DBT volume at

  13. PROSPECTS OF THE REGIONAL INTEGRATION POLICY BASED ON CLUSTER FORMATION

    Directory of Open Access Journals (Sweden)

    Elena Tsepilova

    2018-01-01

    Full Text Available The purpose of this article is to develop the theoretical foundations of regional integration policy and to determine its prospects on the basis of cluster formation. The authors use such research methods as systematization, comparative and complex analysis, synthesis, statistical method. Within the framework of the research, the concept of regional integration policy is specified, and its integration core – cluster – is allocated. The authors work out an algorithm of regional clustering, which will ensure the growth of economy and tax income. Measures have been proposed to optimize the organizational mechanism of interaction between the participants of the territorial cluster and the authorities that allow to ensure the effective functioning of clusters, including taxation clusters. Based on the results of studying the existing methods for assessing the effectiveness of cluster policy, the authors propose their own approach to evaluating the consequences of implementing the regional integration policy, according to which the list of quantitative and qualitative indicators is defined. The present article systematizes the experience and results of the cluster policy of certain European countries, that made it possible to determine the prospects and synergetic effect from the development of clusters as an integration foundation of regional policy in the Russian Federation. The authors carry out the analysis of activity of cluster formations using the example of the Rostov region – a leader in the formation of conditions for the cluster policy development in the Southern Federal District. 11 clusters and cluster initiatives are developing in this region. As a result, the authors propose measures for support of the already existing clusters and creation of the new ones.

  14. z ~ 7-10 Galaxies Behind Lensing Clusters: Contrast with Field Search Results

    Science.gov (United States)

    Bouwens, Rychard J.; Illingworth, Garth D.; Bradley, Larry D.; Ford, Holland; Franx, Marijn; Zheng, Wei; Broadhurst, Tom; Coe, Dan; Jee, M. James

    2009-01-01

    We conduct a search for z gsim 7 dropout galaxies behind 11 massive lensing clusters using 21 arcmin2 of deep Hubble Space Telescope NICMOS, ACS, and WFPC2 image data. In total, over this entire area, we find only one robust z ~ 7 z-dropout candidate (previously reported around Abell 1689). Four less robust z-dropout and J-dropout candidates are also found. The nature of the four weaker candidates could not be precisely determined due to the limited depth of the available optical data, but detailed simulations suggest that all four are likely to be low-redshift interlopers. By contrast, we estimate that our robust candidate A1689-zD1 has dropouts and 0.3 z ~ 9 J-dropouts over our cluster search area, in reasonable agreement with our observational results, given the small numbers. The number of z gsim 7 candidates we find in the present search is much lower than that which has been reported in several previous studies of the prevalence of z gsim 7 galaxies behind lensing clusters. To understand these differences, we examined z gsim 7 candidates in other studies and conclude that only a small fraction are likely to be z gsim 7 galaxies. Our findings support models that show that gravitational lensing from clusters is of the most value for detecting galaxies at magnitudes brighter than L* (H lsim 27) where the LF is expected to be very steep. Use of these clusters to constrain the faint-end slope or determine the full LF is likely of less value due to the shallower effective slope measured for the LF at fainter magnitudes, as well as significant uncertainties introduced from modeling both the gravitational lensing and incompleteness. Based on observations made with the NASA/ESA Hubble Space Telescope, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555. These observations are associated with programs #5352, 5935, 6488, 8249, 8882, 9289, 9452, 9717, 10150, 10154, 10200, 10325, 10504, 10863, 10996.

  15. Higgs Pair Production: Choosing Benchmarks With Cluster Analysis

    CERN Document Server

    Carvalho, Alexandra; Dorigo, Tommaso; Goertz, Florian; Gottardo, Carlo A.; Tosi, Mia

    2016-01-01

    New physics theories often depend on a large number of free parameters. The precise values of those parameters in some cases drastically affect the resulting phenomenology of fundamental physics processes, while in others finite variations can leave it basically invariant at the level of detail experimentally accessible. When designing a strategy for the analysis of experimental data in the search for a signal predicted by a new physics model, it appears advantageous to categorize the parameter space describing the model according to the corresponding kinematical features of the final state. A multi-dimensional test statistic can be used to gauge the degree of similarity in the kinematics of different models; a clustering algorithm using that metric may then allow the division of the space into homogeneous regions, each of which can be successfully represented by a benchmark point. Searches targeting those benchmark points are then guaranteed to be sensitive to a large area of the parameter space. In this doc...

  16. Family-based clusters of cognitive test performance in familial schizophrenia

    Directory of Open Access Journals (Sweden)

    Partonen Timo

    2004-07-01

    Full Text Available Abstract Background Cognitive traits derived from neuropsychological test data are considered to be potential endophenotypes of schizophrenia. Previously, these traits have been found to form a valid basis for clustering samples of schizophrenia patients into homogeneous subgroups. We set out to identify such clusters, but apart from previous studies, we included both schizophrenia patients and family members into the cluster analysis. The aim of the study was to detect family clusters with similar cognitive test performance. Methods Test scores from 54 randomly selected families comprising at least two siblings with schizophrenia spectrum disorders, and at least two unaffected family members were included in a complete-linkage cluster analysis with interactive data visualization. Results A well-performing, an impaired, and an intermediate family cluster emerged from the analysis. While the neuropsychological test scores differed significantly between the clusters, only minor differences were observed in the clinical variables. Conclusions The visually aided clustering algorithm was successful in identifying family clusters comprising both schizophrenia patients and their relatives. The present classification method may serve as a basis for selecting phenotypically more homogeneous groups of families in subsequent genetic analyses.

  17. Haplotyping Problem, A Clustering Approach

    International Nuclear Information System (INIS)

    Eslahchi, Changiz; Sadeghi, Mehdi; Pezeshk, Hamid; Kargar, Mehdi; Poormohammadi, Hadi

    2007-01-01

    Construction of two haplotypes from a set of Single Nucleotide Polymorphism (SNP) fragments is called haplotype reconstruction problem. One of the most popular computational model for this problem is Minimum Error Correction (MEC). Since MEC is an NP-hard problem, here we propose a novel heuristic algorithm based on clustering analysis in data mining for haplotype reconstruction problem. Based on hamming distance and similarity between two fragments, our iterative algorithm produces two clusters of fragments; then, in each iteration, the algorithm assigns a fragment to one of the clusters. Our results suggest that the algorithm has less reconstruction error rate in comparison with other algorithms

  18. Identification of Urban Leprosy Clusters

    Directory of Open Access Journals (Sweden)

    José Antonio Armani Paschoal

    2013-01-01

    Full Text Available Overpopulation of urban areas results from constant migrations that cause disordered urban growth, constituting clusters defined as sets of people or activities concentrated in relatively small physical spaces that often involve precarious conditions. Aim. Using residential grouping, the aim was to identify possible clusters of individuals in São José do Rio Preto, Sao Paulo, Brazil, who have or have had leprosy. Methods. A population-based, descriptive, ecological study using the MapInfo and CrimeStat techniques, geoprocessing, and space-time analysis evaluated the location of 425 people treated for leprosy between 1998 and 2010. Clusters were defined as concentrations of at least 8 people with leprosy; a distance of up to 300 meters between residences was adopted. Additionally, the year of starting treatment and the clinical forms of the disease were analyzed. Results. Ninety-eight (23.1% of 425 geocoded cases were located within one of ten clusters identified in this study, and 129 cases (30.3% were in the region of a second-order cluster, an area considered of high risk for the disease. Conclusion. This study identified ten clusters of leprosy cases in the city and identified an area of high risk for the appearance of new cases of the disease.

  19. Identification of Urban Leprosy Clusters

    Science.gov (United States)

    Paschoal, José Antonio Armani; Paschoal, Vania Del'Arco; Nardi, Susilene Maria Tonelli; Rosa, Patrícia Sammarco; Ismael, Manuela Gallo y Sanches; Sichieri, Eduvaldo Paulo

    2013-01-01

    Overpopulation of urban areas results from constant migrations that cause disordered urban growth, constituting clusters defined as sets of people or activities concentrated in relatively small physical spaces that often involve precarious conditions. Aim. Using residential grouping, the aim was to identify possible clusters of individuals in São José do Rio Preto, Sao Paulo, Brazil, who have or have had leprosy. Methods. A population-based, descriptive, ecological study using the MapInfo and CrimeStat techniques, geoprocessing, and space-time analysis evaluated the location of 425 people treated for leprosy between 1998 and 2010. Clusters were defined as concentrations of at least 8 people with leprosy; a distance of up to 300 meters between residences was adopted. Additionally, the year of starting treatment and the clinical forms of the disease were analyzed. Results. Ninety-eight (23.1%) of 425 geocoded cases were located within one of ten clusters identified in this study, and 129 cases (30.3%) were in the region of a second-order cluster, an area considered of high risk for the disease. Conclusion. This study identified ten clusters of leprosy cases in the city and identified an area of high risk for the appearance of new cases of the disease. PMID:24288467

  20. Analisis Prioritas Pembangunan Embung Metode Cluster Analysis, AHP dan Weighted Average (Studi Kasus: Embung di Kabupaten Semarang

    Directory of Open Access Journals (Sweden)

    Bima Anjasmoro

    2016-06-01

    Full Text Available The Feasibility study potential of small dams in Semarang District has identified 8 (eight urgent potential small dams. These potential dams here to be constructed within 5 (five years in order to overcome the problem of water shortage in the district. However, the government has limited funding source. It is necessary to select the more urgent small dams to be constructed within the limited budget. The purpose of the research is determining the priority of small dams construction in Semarang District. The method used to determine the priority in this study is cluster analysis, AHP and weighted average method. The criteria used to determine the priority in this study consist of: vegetation in the inundated area, volume of embankment, land acquisition area, useful storage, recervoir life time, water cost/m³, access road to the dam site, land status at abutment and inundated area, construction cost, operation and maintenance cost, irrigation service area and raw water benefit. Based on results of cluster analysis, AHP and weighted average method can be conclude that the priority of small dams construction is 1 Mluweh Small Dam (0.165, 2 Pakis Small Dam (0.142, 3 Lebak Small Dam (0.134, 4 Dadapayam Small Dam (0.128, 5 Gogodalem Small Dam (0.119, 6 Kandangan Small Dam (0.114, 7 Ngrawan Small Dam (0.102 and 8 Jatikurung Small Dam (0.096. Based on analysis of the order of priority of 3 (three method showed that method is more detail than cluster analysis method and weighted average method, because the result of AHP method is closer to the conditions of each dam in the field.

  1. Fuzzy cluster analysis on trace elements of Hangzhou Jiaotan Guan Porcelain

    International Nuclear Information System (INIS)

    Gao Zhengyao; Liu Youe; Chen Songhua

    1997-01-01

    Forty samples of South Song 'Jiaotan Guankiln' are analyzed by neutron activation analysis (NAA). The 36 trace element contents in every sample are determined. This trace elements are analyzed by fuzzy cluster method. The result shows that the source of glaze raw material of South Song Guan porcelain is clearly different from that of the body raw material. For Guan kiln of South Song dynasty there was a very stable and lasting source of raw material of glaze and body. The archaeological problems are clarified. The glaze material and body material of modern Guan porcelain are different from those of the ancient Guan Porcelain

  2. Suzaku observations of low surface brightness cluster Abell 1631

    Science.gov (United States)

    Babazaki, Yasunori; Mitsuishi, Ikuyuki; Ota, Naomi; Sasaki, Shin; Böhringer, Hans; Chon, Gayoung; Pratt, Gabriel W.; Matsumoto, Hironori

    2018-06-01

    We present analysis results for a nearby galaxy cluster Abell 1631 at z = 0.046 using the X-ray observatory Suzaku. This cluster is categorized as a low X-ray surface brightness cluster. To study the dynamical state of the cluster, we conduct four-pointed Suzaku observations and investigate physical properties of the Mpc-scale hot gas associated with the A 1631 cluster for the first time. Unlike relaxed clusters, the X-ray image shows no strong peak at the center and an irregular morphology. We perform spectral analysis and investigate the radial profiles of the gas temperature, density, and entropy out to approximately 1.5 Mpc in the east, north, west, and south directions by combining with the XMM-Newton data archive. The measured gas density in the central region is relatively low (a few ×10-4 cm-3) at the given temperature (˜2.9 keV) compared with X-ray-selected clusters. The entropy profile and value within the central region (r clusters. These features are also observed in another low surface brightness cluster, Abell 76. The spatial distributions of galaxies and the hot gas appear to be different. The X-ray luminosity is relatively lower than that expected from the velocity dispersion. A post-merger scenario may explain the observed results.

  3. First-principle study of silicon cluster doped with rhodium: Rh{sub 2}Si{sub n} (n = 1–11) clusters

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Shuai; Luo, Chang Geng; Li, Hua Yang [Department of Physics, Nanyang Normal University, Nanyang 473061 (China); Lu, Cheng, E-mail: lucheng@calypso.cn [Department of Physics, Nanyang Normal University, Nanyang 473061 (China); State Key Laboratory of Superhard Materials, Jilin University, Changchun 130012 (China); Li, Gen Quan; Lu, Zhi Wen [Department of Physics, Nanyang Normal University, Nanyang 473061 (China)

    2015-06-15

    The geometries, stabilities and electronic properties of rhodium-doped silicon clusters Rh{sub 2}Si{sub n} (n = 1–11) have been systematically studied by using density functional calculations at the B3LYP/GENECP level. The optimized results show that the lowest-energy isomers of Rh{sub 2}Si{sub n} clusters favor three-dimensional structures for n = 2–11. Based on the averaged binding energy, fragmentation energy, second-order energy difference and HOMO-LUMO energy gap, the stabilities of Rh{sub 2}Si{sub n} (n = 1–11) clusters have been analyzed. The calculated results suggest that the Rh{sub 2}Si{sub 6} cluster has the strongest relative stability and the doping with rhodium atoms can reduce the chemical stabilities of Si{sub n} clusters. The natural population and natural electron configuration analysis indicate that there is charge transfer from the Si atoms and 5s orbital of the Rh atoms to the 4d and 5p orbitals of Rh atoms. The analysis of electron localization function reveal that the Si–Si bonds are mainly covalent bonds and the Si–Rh bonds are almost ionic bonds. Moreover, the vertical ionization potential, vertical electron affinity, chemical hardness, chemical potential, vibrational spectrum and polarizability are also discussed. - Highlights: • The geometric structures of Rh{sub 2}Si{sub n} (n = 1–11) clusters are determined. • The stabilities and electronic properties of Rh{sub 2}Si{sub n} clusters are discussed. • The Rh{sub 2}Si{sub 6} cluster has the higher stability than other clusters. • The doped rhodium atoms can reduce the chemical stabilities of Si{sub n} clusters.

  4. Analyzing Dynamic Probabilistic Risk Assessment Data through Topology-Based Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Diego Mandelli; Dan Maljovec; BeiWang; Valerio Pascucci; Peer-Timo Bremer

    2013-09-01

    We investigate the use of a topology-based clustering technique on the data generated by dynamic event tree methodologies. The clustering technique we utilizes focuses on a domain-partitioning algorithm based on topological structures known as the Morse-Smale complex, which partitions the data points into clusters based on their uniform gradient flow behavior. We perform both end state analysis and transient analysis to classify the set of nuclear scenarios. We demonstrate our methodology on a dataset generated for a sodium-cooled fast reactor during an aircraft crash scenario. The simulation tracks the temperature of the reactor as well as the time for a recovery team to fix the passive cooling system. Combined with clustering results obtained previously through mean shift methodology, we present the user with complementary views of the data that help illuminate key features that may be otherwise hidden using a single methodology. By clustering the data, the number of relevant test cases to be selected for further analysis can be drastically reduced by selecting a representative from each cluster. Identifying the similarities of simulations within a cluster can also aid in the drawing of important conclusions with respect to safety analysis.

  5. Motivational Interviewing for Workers with Disabling Musculoskeletal Disorders: Results of a Cluster Randomized Control Trial.

    Science.gov (United States)

    Park, Joanne; Esmail, Shaniff; Rayani, Fahreen; Norris, Colleen M; Gross, Douglas P

    2018-06-01

    Purpose Although functional restoration programs appear effective in assisting injured workers to return-to-work (RTW) after a work related musculoskeletal (MSK) disorder, the addition of Motivational Interviewing (MI) to these programs may result in higher RTW. Methods We conducted a cluster randomized controlled trial with claimants attending an occupational rehabilitation facility from November 17, 2014 to June 30, 2015. Six clinicians provided MI in addition to the standard functional restoration program and formed an intervention group. Six clinicians continued to provide the standard functional restoration program based on graded activity, therapeutic exercise, and workplace accommodations. Independent t tests and chi square analysis were used to compare groups. Multivariable logistic regression was used to obtain the odds ratio of claimants' confirmed RTW status at time of program discharge. Results 728 workers' compensation claimants with MSK disorders were entered into 1 of 12 therapist clusters (MI group = 367, control group = 361). Claimants were predominantly employed (72.7%), males (63.2%), with moderate levels of pain and disability (mean pain VAS = 5.0/10 and mean Pain Disability Index = 48/70). Claimants were stratified based on job attachment status. The proportion of successful RTW at program discharge was 12.1% higher for unemployed workers in the intervention group (intervention group 21.6 vs. 9.5% in control, p = 0.03) and 3.0% higher for job attached workers compared to the control group (intervention group 97.1 vs. 94.1% in control, p = 0.10). Adherence to MI was mixed, but RTW was significantly higher among MI-adherent clinicians. The odds ratio for unemployed claimants was 2.64 (0.69-10.14) and 2.50 (0.68-9.14) for employed claimants after adjusting for age, sex, pain intensity, perceived disability, and therapist cluster. Conclusion MI in addition to routine functional restoration is more effective than routine

  6. Evaporation of Lennard-Jones clusters

    International Nuclear Information System (INIS)

    Roman, C.E.; Garzon, I.L.

    1991-01-01

    Extensive molecular dynamics simulations have been done to study the evaporation of a 13-atom Lennard-Jones cluster. The survival probability and the evaporative lifetime are calculated as a function of the cluster total energy from a classical trajectory analysis. The results are interpreted in terms of the RRK theory of unimolecular dissociation. The calculation of the binding energy of the evaporated species from the evaporation rate and the average kinetic energy release is discussed. (orig.)

  7. The high performance cluster computing system for BES offline data analysis

    International Nuclear Information System (INIS)

    Sun Yongzhao; Xu Dong; Zhang Shaoqiang; Yang Ting

    2004-01-01

    A high performance cluster computing system (EPCfarm) is introduced, which used for BES offline data analysis. The setup and the characteristics of the hardware and software of EPCfarm are described. The PBS, a queue management package, and the performance of EPCfarm is presented also. (authors)

  8. A Variable-Selection Heuristic for K-Means Clustering.

    Science.gov (United States)

    Brusco, Michael J.; Cradit, J. Dennis

    2001-01-01

    Presents a variable selection heuristic for nonhierarchical (K-means) cluster analysis based on the adjusted Rand index for measuring cluster recovery. Subjected the heuristic to Monte Carlo testing across more than 2,200 datasets. Results indicate that the heuristic is extremely effective at eliminating masking variables. (SLD)

  9. A markedness analysis of initial consonant clusters in Aphasic Phonological Impairment: A case study

    Directory of Open Access Journals (Sweden)

    Lesley Wolk

    1978-11-01

    Full Text Available The purpose of  this study was to assess both the theoretical and clinical value of  markedness theory in phonological impairment in aphasia. A markedness analysis was carried out on initial consonant clusters in a single aphasic adult, at two points during the spontaneous recovery phase. Results revealed systematic, rule-governed behaviour, reflecting  similar linguistic trends, in terms of  natural segments and natural processes, on both testing occasions. Some inadequacies of  the distinctive feature  approach are discussed. The findings  of  this study suggest that a markedness analysis may be extremely useful  for  the analysis and treatment of  phonological disorders in aphasia.

  10. An improved clustering algorithm based on reverse learning in intelligent transportation

    Science.gov (United States)

    Qiu, Guoqing; Kou, Qianqian; Niu, Ting

    2017-05-01

    With the development of artificial intelligence and data mining technology, big data has gradually entered people's field of vision. In the process of dealing with large data, clustering is an important processing method. By introducing the reverse learning method in the clustering process of PAM clustering algorithm, to further improve the limitations of one-time clustering in unsupervised clustering learning, and increase the diversity of clustering clusters, so as to improve the quality of clustering. The algorithm analysis and experimental results show that the algorithm is feasible.

  11. Performance Analysis of Entropy Methods on K Means in Clustering Process

    Science.gov (United States)

    Dicky Syahputra Lubis, Mhd.; Mawengkang, Herman; Suwilo, Saib

    2017-12-01

    K Means is a non-hierarchical data clustering method that attempts to partition existing data into one or more clusters / groups. This method partitions the data into clusters / groups so that data that have the same characteristics are grouped into the same cluster and data that have different characteristics are grouped into other groups.The purpose of this data clustering is to minimize the objective function set in the clustering process, which generally attempts to minimize variation within a cluster and maximize the variation between clusters. However, the main disadvantage of this method is that the number k is often not known before. Furthermore, a randomly chosen starting point may cause two points to approach the distance to be determined as two centroids. Therefore, for the determination of the starting point in K Means used entropy method where this method is a method that can be used to determine a weight and take a decision from a set of alternatives. Entropy is able to investigate the harmony in discrimination among a multitude of data sets. Using Entropy criteria with the highest value variations will get the highest weight. Given this entropy method can help K Means work process in determining the starting point which is usually determined at random. Thus the process of clustering on K Means can be more quickly known by helping the entropy method where the iteration process is faster than the K Means Standard process. Where the postoperative patient dataset of the UCI Repository Machine Learning used and using only 12 data as an example of its calculations is obtained by entropy method only with 2 times iteration can get the desired end result.

  12. An improved K-means clustering algorithm in agricultural image segmentation

    Science.gov (United States)

    Cheng, Huifeng; Peng, Hui; Liu, Shanmei

    Image segmentation is the first important step to image analysis and image processing. In this paper, according to color crops image characteristics, we firstly transform the color space of image from RGB to HIS, and then select proper initial clustering center and cluster number in application of mean-variance approach and rough set theory followed by clustering calculation in such a way as to automatically segment color component rapidly and extract target objects from background accurately, which provides a reliable basis for identification, analysis, follow-up calculation and process of crops images. Experimental results demonstrate that improved k-means clustering algorithm is able to reduce the computation amounts and enhance precision and accuracy of clustering.

  13. Hybrid Tracking Algorithm Improvements and Cluster Analysis Methods.

    Science.gov (United States)

    1982-02-26

    UPGMA ), and Ward’s method. Ling’s papers describe a (k,r) clustering method. Each of these methods have individual characteristics which make them...Reference 7), UPGMA is probably the most frequently used clustering strategy. UPGMA tries to group new points into an existing cluster by using an

  14. Peeking Network States with Clustered Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jinoh [Texas A & M Univ., Commerce, TX (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Sim, Alex [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2015-10-20

    Network traffic monitoring has long been a core element for effec- tive network management and security. However, it is still a chal- lenging task with a high degree of complexity for comprehensive analysis when considering multiple variables and ever-increasing traffic volumes to monitor. For example, one of the widely con- sidered approaches is to scrutinize probabilistic distributions, but it poses a scalability concern and multivariate analysis is not gen- erally supported due to the exponential increase of the complexity. In this work, we propose a novel method for network traffic moni- toring based on clustering, one of the powerful deep-learning tech- niques. We show that the new approach enables us to recognize clustered results as patterns representing the network states, which can then be utilized to evaluate “similarity” of network states over time. In addition, we define a new quantitative measure for the similarity between two compared network states observed in dif- ferent time windows, as a supportive means for intuitive analysis. Finally, we demonstrate the clustering-based network monitoring with public traffic traces, and show that the proposed approach us- ing the clustering method has a great opportunity for feasible, cost- effective network monitoring.

  15. Time-series clustering of gene expression in irradiated and bystander fibroblasts: an application of FBPA clustering

    Directory of Open Access Journals (Sweden)

    Markatou Marianthi

    2011-01-01

    Full Text Available Abstract Background The radiation bystander effect is an important component of the overall biological response of tissues and organisms to ionizing radiation, but the signaling mechanisms between irradiated and non-irradiated bystander cells are not fully understood. In this study, we measured a time-series of gene expression after α-particle irradiation and applied the Feature Based Partitioning around medoids Algorithm (FBPA, a new clustering method suitable for sparse time series, to identify signaling modules that act in concert in the response to direct irradiation and bystander signaling. We compared our results with those of an alternate clustering method, Short Time series Expression Miner (STEM. Results While computational evaluations of both clustering results were similar, FBPA provided more biological insight. After irradiation, gene clusters were enriched for signal transduction, cell cycle/cell death and inflammation/immunity processes; but only FBPA separated clusters by function. In bystanders, gene clusters were enriched for cell communication/motility, signal transduction and inflammation processes; but biological functions did not separate as clearly with either clustering method as they did in irradiated samples. Network analysis confirmed p53 and NF-κB transcription factor-regulated gene clusters in irradiated and bystander cells and suggested novel regulators, such as KDM5B/JARID1B (lysine (K-specific demethylase 5B and HDACs (histone deacetylases, which could epigenetically coordinate gene expression after irradiation. Conclusions In this study, we have shown that a new time series clustering method, FBPA, can provide new leads to the mechanisms regulating the dynamic cellular response to radiation. The findings implicate epigenetic control of gene expression in addition to transcription factor networks.

  16. The Feasibility of Using Cluster Analysis to Examine Log Data from Educational Video Games. CRESST Report 790

    Science.gov (United States)

    Kerr, Deirdre; Chung, Gregory K. W. K.; Iseli, Markus R.

    2011-01-01

    Analyzing log data from educational video games has proven to be a challenging endeavor. In this paper, we examine the feasibility of using cluster analysis to extract information from the log files that is interpretable in both the context of the game and the context of the subject area. If cluster analysis can be used to identify patterns of…

  17. A genome-wide analysis of nonribosomal peptide synthetase gene clusters and their peptides in a Planktothrix rubescens strain

    Directory of Open Access Journals (Sweden)

    Nederbragt Alexander J

    2009-08-01

    Full Text Available Abstract Background Cyanobacteria often produce several different oligopeptides, with unknown biological functions, by nonribosomal peptide synthetases (NRPS. Although some cyanobacterial NRPS gene cluster types are well described, the entire NRPS genomic content within a single cyanobacterial strain has never been investigated. Here we have combined a genome-wide analysis using massive parallel pyrosequencing ("454" and mass spectrometry screening of oligopeptides produced in the strain Planktothrix rubescens NIVA CYA 98 in order to identify all putative gene clusters for oligopeptides. Results Thirteen types of oligopeptides were uncovered by mass spectrometry (MS analyses. Microcystin, cyanopeptolin and aeruginosin synthetases, highly similar to already characterized NRPS, were present in the genome. Two novel NRPS gene clusters were associated with production of anabaenopeptins and microginins, respectively. Sequence-depth of the genome and real-time PCR data revealed three copies of the microginin gene cluster. Since NRPS gene cluster candidates for microviridin and oscillatorin synthesis could not be found, putative (gene encoded precursor peptide sequences to microviridin and oscillatorin were found in the genes mdnA and oscA, respectively. The genes flanking the microviridin and oscillatorin precursor genes encode putative modifying enzymes of the precursor oligopeptides. We therefore propose ribosomal pathways involving modifications and cyclisation for microviridin and oscillatorin. The microviridin, anabaenopeptin and cyanopeptolin gene clusters are situated in close proximity to each other, constituting an oligopeptide island. Conclusion Altogether seven nonribosomal peptide synthetase (NRPS gene clusters and two gene clusters putatively encoding ribosomal oligopeptide biosynthetic pathways were revealed. Our results demonstrate that whole genome shotgun sequencing combined with MS-directed determination of oligopeptides successfully

  18. A hierarchical clustering scheme approach to assessment of IP-network traffic using detrended fluctuation analysis

    Science.gov (United States)

    Takuma, Takehisa; Masugi, Masao

    2009-03-01

    This paper presents an approach to the assessment of IP-network traffic in terms of the time variation of self-similarity. To get a comprehensive view in analyzing the degree of long-range dependence (LRD) of IP-network traffic, we use a hierarchical clustering scheme, which provides a way to classify high-dimensional data with a tree-like structure. Also, in the LRD-based analysis, we employ detrended fluctuation analysis (DFA), which is applicable to the analysis of long-range power-law correlations or LRD in non-stationary time-series signals. Based on sequential measurements of IP-network traffic at two locations, this paper derives corresponding values for the LRD-related parameter α that reflects the degree of LRD of measured data. In performing the hierarchical clustering scheme, we use three parameters: the α value, average throughput, and the proportion of network traffic that exceeds 80% of network bandwidth for each measured data set. We visually confirm that the traffic data can be classified in accordance with the network traffic properties, resulting in that the combined depiction of the LRD and other factors can give us an effective assessment of network conditions at different times.

  19. Defining objective clusters for rabies virus sequences using affinity propagation clustering.

    Directory of Open Access Journals (Sweden)

    Susanne Fischer

    2018-01-01

    Full Text Available Rabies is caused by lyssaviruses, and is one of the oldest known zoonoses. In recent years, more than 21,000 nucleotide sequences of rabies viruses (RABV, from the prototype species rabies lyssavirus, have been deposited in public databases. Subsequent phylogenetic analyses in combination with metadata suggest geographic distributions of RABV. However, these analyses somewhat experience technical difficulties in defining verifiable criteria for cluster allocations in phylogenetic trees inviting for a more rational approach. Therefore, we applied a relatively new mathematical clustering algorythm named 'affinity propagation clustering' (AP to propose a standardized sub-species classification utilizing full-genome RABV sequences. Because AP has the advantage that it is computationally fast and works for any meaningful measure of similarity between data samples, it has previously been applied successfully in bioinformatics, for analysis of microarray and gene expression data, however, cluster analysis of sequences is still in its infancy. Existing (516 and original (46 full genome RABV sequences were used to demonstrate the application of AP for RABV clustering. On a global scale, AP proposed four clusters, i.e. New World cluster, Arctic/Arctic-like, Cosmopolitan, and Asian as previously assigned by phylogenetic studies. By combining AP with established phylogenetic analyses, it is possible to resolve phylogenetic relationships between verifiably determined clusters and sequences. This workflow will be useful in confirming cluster distributions in a uniform transparent manner, not only for RABV, but also for other comparative sequence analyses.

  20. Orbit Clustering Based on Transfer Cost

    Science.gov (United States)

    Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.

    2013-01-01

    We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.