WorldWideScience

Sample records for models cluster analysis

  1. Cluster analysis

    CERN Document Server

    Everitt, Brian S; Leese, Morven; Stahl, Daniel

    2011-01-01

    Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics. This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data. Real life examples are used throughout to demons

  2. Traffic Accident, System Model and Cluster Analysis in GIS

    Directory of Open Access Journals (Sweden)

    Veronika Vlčková

    2015-07-01

    Full Text Available Traffic accidents are a frequent topic both in general journalism and among the professional public. This article directs attention to a less familiar context of accidents, with the help of constructive systems theory and its methods, cluster analysis and geoinformation engineering. A traffic accident reshapes space-time, and it can therefore be studied with the tools of geographic information systems technology. Applying a systems approach enables the formulation of a system model, which is then handled with the tools of geoinformation engineering and of multicriteria and cluster analysis.

  3. Clustering analysis

    International Nuclear Information System (INIS)

    Romli

    1997-01-01

    Cluster analysis is the name of a group of multivariate techniques whose principal purpose is to distinguish similar entities on the basis of the characteristics they possess. Several algorithms can be used for this analysis. This paper therefore discusses the algorithms, including similarity measures and hierarchical clustering, which comprises the single linkage, complete linkage and average linkage methods. The non-hierarchical clustering method popularly known as the K-means method is also discussed. Finally, the paper describes the advantages and disadvantages of each method.
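
    The three linkage rules named in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the naive merge loop and the toy one-dimensional data are all assumptions made for illustration, and the implementation favours readability over speed.

```python
from itertools import combinations
from math import dist


def linkage_distance(a, b, method):
    """Distance between clusters a and b (lists of points) under a linkage rule."""
    pairwise = [dist(p, q) for p in a for q in b]
    if method == "single":      # closest pair of members
        return min(pairwise)
    if method == "complete":    # farthest pair of members
        return max(pairwise)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")


def agglomerate(points, n_clusters, method="single"):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]        # start with one cluster per point
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage_distance(clusters[ij[0]], clusters[ij[1]], method),
        )
        clusters[i] = clusters[i] + clusters[j]   # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical toy data: five points on a line with slowly growing gaps
points = [(0.0,), (1.0,), (2.2,), (3.6,), (5.5,)]
for method in ("single", "complete", "average"):
    print(method, agglomerate(points, 2, method))
```

    On this toy data single linkage chains the first four points together, whereas complete linkage splits off the first two, which is the kind of behavioural difference a comparison of linkage methods is meant to expose.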

  4. Year clustering analysis for modelling olive flowering phenology

    Science.gov (United States)

    Oteros, J.; García-Mozo, H.; Hervás-Martínez, C.; Galán, C.

    2013-07-01

    It is now widely accepted that weather conditions occurring several months prior to the onset of flowering have a major influence on various aspects of olive reproductive phenology, including flowering intensity. Given the variable characteristics of the Mediterranean climate, we analyse its influence on the registered variations in olive flowering intensity in southern Spain, and relate them to previous climatic parameters using a year-clustering approach, as a first step towards an olive flowering phenology model adapted to different year categories. Phenological data from Cordoba province (Southern Spain) for a 30-year period (1982-2011) were analysed. Meteorological and phenological data were first subjected to both hierarchical and "K-means" clustering analysis, which yielded four year-categories. For this classification purpose, three different models were tested: (1) discriminant analysis; (2) decision-tree analysis; and (3) neural network analysis. Comparison of the results showed that the neural-networks model was the most effective, classifying four different year categories with clearly distinct weather features. Flowering-intensity models were constructed for each year category using the partial least squares regression method. These category-specific models proved to be more effective than general models. They are better suited to the variability of the Mediterranean climate, due to the different response of plants to the same environmental stimuli depending on the previous weather conditions in any given year. The present detailed analysis of the influence of weather patterns of different years on olive phenology will help us to understand the short-term effects of climate change on olive crop in the Mediterranean area that is highly affected by it.

  5. Analysis of the dynamical cluster approximation for the Hubbard model

    OpenAIRE

    Aryanpour, K.; Hettler, M. H.; Jarrell, M.

    2002-01-01

    We examine a central approximation of the recently introduced Dynamical Cluster Approximation (DCA) by example of the Hubbard model. By both analytical and numerical means we study non-compact and compact contributions to the thermodynamic potential. We show that approximating non-compact diagrams by their cluster analogs results in a larger systematic error as compared to the compact diagrams. Consequently, only the compact contributions should be taken from the cluster, whereas non-compact ...

  6. The diamond model analysis of ICT cluster in Thailand

    Directory of Open Access Journals (Sweden)

    Danuvasin Charoen, Ph.D.

    2013-07-01

    Full Text Available Information and Communication Technology (ICT) has become an integral part of national competitiveness. Thailand was ranked 38th (out of 134 countries) in the global competitiveness report conducted by the World Economic Forum. It was also ranked well below the world average on all of the factors related to technology, despite the fact that information technology and telecommunications had been a major factor driving the competitiveness of the country. The main purpose of this study is to investigate the various issues related to the ICT cluster in Thailand. The diamond model was used to analyze the ICT cluster in Thailand. The results of this study can be used to guide policy to enhance the competitiveness of the ICT cluster.

  7. Fuzzy subtractive clustering based prediction model for brand association analysis

    Directory of Open Access Journals (Sweden)

    Widodo Imam Djati

    2018-01-01

    Full Text Available The brand is one of the crucial elements that determine the success of a product. When choosing a product, consumers always consider product attributes (such as features, shape, and color); however, they also consider the brand. A brand guides someone to associate a product with specific attributes and qualities. This study was designed to identify the product attributes and predict brand performance from those attributes. A survey was run to obtain the attributes affecting the brand. Subtractive Fuzzy Clustering was used to classify and predict product brand association based on aspects of the product under investigation. The results indicate that five attributes, namely shape, ease, image, quality and price, can be used to classify and predict the brand. The training step gives the best FSC model with radii (ra = 0.1). It develops 70 clusters/rules with an MSE (training) of 9.7093e-016. Using 14 testing data, the model can predict the brand very well (close to the target), with an MSE of 0.6005 and an accuracy rate of 71%.

  8. Topic modeling for cluster analysis of large biological and medical datasets.

    Science.gov (United States)

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting
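
    One plausible reading of "highest probable topic assignment" is that each sample is placed in the cluster of its most probable topic. The sketch below illustrates only that reading; the abstract does not give the procedure, so the function name and the sample-by-topic probabilities are hypothetical and stand in for the output of an already fitted topic model.

```python
def assign_by_highest_topic(doc_topic):
    """Return, for each sample, the index of its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


# Hypothetical sample-by-topic probabilities (each row sums to 1),
# standing in for the output of a fitted topic model
doc_topic = [
    [0.70, 0.20, 0.10],
    [0.10, 0.15, 0.75],
    [0.05, 0.90, 0.05],
    [0.60, 0.30, 0.10],
]

labels = assign_by_highest_topic(doc_topic)

# Group sample indices by their assigned topic to obtain the clusters
clusters = {}
for sample, topic in enumerate(labels):
    clusters.setdefault(topic, []).append(sample)

print(labels)
print(clusters)
```

    Under this reading the number of clusters equals the number of topics that win at least one sample, so the topic count chosen when fitting the model bounds the cluster count.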

  9. A comparison of heuristic and model-based clustering methods for dietary pattern analysis.

    Science.gov (United States)

    Greve, Benjamin; Pigeot, Iris; Huybrechts, Inge; Pala, Valeria; Börnhorst, Claudia

    2016-02-01

    Cluster analysis is widely applied to identify dietary patterns. A new method based on Gaussian mixture models (GMM) seems to be more flexible compared with the commonly applied k-means and Ward's method. In the present paper, these clustering approaches are compared to find the most appropriate one for clustering dietary data. The clustering methods were applied to simulated data sets with different cluster structures to compare their performance knowing the true cluster membership of observations. Furthermore, the three methods were applied to FFQ data assessed in 1791 children participating in the IDEFICS (Identification and Prevention of Dietary- and Lifestyle-Induced Health Effects in Children and Infants) Study to explore their performance in practice. The GMM outperformed the other methods in the simulation study in 72 % up to 100 % of cases, depending on the simulated cluster structure. Comparing the computationally less complex k-means and Ward's methods, the performance of k-means was better in 64-100 % of cases. Applied to real data, all methods identified three similar dietary patterns which may be roughly characterized as a 'non-processed' cluster with a high consumption of fruits, vegetables and wholemeal bread, a 'balanced' cluster with only slight preferences of single foods and a 'junk food' cluster. The simulation study suggests that clustering via GMM should be preferred due to its higher flexibility regarding cluster volume, shape and orientation. The k-means seems to be a good alternative, being easier to use while giving similar results when applied to real data.

  10. Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis

    Science.gov (United States)

    2015-01-01

    discussion of its application to the network of network scientists. Each partitioning step in this spectral scheme either bipartitions or tripartitions a... University of California Los Angeles. Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis. A dissertation...

  11. Semiparametric Bayesian analysis of accelerated failure time models with cluster structures.

    Science.gov (United States)

    Li, Zhaonan; Xu, Xinyi; Shen, Junshan

    2017-11-10

    In this paper, we develop a Bayesian semiparametric accelerated failure time model for survival data with cluster structures. Our model allows distributional heterogeneity across clusters and accommodates their relationships through a density ratio approach. Moreover, a nonparametric mixture of Dirichlet processes prior is placed on the baseline distribution to yield full distributional flexibility. We illustrate through simulations that our model can greatly improve estimation accuracy by effectively pooling information from multiple clusters, while taking into account the heterogeneity in their random error distributions. We also demonstrate the implementation of our method using analysis of Mayo Clinic Trial in Primary Biliary Cirrhosis. Copyright © 2017 John Wiley & Sons, Ltd.

  12. Cluster analysis in kinetic modelling of the brain: A noninvasive alternative to arterial sampling

    DEFF Research Database (Denmark)

    Liptrot, Matthew George; Adams, K.H.; Martiny, L.

    2004-01-01

    extracted from the PET data set. Hierarchical K-means cluster analysis was performed on the PET time series to extract a cerebral vasculature ROI. The number of clusters was varied from K = 1 to 10 for the second of the two-stage method. Determination of the correct number of clusters was performed...... blood sampling, the Simplified Reference Tissue Model (SRTM) and Logan analysis with cerebellar TAC as an input. There was a good agreement (P K-means-clustered input function and those from the arterial blood samples. This work......) extracted directly from dynamic positron emission tomography (PET) scans by cluster analysis. Five healthy subjects were injected with the 5HT2A- receptor ligand [18F]-altanserin and blood samples were subsequently taken from the radial artery and cubital vein. Eight regions-of-interest (ROI) TACs were...

  13. Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis

    Directory of Open Access Journals (Sweden)

    Chao Zhang

    2017-09-01

    Full Text Available A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. The sensors are wireless-powered and transmit information by consuming the energy harvested from the signal emitted by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in the near cluster do not have their own information to transmit, they can act as relays and help the sensors in the far cluster forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation of the relay battery, and give a general analytical model for WPSN with cluster cooperation. Through the model, we derive a closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the motivation for the design and the correctness of the theoretical analysis, and show how the parameters affect system performance. Moreover, the outage probability of sensors in the far cluster can be drastically reduced without sacrificing the performance of sensors in the near cluster if the transmit power of the HAP is fairly high. Furthermore, in terms of the outage performance of the far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.

  14. Cluster Cooperation in Wireless-Powered Sensor Networks: Modeling and Performance Analysis.

    Science.gov (United States)

    Zhang, Chao; Zhang, Pengcheng; Zhang, Weizhan

    2017-09-27

    A wireless-powered sensor network (WPSN) consisting of one hybrid access point (HAP), a near cluster and the corresponding far cluster is investigated in this paper. The sensors are wireless-powered and transmit information by consuming the energy harvested from the signal emitted by the HAP. Sensors are able to harvest energy as well as store the harvested energy. We propose that if sensors in the near cluster do not have their own information to transmit, they can act as relays and help the sensors in the far cluster forward information to the HAP in an amplify-and-forward (AF) manner. We use a finite Markov chain to model the dynamic variation of the relay battery, and give a general analytical model for WPSN with cluster cooperation. Through the model, we derive a closed-form expression for the outage probability as the metric of this network. Finally, simulation results validate the motivation for the design and the correctness of the theoretical analysis, and show how the parameters affect system performance. Moreover, the outage probability of sensors in the far cluster can be drastically reduced without sacrificing the performance of sensors in the near cluster if the transmit power of the HAP is fairly high. Furthermore, in terms of the outage performance of the far cluster, the proposed scheme significantly outperforms the direct transmission scheme without cooperation.

  15. 3D Building Models Segmentation Based on K-Means++ Cluster Analysis

    Science.gov (United States)

    Zhang, C.; Mao, B.

    2016-10-01

    3D mesh model segmentation has attracted increasing attention in the digital geometry processing field in recent years. The original 3D mesh model needs to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compression, texture mapping, model retrieval and so on. Segmentation is therefore a key problem for 3D mesh models. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance heavily depends on randomized initial seed points (i.e., centroids); different randomized centroids can give quite different results. Therefore, we improved the existing method and used the K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means, and achieves good and meaningful results.
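
    The seeding step that distinguishes K-means++ from plain K-means can be sketched as follows. This is the generic textbook procedure, not the authors' mesh-segmentation code: the function name, the toy 2D points and the fixed random seed are assumptions for illustration, and the features actually clustered in the paper are not given in the abstract.

```python
import random
from math import dist


def kmeans_pp_seeds(points, k, rng):
    """Pick k initial centroids by k-means++ (squared-distance-weighted) sampling.

    Assumes `points` contains at least k distinct points.
    """
    seeds = [rng.choice(points)]            # first seed: uniform at random
    while len(seeds) < k:
        # squared distance from every point to its nearest already-chosen seed
        d2 = [min(dist(p, s) ** 2 for s in seeds) for p in points]
        # draw the next seed with probability proportional to that squared distance,
        # so points far from all current seeds are favoured
        seeds.append(rng.choices(points, weights=d2, k=1)[0])
    return seeds


# Hypothetical toy data: three loose groups of 2D points
points = [(0.0, 0.0), (0.5, 0.2), (9.0, 9.0), (9.3, 8.8), (0.0, 10.0), (0.4, 9.6)]
seeds = kmeans_pp_seeds(points, 3, random.Random(0))
print(seeds)
```

    The returned seeds would then replace the uniformly random initial centroids of ordinary K-means; the subsequent assignment and update iterations are unchanged.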

  16. 3D BUILDING MODELS SEGMENTATION BASED ON K-MEANS++ CLUSTER ANALYSIS

    Directory of Open Access Journals (Sweden)

    C. Zhang

    2016-10-01

    Full Text Available 3D mesh model segmentation has attracted increasing attention in the digital geometry processing field in recent years. The original 3D mesh model needs to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compression, texture mapping, model retrieval and so on. Segmentation is therefore a key problem for 3D mesh models. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance heavily depends on randomized initial seed points (i.e., centroids); different randomized centroids can give quite different results. Therefore, we improved the existing method and used the K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means, and achieves good and meaningful results.

  17. A Deep Learning Prediction Model Based on Extreme-Point Symmetric Mode Decomposition and Cluster Analysis

    OpenAIRE

    Li, Guohui; Zhang, Songling; Yang, Hong

    2017-01-01

    To address the irregularity of nonlinear signals and the difficulty of predicting them, a deep learning prediction model based on extreme-point symmetric mode decomposition (ESMD) and clustering analysis is proposed. Firstly, the original data is decomposed by ESMD to obtain a finite number of intrinsic mode functions (IMFs) and residuals. Secondly, fuzzy c-means is used to cluster the decomposed components, and then a deep belief network (DBN) is used to predict them. Finally, the reconstructed ...

  18. A Model-Based Cluster Analysis of Maternal Emotion Regulation and Relations to Parenting Behavior.

    Science.gov (United States)

    Shaffer, Anne; Whitehead, Monica; Davis, Molly; Morelen, Diana; Suveg, Cynthia

    2017-10-15

    In a diverse community sample of mothers (N = 108) and their preschool-aged children (M age  = 3.50 years), this study conducted person-oriented analyses of maternal emotion regulation (ER) based on a multimethod assessment incorporating physiological, observational, and self-report indicators. A model-based cluster analysis was applied to five indicators of maternal ER: maternal self-report, observed negative affect in a parent-child interaction, baseline respiratory sinus arrhythmia (RSA), and RSA suppression across two laboratory tasks. Model-based cluster analyses revealed four maternal ER profiles, including a group of mothers with average ER functioning, characterized by socioeconomic advantage and more positive parenting behavior. A dysregulated cluster demonstrated the greatest challenges with parenting and dyadic interactions. Two clusters of intermediate dysregulation were also identified. Implications for assessment and applications to parenting interventions are discussed. © 2017 Family Process Institute.

  19. Cluster analysis for applications

    CERN Document Server

    Anderberg, Michael R

    1973-01-01

    Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis. Comprised of 10 chapters, this book begins with an introduction to the subject o

  20. A Deep Learning Prediction Model Based on Extreme-Point Symmetric Mode Decomposition and Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Guohui Li

    2017-01-01

    Full Text Available To address the irregularity of nonlinear signals and the difficulty of predicting them, a deep learning prediction model based on extreme-point symmetric mode decomposition (ESMD) and clustering analysis is proposed. Firstly, the original data is decomposed by ESMD to obtain a finite number of intrinsic mode functions (IMFs) and residuals. Secondly, fuzzy c-means is used to cluster the decomposed components, and then a deep belief network (DBN) is used to predict them. Finally, the reconstructed IMFs and residuals are the final prediction results. Six prediction models are compared: the DBN, EMD-DBN, EEMD-DBN, CEEMD-DBN and ESMD-DBN prediction models, and the model proposed in this paper. The same sunspot time series is predicted with all six models. The experimental results show that the proposed model has better prediction accuracy and smaller error.
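
    The fuzzy c-means step mentioned in this abstract can be sketched in its standard form: memberships and centers are updated alternately with fuzzifier m. The sketch is generic and not the authors' implementation; it clusters hypothetical scalar values, whereas the abstract does not say which features of the decomposed components are clustered, and the function names, initial centers and iteration count are assumptions.

```python
def fcm_memberships(data, centers, m=2.0):
    """Membership of each 1-D sample in each cluster, for fixed centers."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in data:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # sample sits exactly on one or more centers: share membership among them
            hits = dists.count(0.0)
            memberships.append([1.0 / hits if d == 0.0 else 0.0 for d in dists])
        else:
            # u_i = d_i^(-2/(m-1)) / sum_k d_k^(-2/(m-1)); each row sums to 1
            inv = [d ** -power for d in dists]
            total = sum(inv)
            memberships.append([v / total for v in inv])
    return memberships


def fcm_centers(data, memberships, m=2.0):
    """Centers as means of the data weighted by membership**m."""
    centers = []
    for i in range(len(memberships[0])):
        weights = [u[i] ** m for u in memberships]
        centers.append(sum(w * x for w, x in zip(weights, data)) / sum(weights))
    return centers


def fuzzy_c_means(data, centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iterations):
        memberships = fcm_memberships(data, centers, m)
        centers = fcm_centers(data, memberships, m)
    return centers, fcm_memberships(data, centers, m)


# Hypothetical scalar features forming two loose groups
data = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
centers, memberships = fuzzy_c_means(data, centers=[0.0, 6.0])
print(centers)
print([[round(u, 3) for u in row] for row in memberships])
```

    Unlike K-means, every sample keeps a graded membership in every cluster, so components that fall between groups are not forced into a single one.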

  1. Marketing research cluster analysis

    Directory of Open Access Journals (Sweden)

    Marić Nebojša

    2002-01-01

    Full Text Available One area of application of cluster analysis in marketing is the identification of groups of cities and towns with similar demographic profiles. This paper considers the main aspects of cluster analysis through an example of clustering 12 cities using Minitab software.

  2. Validation of an ANN Flow Prediction Model Using a Multi-Station Cluster Analysis

    NARCIS (Netherlands)

    Demirel, M.C.; Booij, Martijn J.; Kahya, E.

    2012-01-01

    The objective of this study is to validate a flow prediction model for a hydrometric station using a multistation criterion in addition to standard single-station performance criteria. In this contribution we used cluster analysis to identify the regional flow height, i.e., water-level patterns and

  3. Spatial cluster modelling

    CERN Document Server

    Lawson, Andrew B

    2002-01-01

    Research has generated a number of advances in methods for spatial cluster modelling in recent years, particularly in the area of Bayesian cluster modelling. Along with these advances has come an explosion of interest in the potential applications of this work, especially in epidemiology and genome research. In one integrated volume, this book reviews the state-of-the-art in spatial clustering and spatial cluster modelling, bringing together research and applications previously scattered throughout the literature. It begins with an overview of the field, then presents a series of chapters that illuminate the nature and purpose of cluster modelling within different application areas, including astrophysics, epidemiology, ecology, and imaging. The focus then shifts to methods, with discussions on point and object process modelling, perfect sampling of cluster processes, partitioning in space and space-time, spatial and spatio-temporal process modelling, nonparametric methods for clustering, and spatio-temporal ...

  4. Degradation Assessment and Fault Diagnosis for Roller Bearing Based on AR Model and Fuzzy Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Lingli Jiang

    2011-01-01

    Full Text Available This paper proposes a new approach combining the autoregressive (AR) model and fuzzy cluster analysis for bearing fault diagnosis and degradation assessment. The AR model is an effective approach for extracting fault features, and is generally applied to stationary signals. However, the fault vibration signals of a roller bearing are non-stationary and non-Gaussian. To address this problem, the parameters of the AR model are estimated based on higher-order cumulants. The AR parameters are then taken as the feature vectors, and fuzzy cluster analysis is applied to perform classification and pattern recognition. Experimental results show that the proposed method can be used to identify various types and severities of bearing faults. This study is significant for non-stationary and non-Gaussian signal analysis, fault diagnosis and degradation assessment.

  5. A fully Bayesian latent variable model for integrative clustering analysis of multi-type omics data.

    Science.gov (United States)

    Mo, Qianxing; Shen, Ronglai; Guo, Cui; Vannucci, Marina; Chan, Keith S; Hilsenbeck, Susan G

    2018-01-01

    Identification of clinically relevant tumor subtypes and omics signatures is an important task in cancer translational research for precision medicine. Large-scale genomic profiling studies such as The Cancer Genome Atlas (TCGA) Research Network have generated vast amounts of genomic, transcriptomic, epigenomic, and proteomic data. While these studies have provided great resources for researchers to discover clinically relevant tumor subtypes and driver molecular alterations, there are few computationally efficient methods and tools for integrative clustering analysis of these multi-type omics data. Therefore, the aim of this article is to develop a fully Bayesian latent variable method (called iClusterBayes) that can jointly model omics data of continuous and discrete data types for identification of tumor subtypes and relevant omics features. Specifically, the proposed method uses a few latent variables to capture the inherent structure of multiple omics data sets to achieve joint dimension reduction. As a result, the tumor samples can be clustered in the latent variable space and relevant omics features that drive the sample clustering are identified through Bayesian variable selection. This method significantly improves on the existing integrative clustering method iClusterPlus in terms of statistical inference and computational speed. By analyzing TCGA and simulated data sets, we demonstrate the excellent performance of the proposed method in revealing clinically meaningful tumor subtypes and driver omics features. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  6. CLEAN: CLustering Enrichment ANalysis

    Science.gov (United States)

    Freudenberg, Johannes M; Joshi, Vineet K; Hu, Zhen; Medvedovic, Mario

    2009-01-01

    Background: Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation. Results: We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score). The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than the scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at . The package integrates routines for calculating gene specific functional coherence scores and the open source interactive Java-based viewer Functional TreeView (FTreeView). Conclusion: Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the conclusions made about clusters of co

  7. Microglia Morphological Categorization in a Rat Model of Neuroinflammation by Hierarchical Cluster and Principal Components Analysis.

    Science.gov (United States)

    Fernández-Arjona, María Del Mar; Grondona, Jesús M; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D

    2017-01-01

    It is known that microglia morphology and function are closely related, but only a few studies have objectively described different morphological subtypes. To address this issue, morphological parameters of microglial cells were analyzed in a rat model of aseptic neuroinflammation. After the injection of a single dose of the enzyme neuraminidase (NA) within the lateral ventricle (LV), an acute inflammatory process occurs. Sections from NA-injected animals and sham controls were immunolabeled with the microglial marker IBA1, which highlights ramifications and features of the cell shape. Using images obtained by section scanning, individual microglial cells were sampled from various regions (septofimbrial nucleus, hippocampus and hypothalamus) at different times post-injection (2, 4 and 12 h). Each cell yielded a set of 15 morphological parameters by means of image analysis software. Five initial parameters (including fractal measures) were statistically different in cells from NA-injected rats (most of them IL-1β positive, i.e., M1-state) compared to those from control animals (none of them IL-1β positive, i.e., surveillant state). However, additional multimodal parameters proved more suitable for hierarchical cluster analysis (HCA). This method classified the microglia population into four clusters. Furthermore, a linear discriminant analysis (LDA) suggested three specific parameters to objectively classify any microglia by a decision tree. In addition, a principal components analysis (PCA) revealed two extra valuable variables that allowed further classification of microglia into a total of eight sub-clusters or types. The spatio-temporal distribution of these different morphotypes in our rat inflammation model made it possible to relate specific morphotypes to microglial activation status and brain location. An objective method for microglia classification based on morphological parameters is proposed. Main points Microglia undergo a quantifiable

  8. Modeling, Stability Analysis and Active Stabilization of Multiple DC-Microgrids Clusters

    DEFF Research Database (Denmark)

    Shafiee, Qobad; Dragicevic, Tomislav; Vasquez, Juan Carlos

    2014-01-01

    DC microgrids (MGs), as an alternative option, have attracted increasing interest in recent years due to many potential advantages as compared to the ac system. Stability of these systems can be an important issue under high penetration of load converters which behave as constant power loads (CPLs), and more especially during interconnection with other MGs, creating dc MG clusters. This paper develops a small signal model for dc MGs from the control point of view, in order to study stability analysis and investigate effects of CPLs and line impedances between the MGs on stability of these systems. This model can also be used to synthesize and study dynamics of control loops in dc MGs and also dc MG clusters. An active stabilization method is proposed to be implemented as a dc active power filter (APF) inside the MGs in order to not only increase damping of dc MGs at the presence of CPLs but also...

  9. Multilevel functional clustering analysis.

    Science.gov (United States)

    Serban, Nicoleta; Jiang, Huijing

    2012-09-01

    In this article, we investigate clustering methods for multilevel functional data, which consist of repeated random functions observed for a large number of units (e.g., genes) at multiple subunits (e.g., bacteria types). To describe the within- and between variability induced by the hierarchical structure in the data, we take a multilevel functional principal component analysis (MFPCA) approach. We develop and compare a hard clustering method applied to the scores derived from the MFPCA and a soft clustering method using an MFPCA decomposition. In a simulation study, we assess the estimation accuracy of the clustering membership and the cluster patterns under a series of settings: small versus moderate number of time points; various noise levels; and varying number of subunits per unit. We demonstrate the applicability of the clustering analysis to a real data set consisting of expression profiles from genes activated by immunity system cells. Prevalent response patterns are identified by clustering the expression profiles using our multilevel clustering analysis. © 2012, The International Biometric Society.

  10. Peer Cluster Theory and Adolescent Alcohol Use: An Explanation of Alcohol Use and Comparative Analysis between Two Causal Models.

    Science.gov (United States)

    Rose, Christopher D.

    1999-01-01

    Tests the premise of peer cluster theory as it applies to individual alcohol use, and makes a comparative analysis between its ability to explain alcohol use and marijuana use among college students (N=1312). Results of the causal models show some support for peer cluster theory. Discusses the study's limitations and implications. (Author/MKA)

  11. A Novel Clustering Model Based on Set Pair Analysis for the Energy Consumption Forecast in China

    Directory of Open Access Journals (Sweden)

    Mingwu Wang

    2014-01-01

    Full Text Available The energy consumption forecast is important for the decision-making of national economic and energy policies. But it is a complex and uncertain system problem affected by the outer environment and various uncertainty factors. Herein, a novel clustering model based on set pair analysis (SPA) was introduced to analyze and predict energy consumption. The annual dynamic relative indicator (DRI) of historical energy consumption was adopted to conduct a cluster analysis with Fisher’s optimal partition method. Combined with indicator weights, group centroids of DRIs for influence factors were transferred into aggregating connection numbers in order to interpret uncertainty by identity-discrepancy-contrary (IDC) analysis. Moreover, a forecasting model based on similarity to group centroid was discussed to forecast energy consumption of a certain year on the basis of measured values of influence factors. Finally, a case study predicting China’s future energy consumption as well as comparison with the grey method was conducted to confirm the reliability and validity of the model. The results indicate that the method presented here is more feasible and easier to use and can interpret certainty and uncertainty of development speed of energy consumption and influence factors as a whole.
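
    As a rough illustration of the partitioning step named above, the sketch below implements Fisher's optimal partition for an ordered sequence by dynamic programming: it splits the sequence into k contiguous groups so that the total within-group sum of squared deviations is minimal. The function name, the sample values and the choice of k are assumptions made for illustration; they are not taken from the paper.

```python
def fisher_optimal_partition(values, k):
    """Split an ordered sequence into k contiguous groups minimising the
    total within-group sum of squared deviations (dynamic programming)."""
    n = len(values)
    # Prefix sums give each segment's sum of squares in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        # Within-group sum of squares of the segment values[i:j], j > i.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + cost(i, j)
                if c < best[g][j]:
                    best[g][j], cut[g][j] = c, i
    # Walk back through the stored cut points to recover the groups.
    bounds, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        bounds.append((i, j))
        j = i
    return [values[i:j] for i, j in reversed(bounds)]


# Hypothetical yearly indicator values, already in time order.
groups = fisher_optimal_partition([1.0, 1.2, 0.9, 5.0, 5.3, 9.8, 10.1], k=3)
```

    Because the groups must be contiguous, the method suits data with a natural order such as yearly indicators; the triple loop is quadratic in the sequence length per group, which is acceptable for short annual series.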

  12. Comparison of population-averaged and cluster-specific models for the analysis of cluster randomized trials with missing binary outcomes: a simulation study

    Directory of Open Access Journals (Sweden)

    Ma Jinhui

    2013-01-01

    Full Text Available Abstract Background: The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE)) and cluster-specific (i.e. random-effects logistic regression (RELR)) models for analyzing data from cluster randomized trials (CRTs) with missing binary responses. Methods: In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI) and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE), and coverage probability. Results: GEE performs well on all four measures, provided the downward bias of the standard error (when the number of clusters per arm is small) is adjusted appropriately, under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF 50. RELR performs well only when a small amount of data was missing, and complete case analysis was applied. Conclusion: GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either standard or within-cluster MI strategy is applied prior to the analysis.
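
    The four performance measures listed above can be computed from simulation replicates roughly as follows. This is a minimal sketch under stated assumptions: standardized bias is taken as the bias expressed as a percentage of the empirical standard error, and coverage uses a Wald interval with z = 1.96. The abstract does not give the exact formulas, and the function name and inputs are hypothetical.

```python
import math
import statistics


def performance_measures(estimates, std_errors, true_value, z=1.96):
    """Summarise simulation replicates of one estimator.

    estimates  -- point estimates, one per simulated trial
    std_errors -- model-based standard errors, one per simulated trial
    true_value -- parameter value used to generate the data
    """
    mean_est = statistics.fmean(estimates)
    # Empirical standard error: spread of the estimates across replicates.
    emp_se = statistics.stdev(estimates)
    # Standardized bias as a percentage of the empirical SE (assumed definition).
    std_bias = 100.0 * (mean_est - true_value) / emp_se
    rmse = math.sqrt(statistics.fmean([(e - true_value) ** 2 for e in estimates]))
    # Coverage: share of Wald intervals that contain the true value.
    covered = sum(
        1 for e, se in zip(estimates, std_errors)
        if e - z * se <= true_value <= e + z * se
    )
    return {
        "standardized_bias_pct": std_bias,
        "empirical_se": emp_se,
        "rmse": rmse,
        "coverage": covered / len(estimates),
    }


# Made-up replicates, for illustration only.
summary = performance_measures([0.9, 1.0, 1.1, 1.0], [0.1] * 4, true_value=1.0)
```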

  13. Social Learning Network Analysis Model to Identify Learning Patterns Using Ontology Clustering Techniques and Meaningful Learning

    Science.gov (United States)

    Firdausiah Mansur, Andi Besse; Yusof, Norazah

    2013-01-01

    Clustering on Social Learning Networks is still not explored widely, especially when the network focuses on an e-learning system. Conventional methods are not really suitable for e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…

  14. Effect of Policy Analysis on Indonesia’s Maritime Cluster Development Using System Dynamics Modeling

    Science.gov (United States)

    Nursyamsi, A.; Moeis, A. O.; Komarudin

    2018-03-01

    As an archipelago with two thirds of its territory consisting of water, Indonesia should pay more attention to the development of its maritime industry. One of the catalysts to accelerate the growth of the maritime industry is the development of a maritime cluster. The purpose of this research is to understand the effect that implementing a maritime cluster policy would have on the growth of Indonesia’s maritime economy, and its role in enhancing maritime cluster performance and hence Indonesia’s maritime industry as well. The simulation results of the constructed system dynamics model show that, with the effect of a maritime cluster, the growth of the employment rate and the maritime economy is much bigger than in the business-as-usual case, exponentially so. The results imply that the government should act fast to form a legitimate maritime cluster organizing institution so that there will be a synergized, sustainable, and positive maritime cluster environment that benefits the performance of Indonesia’s maritime industry.

  15. Model-based document categorization employing semantic pattern analysis and local structure clustering

    Science.gov (United States)

    Fume, Kosei; Ishitani, Yasuto

    2008-01-01

    We propose a document categorization method based on a document model that can be defined externally for each task and that categorizes Web content or business documents into a target category in accordance with the similarity of the model. The main feature of the proposed method consists of two aspects of semantics extraction from an input document. The semantics of terms are extracted by the semantic pattern analysis and implicit meanings of document substructure are specified by a bottom-up text clustering technique focusing on the similarity of text line attributes. We have constructed a system based on the proposed method for trial purposes. The experimental results show that the system achieves more than 80% classification accuracy in categorizing Web content and business documents into 15 or 70 categories.

  16. Comprehensive cluster analysis with Transitivity Clustering.

    Science.gov (United States)

    Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan

    2011-03-01

    Transitivity Clustering is a method for the partitioning of biological data into groups of similar objects, such as genes. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.

  17. Modeling Clustered Data with Very Few Clusters.

    Science.gov (United States)

    McNeish, Daniel; Stapleton, Laura M

    2016-01-01

    Small-sample inference with clustered data has received increased attention recently in the methodological literature, with several simulation studies being presented on the small-sample behavior of many methods. However, nearly all previous studies focus on a single class of methods (e.g., only multilevel models, only corrections to sandwich estimators), and the differential performance of various methods that can be implemented to accommodate clustered data with very few clusters is largely unknown, potentially due to the rigid disciplinary preferences. Furthermore, a majority of these studies focus on scenarios with 15 or more clusters and feature unrealistically simple data-generation models with very few predictors. This article, motivated by an applied educational psychology cluster randomized trial, presents a simulation study that simultaneously addresses the extreme small sample and differential performance (estimation bias, Type I error rates, and relative power) of 12 methods to account for clustered data with a model that features a more realistic number of predictors. The motivating data are then modeled with each method, and results are compared. Results show that generalized estimating equations perform poorly; the choice of Bayesian prior distributions affects performance; and fixed effect models perform quite well. Limitations and implications for applications are also discussed.

  18. A tri-stage cluster identification model for accurate analysis of seismic catalogs

    Directory of Open Access Journals (Sweden)

    S. J. Nanda

    2013-02-01

    Full Text Available In this paper we propose a tri-stage cluster identification model that is a combination of a simple single iteration distance algorithm and an iterative K-means algorithm. In this study of earthquake seismicity, the model considers event location, time and magnitude information from earthquake catalog data to efficiently classify events as either background or mainshock and aftershock sequences. Tests on a synthetic seismicity catalog demonstrate the efficiency of the proposed model in terms of accuracy percentage (94.81% for background and 89.46% for aftershocks). The close agreement between lambda and cumulative plots for the ideal synthetic catalog and that generated by the proposed model also supports the accuracy of the proposed technique. There is flexibility in the model design to allow for proper selection of location and magnitude ranges, depending upon the nature of the mainshocks present in the catalog. The effectiveness of the proposed model is also evaluated by the classification of events in three historic catalogs: California, Japan and Indonesia. As expected, for both synthetic and historic catalog analysis it is observed that the density of events classified as background is almost uniform throughout the region, whereas the density of aftershock events is higher near the mainshocks.
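
    For readers unfamiliar with the iterative K-means stage mentioned above, a minimal sketch of the generic algorithm (Lloyd's iteration) follows. It is not the authors' tri-stage model: the event tuples, the starting centroids and the absence of feature scaling are all simplifying assumptions, and a real catalog would need latitude, longitude and time brought to comparable scales first.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each event joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical events as (latitude, longitude, time in days); values are made up.
events = [(34.0, -118.0, 1.0), (34.1, -118.1, 1.5), (34.05, -117.9, 2.0),
          (36.0, -120.0, 40.0), (36.1, -120.2, 41.0), (35.9, -119.9, 42.0)]
labels, centroids = kmeans(events, centroids=[events[0], events[3]])
```

    The result depends on the starting centroids, which is why K-means is usually run from several initializations or, as in the abstract, preceded by a cheaper first pass.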

  19. GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models

    Directory of Open Access Journals (Sweden)

    Anders Ellern Bilgrau

    2016-04-01

    Full Text Available Methods for clustering in unsupervised learning are an important part of the statistical toolbox in numerous scientific disciplines. Tewari, Giering, and Raghunathan (2011) proposed to use so-called Gaussian mixture copula models (GMCM) for general unsupervised learning based on clustering. Li, Brown, Huang, and Bickel (2011) independently discussed a special case of these GMCMs as a novel approach to meta-analysis in high-dimensional settings. GMCMs have attractive properties which make them highly flexible and therefore interesting alternatives to other well-established methods. However, parameter estimation is hard because of intrinsic identifiability issues and intractable likelihood functions. Both aforementioned papers discuss similar expectation-maximization-like algorithms as their pseudo maximum likelihood estimation procedure. We present and discuss an improved implementation in R of both classes of GMCMs along with various alternative optimization routines to the EM algorithm. The software is freely available in the R package GMCM. The implementation is fast, general, and optimized for very large numbers of observations. We demonstrate the use of package GMCM through different applications.

  20. Assessment of anaesthetic depth by clustering analysis and autoregressive modelling of electroencephalograms

    DEFF Research Database (Denmark)

    Thomsen, C E; Rosenfalck, A; Nørregaard Christensen, K

    1991-01-01

    The brain activity electroencephalogram (EEG) was recorded from 30 healthy women scheduled for hysterectomy. The patients were anaesthetized with isoflurane, halothane or etomidate/fentanyl. A multiparametric method was used for extraction of amplitude and frequency information from the EEG. The method applied autoregressive modelling of the signal, segmented in 2 s fixed intervals. The features from the EEG segments were used for learning and for classification. The learning process was unsupervised and hierarchical clustering analysis was used to construct a learning set of EEG amplitude-frequency patterns for each of the three anaesthetic drugs. These EEG patterns were assigned to a colour code corresponding to similar clinical states. A common learning set could be used for all patients anaesthetized with the same drug. The classification process could be performed on-line and the results were...

  1. Cluster model of the nucleus

    International Nuclear Information System (INIS)

    Horiuchi, H.; Ikeda, K.

    1986-01-01

    This article reviews the development of the cluster model study. The stress is put on two points: one is how the cluster structure has come to be regarded as a fundamental structure in light nuclei together with the shell-model structure, and the other is how at present the cluster model is extended to and connected with the studies of various subjects, many of which are in neighbouring fields. The authors then present the main theme with detailed explanations of the fundamentals of the microscopic cluster model which have promoted the development of the cluster model. Examples of the microscopic cluster model study of light nuclear structure are given.

  2. Potts Model with Invisible Colors : Random-Cluster Representation and Pirogov–Sinai Analysis

    NARCIS (Netherlands)

    Enter, Aernout C.D. van; Iacobelli, Giulio; Taati, Siamak

    We study a recently introduced variant of the ferromagnetic Potts model consisting of a ferromagnetic interaction among q “visible” colors along with the presence of r non-interacting “invisible” colors. We introduce a random-cluster representation for the model, for which we prove the existence of

  3. Fuzzy clustering: critical analysis of the contextual mechanisms employed by three neural network models

    Science.gov (United States)

    Baraldi, Andrea; Parmiggiani, Flavio

    1996-06-01

    According to the following definition, taken from the literature, a fuzzy clustering mechanism allows the same input pattern to belong to multiple categories to different degrees. Many clustering neural network (NN) models claim to feature fuzzy properties, but several of them (like the Fuzzy ART model) do not satisfy this definition. Conversely, we believe that Kohonen's Self-Organizing Map, SOM, satisfies the definition provided above, even though this NN model is well known to (robustly) perform topologically ordered mapping rather than fuzzy clustering. This may sound like a paradox if we consider that several fuzzy NN models (such as the Fuzzy Learning Vector Quantization, FLVQ, which was first called Fuzzy Kohonen Clustering Network, FKCN) were originally developed to enhance Kohonen's models (such as SOM and the vector quantization model, VQ). The fuzziness of SOM indicates that a network of processing elements (PEs) can satisfy the fuzzy clustering definition when it exploits local rules which are biologically plausible (such as the Kohonen bubble strategy). This is equivalent to stating that the exploitation of fuzzy set theory in the development of complex systems (e.g., clustering NNs) may provide new mathematical tools (e.g., the definition of membership function) to simulate the behavior of those cooperative/competitive mechanisms already identified by neurophysiological studies. When a biologically plausible cooperative/competitive strategy is pursued effectively, neighboring PEs become mutually coupled to gain sensitivity to contextual effects. PEs which are mutually coupled are affected by vertical (inter-layer) as well as horizontal (intra-layer) connections. To summarize, we suggest relating the study of fuzzy clustering mechanisms to the multi-disciplinary science of complex systems, with special regard to the investigation of the cooperative/competitive local rules employed by complex systems to gain sensitivity to contextual effects in

  4. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases..., the classifier is trained on each cluster having reduced dimensionality and fewer examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset....

  5. Analyzing Patients' Values by Applying Cluster Analysis and LRFM Model in a Pediatric Dental Clinic in Taiwan

    Science.gov (United States)

    Lin, Shih-Yen; Liu, Chih-Wei

    2014-01-01

    This study combines cluster analysis and LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach by self-organizing maps and K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F excluding monetary covered by national health insurance program are computed for each cluster. In addition, customer value matrix is used to analyze customer values of twelve clusters in terms of frequency and monetary. Customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified into loyal patients with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients without any variable above the average values of L, R, and F. When different types of patients are identified, marketing strategies can be designed to meet different patients' needs. PMID:25045741
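
    The last step described above, labelling clusters by comparing their average L, R and F with the overall averages, can be sketched as follows. The cluster names and values are invented, the overall average is taken as the unweighted mean of the cluster means (a simplification; the study may weight by cluster size), and higher values are treated as better for every variable, as the abstract's wording suggests.

```python
from statistics import fmean

# Hypothetical cluster averages of length (L), recency (R) and frequency (F).
cluster_means = {
    "A": {"L": 30.0, "R": 0.9, "F": 12.0},
    "B": {"L": 5.0, "R": 0.2, "F": 2.0},
    "C": {"L": 25.0, "R": 0.3, "F": 9.0},
}


def label_clusters(cluster_means, variables=("L", "R", "F")):
    """Label clusters by comparing their means with the overall averages."""
    overall = {
        v: fmean([m[v] for m in cluster_means.values()]) for v in variables
    }
    labels = {}
    for name, means in cluster_means.items():
        above = [means[v] > overall[v] for v in variables]
        if all(above):
            labels[name] = "loyal"   # every variable above average
        elif not any(above):
            labels[name] = "lost"    # no variable above average
        else:
            labels[name] = "mixed"   # neither pattern; needs a closer look
    return labels


segment_labels = label_clusters(cluster_means)
```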

  6. Relation chain based clustering analysis

    Science.gov (United States)

    Zhang, Cheng-ning; Zhao, Ming-yang; Luo, Hai-bo

    2011-08-01

    Clustering analysis is currently one of the well-developed branches of data mining technology, which is supposed to find the hidden structures in the multidimensional space called feature or pattern space. A datum in the space usually takes a vector form, and the elements of the vector represent several specifically selected features. These features are often chosen for their effectiveness on the problem at hand. Generally, clustering analysis falls into two divisions: one is based on the agglomerative clustering method, and the other is based on the divisive clustering method. The former refers to a bottom-up process which regards each datum as a singleton cluster, while the latter refers to a top-down process which regards the entire data set as one cluster. From the collected literature, it is noted that divisive clustering currently dominates both in application and research. Although some famous divisive clustering methods have been designed and well developed, clustering problems are still far from being solved. The k-means algorithm is the original divisive clustering method; it requires some important index values to be assigned initially, such as the number of clusters and the initial cluster prototype positions, which may not be reasonable on certain occasions. Beyond the initialization problem, the k-means algorithm may also fall into a local optimum, clusters in a rigid way and is not suitable for non-Gaussian distributions. One can see that the search for a good or natural clustering result in fact originates from one's understanding of the concept of clustering. Thus, confusion about or misunderstanding of the definition of clustering always leads to unsatisfactory clustering results. One should consider the definition deeply and seriously. This paper demonstrates the nature of clustering, gives a way of understanding clustering, discusses the methodology of designing a clustering algorithm, and proposes a new clustering method based on relation chains among 2D patterns. In

  7. An Efficient Data Compression Model Based on Spatial Clustering and Principal Component Analysis in Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yihang Yin

    2015-08-01

    Full Text Available Wireless sensor networks (WSNs) have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce the data transmission to prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA). First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing with a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for the selection of the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data and retain the definite variance at the same time. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.

  8. Evaluating Mixture Modeling for Clustering: Recommendations and Cautions

    Science.gov (United States)

    Steinley, Douglas; Brusco, Michael J.

    2011-01-01

    This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson,…

  9. Microscopic cluster model analysis of 14O+p elastic scattering

    International Nuclear Information System (INIS)

    Baye, D.; Descouvemont, P.; Leo, F.

    2005-01-01

    The 14O+p elastic scattering is discussed in detail in a fully microscopic cluster model. The 14O cluster is described by a closed p shell for protons and a closed p3/2 subshell for neutrons in the translation-invariant harmonic-oscillator model. The exchange and spin-orbit parameters of the effective forces are tuned on the energy levels of the 15C mirror system. With the generator-coordinate and microscopic R-matrix methods, phase shifts and cross sections are calculated for the 14O+p elastic scattering. An excellent agreement is found with recent experimental data. A comparison is performed with phenomenological R-matrix fits. Resonance properties in 15F are discussed.

  10. A Dedicated Mixture Model for Clustering Smart Meter Data: Identification and Analysis of Electricity Consumption Behaviors

    Directory of Open Access Journals (Sweden)

    Fateh Nassim Melzi

    2017-09-01

    Full Text Available The large amount of data collected by smart meters is a valuable resource that can be used to better understand consumer behavior and optimize electricity consumption in cities. This paper presents an unsupervised classification approach for extracting typical consumption patterns from data generated by smart electric meters. The proposed approach is based on a constrained Gaussian mixture model whose parameters vary according to the day type (weekday, Saturday or Sunday). The proposed methodology is applied to a real dataset of Irish households collected by smart meters over one year. For each cluster, the model provides three consumption profiles that depend on the day type. In the first instance, the model is applied to the electricity consumption of users during one month to extract groups of consumers who exhibit similar consumption behaviors. The clustering results are then crossed with contextual variables available for the households to show the close links between electricity consumption and household socio-economic characteristics. In the second instance, the evolution of consumer behavior from one month to another is assessed through variations of cluster sizes over time. The results show that consumer behavior evolves over time depending on contextual variables such as temperature fluctuations and calendar events.

  11. Remodularization Analysis Using Semantic Clustering

    OpenAIRE

    Santos, Gustavo; Tulio Valente, Marco; Anquetil, Nicolas

    2014-01-01

    In this paper, we report an experience on using and adapting Semantic Clustering to evaluate software remodularizations. Semantic Clustering is an approach that relies on information retrieval and clustering techniques to extract sets of similar classes in a system, according to their vocabularies. We adapted Semantic Clustering to support remodularization analysis. We evaluate our adaptation using six real-world remodularizations of four software systems. We report th...

  12. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering

    Science.gov (United States)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2016-04-01

    Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity, with usual applications comprising probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed by other researchers to address this issue. These range in complexity from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. Here, an adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane, and the search space is adjusted with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in

  13. Gaussian-mixture-model-based cluster analysis finds five kinds of gamma-ray bursts in the BATSE catalogue

    Science.gov (United States)

    Chattopadhyay, Souradeep; Maitra, Ranjan

    2017-08-01

    Clustering methods are an important tool to enumerate and describe the different coherent kinds of gamma-ray bursts (GRBs). But their performance can be affected by a number of factors such as the choice of clustering algorithm and inherent associated assumptions, the inclusion of variables in clustering, the nature of the initialization methods used, the iterative algorithm, or the criterion used to judge the optimal number of groups supported by the data. We analysed GRBs from the Burst and Transient Source Experiment (BATSE) 4Br Catalog using k-means and Gaussian-mixture-models-based clustering methods and found that, after accounting for all the above factors, all six variables (different subsets of which have been used in the literature), namely the flux duration variables (T50, T90), the peak flux (P256) measured in 256 ms bins, the total fluence (Ft) and the spectral hardness ratios (H32 and H321), contain information on clustering. Further, our analysis found evidence of five different kinds of GRBs and that these groups have different kinds of dispersions in terms of shape, size and orientation. In terms of duration, fluence and spectrum, the five types of GRBs were characterized as intermediate/faint/intermediate, long/intermediate/soft, intermediate/intermediate/intermediate, short/faint/hard and long/bright/intermediate.
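
    To make the Gaussian-mixture approach mentioned above concrete, here is a minimal expectation-maximization sketch for a one-dimensional, two-component mixture. It is only an illustration of the general technique: the study itself works with six variables and full covariance structure, and the data values, starting parameters and function names below are assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=50):
    """Fit a one-dimensional Gaussian mixture by expectation-maximization."""
    k = len(means)
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j])
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against variance collapse
    return means, variances, weights


# Made-up log-duration values forming a short and a long group.
data = [-0.5, -0.3, -0.4, -0.6, -0.2, 1.4, 1.6, 1.5, 1.3, 1.7]
means, variances, weights = em_gmm_1d(
    data, means=[-1.0, 2.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

    Unlike k-means, each observation keeps a graded membership in every component, and the fitted variances let groups differ in size and spread, which is the property the abstract highlights.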

  14. Integrative cluster analysis in bioinformatics

    CERN Document Server

    Abu-Jamous, Basel; Nandi, Asoke K

    2015-01-01

    Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o

  15. Statistical analysis of two-dimensional cluster structures composed of ferromagnetic particles based on a flexible chain model.

    Science.gov (United States)

    Morimoto, Hisao; Maekawa, Toru; Matsumoto, Yoichiro

    2003-12-01

    We investigate two-dimensional cluster structures composed of ferromagnetic colloidal particles, based on a flexible chain model, by the configurational-bias Monte Carlo method. We clarify the dependence of the probabilities of the creation of different types of clusters on the dipole-dipole interactive energy and the cluster size.

  16. Cluster Correlation in Mixed Models

    Science.gov (United States)

    Gardini, A.; Bonometto, S. A.; Murante, G.; Yepes, G.

    2000-10-01

    We evaluate the dependence of the cluster correlation length, r_c, on the mean intercluster separation, D_c, for three models with critical matter density, vanishing vacuum energy (Λ=0), and COBE normalization: a tilted cold dark matter (tCDM) model (n=0.8) and two blue mixed models with two light massive neutrinos, yielding Ω_h=0.26 and 0.14 (MDM1 and MDM2, respectively). All models approach the observational value of σ_8 (and hence the observed cluster abundance) and are consistent with the observed abundance of damped Lyα systems. Mixed models have a motivation in recent results of neutrino physics; they also agree with the observed value of the ratio σ_8/σ_25, yielding the spectral slope parameter Γ, and nicely fit Las Campanas Redshift Survey (LCRS) reconstructed spectra. We use parallel AP3M simulations, performed in a wide box (of side 360 h^-1 Mpc) and with high mass and distance resolution, enabling us to build artificial samples of clusters, whose total number and mass range allow us to cover the same D_c interval inspected through Automatic Plate Measuring Facility (APM) and Abell cluster clustering data. We find that the tCDM model performs substantially better than n=1 critical density CDM models. Our main finding, however, is that mixed models provide a surprisingly good fit to cluster clustering data.

  17. Two different approaches to the affective profiles model: median splits (variable-oriented) and cluster analysis (person-oriented)

    Science.gov (United States)

    MacDonald, Shane; Archer, Trevor

    2015-01-01

    Background. The notion of the affective system as being composed of two dimensions led Archer and colleagues to the development of the affective profiles model. The model consists of four different profiles based on combinations of individuals’ experience of high/low positive and negative affect: self-fulfilling, low affective, high affective, and self-destructive. During the past 10 years, an increasing number of studies have used this person-centered model as the backdrop for the investigation of between- and within-individual differences in ill-being and well-being. The most common approach to this profiling is to divide individuals’ scores of self-reported affect using the median of the population as the reference for high/low splits. However, scores just above and just below the median might be classed as high and low arbitrarily rather than because of a real difference. Thus, it is plausible to criticize the validity of this variable-oriented approach. Our aim was to compare the median splits approach with a person-oriented approach, namely, cluster analysis. Method. The participants (N = 2,225) were recruited through Amazon's Mechanical Turk and asked to self-report affect using the Positive Affect Negative Affect Schedule. We compared the profiles’ homogeneity and Silhouette coefficients to discern differences in homogeneity and heterogeneity between approaches. We also conducted exact cell-wise analyses matching the profiles from both approaches and matching profiles and gender to investigate profiling agreement with respect to affectivity levels and affectivity and gender. All analyses were conducted using the ROPstat software. Results. The cluster approach (weighted average of cluster homogeneity coefficients = 0.62, Silhouette coefficients = 0.68) generated profiles with greater homogeneity and more distinct from each other compared to the median splits approach (weighted average of cluster homogeneity coefficients = 0.75, Silhouette coefficients = 0.59). Most of the

  18. Cluster-specific small airway modeling for imaging-based CFD analysis of pulmonary air flow and particle deposition in COPD smokers

    Science.gov (United States)

    Haghighi, Babak; Choi, Jiwoong; Choi, Sanghun; Hoffman, Eric A.; Lin, Ching-Long

    2017-11-01

    Accurate modeling of small airway diameters in patients with chronic obstructive pulmonary disease (COPD) is a crucial step toward patient-specific CFD simulations of regional airflow and particle transport. We proposed to use computed tomography (CT) imaging-based cluster membership to identify structural characteristics of airways in each cluster and use them to develop cluster-specific airway diameter models. We analyzed 284 COPD smokers with airflow limitation, and 69 healthy controls. We used multiscale imaging-based cluster analysis (MICA) to classify smokers into 4 clusters. With representative cluster patients and healthy controls, we performed multiple regressions to quantify variation of airway diameters by generation as well as by cluster. Clusters 2 and 4 showed a greater decrease in diameter with increasing generation than the other clusters. Cluster 4 had more rapid decreases of airway diameters in the upper lobes, while cluster 2 had more rapid decreases in the lower lobes. We then used these regression models to estimate airway diameters in CT unresolved regions to obtain pressure-volume hysteresis curves using a 1D resistance model. These 1D flow solutions can be used to provide the patient-specific boundary conditions for 3D CFD simulations in COPD patients. Support for this study was provided, in part, by NIH Grants U01-HL114494, R01-HL112986 and S10-RR022421.

  19. Linear regression models and k-means clustering for statistical analysis of fNIRS data.

    Science.gov (United States)

    Bonomini, Viola; Zucchelli, Lucia; Re, Rebecca; Ieva, Francesca; Spinelli, Lorenzo; Contini, Davide; Paganoni, Anna; Torricelli, Alessandro

    2015-02-01

    We propose a new algorithm, based on a linear regression model, to statistically estimate the hemodynamic activations in fNIRS data sets. The main concern guiding the algorithm development was the minimization of assumptions and approximations made on the data set for the application of statistical tests. Further, we propose a K-means method to cluster fNIRS data (i.e. channels) as activated or not activated. The methods were validated both on simulated and in vivo fNIRS data. A time domain (TD) fNIRS technique was preferred because of its high performance in discriminating cortical activation from superficial physiological changes. However, the proposed method is also applicable to continuous wave or frequency domain fNIRS data sets.

  20. Power Quality Analysis Using a Hybrid Model of the Fuzzy Min-Max Neural Network and Clustering Tree.

    Science.gov (United States)

    Seera, Manjeevan; Lim, Chee Peng; Loo, Chu Kiong; Singh, Harapajan

    2016-12-01

    A hybrid intelligent model comprising a modified fuzzy min-max (FMM) clustering neural network and a modified clustering tree (CT) is developed. A review of clustering models with rule extraction capabilities is presented. The hybrid FMM-CT model is explained. We first use several benchmark problems to illustrate the cluster evolution patterns from the proposed modifications in FMM. Then, we employ a case study with real data related to power quality monitoring to assess the usefulness of FMM-CT. The results are compared with those from other clustering models. More importantly, we extract explanatory rules from FMM-CT to justify its predictions. The empirical findings indicate the usefulness of the proposed model in tackling data clustering and power quality monitoring problems under different environments.

  1. A tripartite clustering analysis on microRNA, gene and disease model.

    Science.gov (United States)

    Shen, Chengcheng; Liu, Ying

    2012-02-01

    Alteration of gene expression in response to regulatory molecules or mutations could lead to different diseases. MicroRNAs (miRNAs) have been discovered to be involved in regulation of gene expression and a wide variety of diseases. In a tripartite biological network of human miRNAs, their predicted target genes and the diseases caused by altered expressions of these genes, valuable knowledge about the pathogenicity of miRNAs, involved genes and related disease classes can be revealed by co-clustering miRNAs, target genes and diseases simultaneously. By considering multi-type members, tripartite co-clustering can lead to more informative results than traditional co-clustering with only two kinds of members, and can pass the hidden relational information along the relation chain. Here we report a spectral co-clustering algorithm for k-partite graphs to find clusters with heterogeneous members. We use the method to explore the potential relationships among miRNAs, genes and diseases. The clusters obtained from the algorithm have significantly higher density than randomly selected clusters, which means members in the same cluster are more likely to have common connections. Results also show that miRNAs in the same family based on the hairpin sequences tend to belong to the same cluster. We also validate the clustering results by checking the correlation of enriched gene functions and disease classes in the same cluster. Finally, widely studied miR-17-92 and its paralogs are analyzed as a case study to reveal that genes and diseases co-clustered with the miRNAs are in accordance with current research findings.

  2. Cluster-based upper body marker models for three-dimensional kinematic analysis: Comparison with an anatomical model and reliability analysis.

    Science.gov (United States)

    Boser, Quinn A; Valevicius, Aïda M; Lavoie, Ewen B; Chapman, Craig S; Pilarski, Patrick M; Hebert, Jacqueline S; Vette, Albert H

    2018-02-27

    Quantifying angular joint kinematics of the upper body is a useful method for assessing upper limb function. Joint angles are commonly obtained via motion capture, tracking markers placed on anatomical landmarks. This method is associated with limitations including administrative burden, soft tissue artifacts, and intra- and inter-tester variability. An alternative method involves the tracking of rigid marker clusters affixed to body segments, calibrated relative to anatomical landmarks or known joint angles. The accuracy and reliability of applying this cluster method to the upper body has, however, not been comprehensively explored. Our objective was to compare three different upper body cluster models with an anatomical model, with respect to joint angles and reliability. Non-disabled participants performed two standardized functional upper limb tasks with anatomical and cluster markers applied concurrently. Joint angle curves obtained via the marker clusters with three different calibration methods were compared to those from an anatomical model, and between-session reliability was assessed for all models. The cluster models produced joint angle curves which were comparable to and highly correlated with those from the anatomical model, but exhibited notable offsets and differences in sensitivity for some degrees of freedom. Between-session reliability was comparable between all models, and good for most degrees of freedom. Overall, the cluster models produced reliable joint angles that, however, cannot be used interchangeably with anatomical model outputs to calculate kinematic metrics. Cluster models appear to be an adequate, and possibly advantageous alternative to anatomical models when the objective is to assess trends in movement behavior. Copyright © 2018 Elsevier Ltd. All rights reserved.

  3. Single-cluster dynamics for the random-cluster model

    NARCIS (Netherlands)

    Deng, Y.; Qian, X.; Blöte, H.W.J.

    2009-01-01

    We formulate a single-cluster Monte Carlo algorithm for the simulation of the random-cluster model. This algorithm is a generalization of the Wolff single-cluster method for the q-state Potts model to noninteger values q>1. Its results for static quantities are in a satisfactory agreement with those

  4. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    Science.gov (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  5. Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data

    NARCIS (Netherlands)

    Ranciati, Saverio; Viroli, Cinzia; Wit, Ernst C.

    2017-01-01

    Model-based clustering is a technique widely used to group a collection of units into mutually exclusive groups. There are, however, situations in which an observation could in principle belong to more than one cluster. In the context of next-generation sequencing (NGS) experiments, for example, the

  6. JOINT ANALYSIS OF X-RAY AND SUNYAEV-ZEL'DOVICH OBSERVATIONS OF GALAXY CLUSTERS USING AN ANALYTIC MODEL OF THE INTRACLUSTER MEDIUM

    International Nuclear Information System (INIS)

    Hasler, Nicole; Bulbul, Esra; Bonamente, Massimiliano; Landry, David; Carlstrom, John E.; Culverhouse, Thomas L.; Gralla, Megan; Greer, Christopher; Hennessy, Ryan; Leitch, Erik M.; Mantz, Adam; Marrone, Daniel P.; Plagge, Thomas; Hawkins, David; Lamb, James W.; Muchovej, Stephen; Joy, Marshall; Kolodziejczak, Jeffery; Miller, Amber; Mroczkowski, Tony

    2012-01-01

    We perform a joint analysis of X-ray and Sunyaev-Zel'dovich effect data using an analytic model that describes the gas properties of galaxy clusters. The joint analysis allows the measurement of the cluster gas mass fraction profile and Hubble constant independent of cosmological parameters. Weak cosmological priors are used to calculate the overdensity radius within which the gas mass fractions are reported. Such an analysis can provide direct constraints on the evolution of the cluster gas mass fraction with redshift. We validate the model and the joint analysis on high signal-to-noise data from the Chandra X-ray Observatory and the Sunyaev-Zel'dovich Array for two clusters, A2631 and A2204.

  7. A novel model for Time-Series Data Clustering Based on piecewise SVD and BIRCH for Stock Data Analysis on Hadoop Platform

    Directory of Open Access Journals (Sweden)

    Ibgtc Bowala

    2017-06-01

    With the rapid growth of financial markets, analysts are paying more attention to predictions. Stock data are time series data and come in huge volumes. A feasible solution for handling the increasing amount of data is to use a cluster for parallel processing, and the Hadoop parallel computing platform is a typical representative. There are various statistical models for forecasting time series data, but accurate clusters are a prerequisite. Clustering analysis is one of the main methods for mining time series data and supports many other analysis processes. However, general clustering algorithms cannot cluster time series data well, because such data have a special structure, high dimensionality and highly correlated values due to a high noise level. A novel model for time series clustering is presented using BIRCH, based on piecewise SVD, leading to a novel dimension reduction approach. Highly correlated features are handled using SVD, with a novel approach to dimensionality reduction that aims to preserve correlated behavior, and BIRCH is then used for clustering. The algorithm can handle massive time series data. Finally, the new model is successfully applied to real stock time series data from Yahoo Finance, with satisfactory results.

  8. A Hybrid Model for Forecasting Groundwater Levels Based on Fuzzy C-Mean Clustering and Singular Spectrum Analysis

    Directory of Open Access Journals (Sweden)

    Dušan Polomčić

    2017-07-01

    The ability to forecast groundwater levels is very significant because of their vital role in basic functions related to the efficiency and sustainability of water supplies. The uncertainty which dominates our understanding of the functioning of water supply systems is of great significance and arises as a consequence of the time-unbalanced water consumption rate and the deterioration of the recharge conditions of captured aquifers. The aim of this paper is to present a hybrid model based on fuzzy C-mean clustering and singular spectrum analysis to forecast the weekly values of the groundwater level of a groundwater source. This hybrid model demonstrates how the fuzzy C-mean can be used to transform the sequence of the observed data into a sequence of fuzzy states, serving as a basis for the forecasting of future states by singular spectrum analysis. In this way, the forecasting efficiency is improved, because the model predicts the interval in which the level will lie rather than a crisp value. This gives engineers much more flexibility when managing and planning sustainable water supplies. The model is tested using the observed weekly time series of a groundwater source located near the town of Čačak in south-western Serbia.
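The step of turning observed levels into fuzzy states can be illustrated with the standard fuzzy c-means membership formula. This is a minimal sketch and not the authors' implementation: the function name `fuzzy_memberships`, the fuzzifier `m = 2`, the cluster centres and the weekly levels are all hypothetical values chosen for illustration, and the singular spectrum analysis stage is not shown.

```python
def fuzzy_memberships(value, centers, m=2.0):
    """Membership of one scalar observation in each cluster.

    Uses the standard fuzzy c-means formula
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the distance from the observation to centre i.
    """
    dists = [abs(value - c) for c in centers]
    if 0.0 in dists:  # observation sits exactly on a centre
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# Hypothetical cluster centres (groundwater level, in metres) and weekly readings.
centers = [101.0, 102.0, 103.5]
levels = [101.2, 101.9, 103.4]

# Each observation becomes a vector of memberships, i.e. a fuzzy state.
states = [fuzzy_memberships(x, centers) for x in levels]
```

Each entry of `states` sums to 1, so a reading halfway between two centres is shared between them instead of being forced into one cluster.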

  9. Cluster models and other topics

    CERN Document Server

    Akaishi, Yoshinori; Horiuchi, Hisashi; Ikeda, Kiyomi

    1986-01-01

    This volume consists of contributions from some of Japan's most eminent nuclear theorists. The cluster model of the nucleus is discussed pedagogically and the current status of the field is surveyed. A contribution on Monte Carlo Methods and Lattice Gauge Theories gives nuclear theorists a glimpse of related developments in QCD and Gauge Theories. Few Body Systems are reviewed by Y Akaishi, paying special attention to the ATMS Multiple Scattering Method.

  10. Clustering metagenomic sequences with interpolated Markov models

    Directory of Open Access Journals (Sweden)

    Kelley David R

    2010-11-01

    Background. Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. Results. We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and the supervised learning method Phymm, called PHYSCIMM, that performs better when evolutionarily close training genomes are available. Conclusions. SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm.

  11. Localized versus shell-model-like clusters

    Energy Technology Data Exchange (ETDEWEB)

    Cseh, J.; Algora, A. [Institute of Nuclear Research of the Hungarian Academy of Sciences, Debrecen, Pf. 51, 4001 Hungary (Hungary); Darai, J. [Institute of Experimental Physics, University of Debrecen, Debrecen, Bem ter 18/A, 4026 Hungary (Hungary); Yepez M, H. [Universidad Autonoma de la Ciudad de Mexico, Prolongacion San Isidro 151, Col. San Lorenzo Tezonco, 09790 Mexico D. F. (Mexico); Hess, P. O. [Instituto de Ciencias Nucleares, UNAM, Apartado Postal 70-543, 04510 Mexico D. F. (Mexico)]. e-mail: cseh@atomki.hu

    2008-12-15

    In light of the relation of the shell model and the cluster model, the concepts of localized and shell-model-like clusters are discussed. They are interpreted as different phases of clusterization, which may be characterized by quasi-dynamical symmetries, and are connected by a phase-transition. (Author)

  12. Adaptive Fuzzy Consensus Clustering Framework for Clustering Analysis of Cancer Data.

    Science.gov (United States)

    Yu, Zhiwen; Chen, Hantao; You, Jane; Liu, Jiming; Wong, Hau-San; Han, Guoqiang; Li, Le

    2015-01-01

    Performing clustering analysis is one of the important research topics in cancer discovery using gene expression profiles, which is crucial in facilitating the successful diagnosis and treatment of cancer. While quite a number of research works perform tumor clustering, few of them consider how to incorporate fuzzy theory together with an optimization process into a consensus clustering framework to improve the performance of clustering analysis. In this paper, we first propose a random double clustering based cluster ensemble framework (RDCCE) to perform tumor clustering based on gene expression data. Specifically, RDCCE generates a set of representative features using a randomly selected clustering algorithm in the ensemble, and then assigns samples to their corresponding clusters based on the grouping results. In addition, we also introduce the random double clustering based fuzzy cluster ensemble framework (RDCFCE), which is designed to improve the performance of RDCCE by integrating the newly proposed fuzzy extension model into the ensemble framework. RDCFCE adopts the normalized cut algorithm as the consensus function to summarize the fuzzy matrices generated by the fuzzy extension models, partition the consensus matrix, and obtain the final result. Finally, adaptive RDCFCE (A-RDCFCE) is proposed to optimize RDCFCE and improve its performance further by adopting a self-evolutionary process (SEPP) for the parameter set. Experiments on real cancer gene expression profiles indicate that RDCFCE and A-RDCFCE work well on these data sets, and outperform most of the state-of-the-art tumor clustering algorithms.

  13. Analysis of the crystal lattice instability for cage–cluster systems using the superatom model

    Energy Technology Data Exchange (ETDEWEB)

    Serebrennikov, D. A., E-mail: dserebrennikov@innopark.kantiana.ru, E-mail: dimafania@mail.ru; Clementyev, E. S. [I. Kant Baltic Federal University, “Functional Nanomaterials” Scientific–Educational Center (Russian Federation); Alekseev, P. A. [“Kurchatov Institute” National Research Center (Russian Federation)

    2016-09-15

    We have investigated the lattice dynamics for a number of rare-earth hexaborides based on the superatom model within which the boron octahedron is substituted by one superatom with a mass equal to the mass of six boron atoms. Phenomenological models have been constructed for the acoustic and low-energy optical phonon modes in RB6 (R = La, Gd, Tb, Dy) compounds. Using DyB6 as an example, we have studied the anomalous softening of longitudinal acoustic phonons in several crystallographic directions, an effect that is also typical of GdB6 and TbB6. The softening of the acoustic branches is shown to be achieved through the introduction of negative interatomic force constants between rare-earth ions. We discuss the structural instability of hexaborides based on 4f elements, the role of valence instability in the lattice dynamics, and the influence of the number of f electrons on the degree of softening of phonon modes.

  14. Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data.

    Science.gov (United States)

    Ranciati, Saverio; Viroli, Cinzia; Wit, Ernst C

    2017-11-01

    Model-based clustering is a technique widely used to group a collection of units into mutually exclusive groups. There are, however, situations in which an observation could in principle belong to more than one cluster. In the context of next-generation sequencing (NGS) experiments, for example, the signal observed in the data might be produced by two (or more) different biological processes operating together and a gene could participate in both (or all) of them. We propose a novel approach to cluster discrete NGS data from a ChIP-Seq experiment with a mixture model, allowing each unit to belong potentially to more than one group: these multiple allocation clusters can be flexibly defined via a function combining the features of the original groups without introducing new parameters. The formulation naturally gives rise to a 'zero-inflation group' in which values close to zero can be allocated, acting as a correction for the abundance of zeros that manifest in this type of data. We take into account the spatial dependency between observations, which is described through a latent conditional autoregressive process that can reflect different dependency patterns. We assess the performance of our model within a simulation environment and then apply it to real ChIP-Seq data. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Stability analysis in K-means clustering.

    Science.gov (United States)

    Steinley, Douglas

    2008-11-01

    This paper develops a new procedure, called stability analysis, for K-means clustering. Instead of ignoring local optima and only considering the best solution found, this procedure takes advantage of additional information from a K-means cluster analysis. The information from the locally optimal solutions is collected in an object by object co-occurrence matrix. The co-occurrence matrix is clustered and subsequently reordered by a steepest ascent quadratic assignment procedure to aid visual interpretation of the multidimensional cluster structure. Subsequently, measures are developed to determine the overall structure of a data set, the number of clusters and the multidimensional relationships between the clusters.
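The object-by-object co-occurrence matrix described above can be sketched as follows. This is an illustrative reading of the abstract, not the paper's procedure: the helper names `kmeans` and `co_occurrence`, the number of runs, the seed and the toy data are assumptions, and the subsequent clustering and quadratic-assignment reordering of the matrix are not shown.

```python
import random


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initial centres drawn from the data
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: a local optimum was reached
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def co_occurrence(points, k, runs=20, seed=0):
    """Fraction of runs in which each pair of objects lands in the same cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]


# Toy data (hypothetical): two tight, well-separated groups of three objects.
points = [(0.0, 0.0), (0.1, 0.1), (0.3, 0.3), (10.0, 10.0), (10.2, 10.2), (10.3, 10.3)]
matrix = co_occurrence(points, k=2)
```

On this toy data every restart recovers the same two groups, so the matrix has a clean block structure. On harder data the locally optimal solutions disagree and the entries become fractions, which is the extra information the abstract proposes to exploit.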

  16. Co-clustering models, algorithms and applications

    CERN Document Server

    Govaert, Gérard

    2013-01-01

    Cluster or co-cluster analyses are important tools in a variety of scientific areas. The introduction of this book presents a state of the art of already well-established, as well as more recent methods of co-clustering. The authors mainly deal with the two-mode partitioning under different approaches, but pay particular attention to a probabilistic approach. Chapter 1 concerns clustering in general and the model-based clustering in particular. The authors briefly review the classical clustering methods and focus on the mixture model. They present and discuss the use of different mixture

  17. Tanzania: A Hierarchical Cluster Analysis Approach | Ngaruko ...

    African Journals Online (AJOL)

    Using survey data from Kibondo district, west Tanzania, we use hierarchical cluster analysis to classify borrower farmers according to their borrowing behaviour into four distinctive clusters. The appreciation of the existence of heterogeneous farmer clusters is vital in forging credit delivery policies that are not only ...

  18. A Monte Carlo model of DNA double-strand break clustering and rejoining kinetics for the analysis of pulsed-field gel electrophoresis data.

    Science.gov (United States)

    Pinto, M; Prise, K M; Michael, B D

    2004-10-01

    In studies of radiation-induced DNA fragmentation and repair, analytical models may provide rapid and easy-to-use methods to test simple hypotheses regarding the breakage and rejoining mechanisms involved. The random breakage model, according to which lesions are distributed uniformly and independently of each other along the DNA, has been the model most used to describe spatial distribution of radiation-induced DNA damage. Recently several mechanistic approaches have been proposed that model clustered damage to DNA. In general, such approaches focus on the study of initial radiation-induced DNA damage and repair, without considering the effects of additional (unwanted and unavoidable) fragmentation that may take place during the experimental procedures. While most approaches, including measurement of total DNA mass below a specified value, allow for the occurrence of background experimental damage by means of simple subtractive procedures, a more detailed analysis of DNA fragmentation necessitates a more accurate treatment. We have developed a new, relatively simple model of DNA breakage and the resulting rejoining kinetics of broken fragments. Initial radiation-induced DNA damage is simulated using a clustered breakage approach, with three free parameters: the number of independently located clusters, each containing several DNA double-strand breaks (DSBs), the average number of DSBs within a cluster (multiplicity of the cluster), and the maximum allowed radius within which DSBs belonging to the same cluster are distributed. Random breakage is simulated as a special case of the DSB clustering procedure. When the model is applied to the analysis of DNA fragmentation as measured with pulsed-field gel electrophoresis (PFGE), the hypothesis that DSBs in proximity rejoin at a different rate from that of sparse isolated breaks can be tested, since the kinetics of rejoining of fragments of varying size may be followed by means of computer simulations. 
The problem of how

  19. Cluster analysis in phenotyping a Portuguese population.

    Science.gov (United States)

    Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J

    2015-09-03

    Unbiased cluster analysis using clinical parameters has identified asthma phenotypes, and adding inflammatory biomarkers to this analysis has provided better insight into the disease mechanisms. This approach has not yet been applied to asthmatic Portuguese patients. The aim was to identify phenotypes of asthma using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had full data and were included for cluster analysis. Patients were distributed into 5 clusters, described as follows: cluster (C) 1, early onset mild allergic asthma; C2, moderate allergic asthma, with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters mainly coincided with those of other larger-scale cluster analyses. Variables such as age at disease onset, obesity, lung function, FeNO (Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.
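
    As a rough illustration of Ward's method named in this abstract, the sketch below merges, at each step, the two clusters whose union gives the smallest increase in within-cluster sum of squares. The two features and all values are hypothetical toy data, not taken from the study, and real clinical variables would normally be standardized first.

```python
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2

def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering with Ward's merge criterion."""
    clusters = [[p] for p in points]          # start with singletons
    while len(clusters) > n_clusters:
        # Pick the pair (i < j) whose merge is cheapest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical patients: (age at onset in years, body-mass index).
patients = [(5, 22), (7, 21), (6, 23), (45, 34), (48, 36), (50, 33)]
clusters = ward_clustering(patients, n_clusters=2)
for label, members in enumerate(clusters, start=1):
    print(f"C{label}: {sorted(members)}")
```

    The naive pairwise search is cubic in the number of patients, which is acceptable for a cohort of this size but not for large data sets.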

  20. Clustering Analysis within Text Classification Techniques

    Directory of Open Access Journals (Sweden)

    Madalina ZURINI

    2011-01-01

    The paper presents a personal approach to the main applications of classification in the knowledge-based society, using methods and techniques widely described in the literature. Text classification is covered in chapter two, where the main techniques are described along with an integrated taxonomy. The transition is made through the concept of spatial representation. Building on basic elements of geometry and artificial intelligence analysis, spatial representation models are presented. Using a parallel approach, the spatial dimension is introduced into the classification process. The main clustering methods are described in an aggregated taxonomy. As an example, spam and ham words are clustered and spatially represented, and the concepts of spam, ham, common, and linkage words are presented and explained in the xOy space representation.

  1. Classification of breast mass lesions using model-based analysis of the characteristic kinetic curve derived from fuzzy c-means clustering.

    Science.gov (United States)

    Chang, Yeun-Chung; Huang, Yan-Hao; Huang, Chiun-Sheng; Chang, Pei-Kang; Chen, Jeon-Hor; Chang, Ruey-Feng

    2012-04-01

    The purpose of this study is to evaluate the diagnostic efficacy of the representative characteristic kinetic curve of dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI) extracted by fuzzy c-means (FCM) clustering for the discrimination of benign and malignant breast tumors using a novel computer-aided diagnosis (CAD) system. The data set comprised DCE-MRIs of 132 solid breast masses with definite histopathologic diagnosis (63 benign and 69 malignant). First, the tumor region was automatically segmented using the region growing method based on the integrated color map formed by the combination of kinetic and area under curve color map. Then, FCM clustering was used to identify the time-signal curve with the larger initial enhancement inside the segmented region as the representative kinetic curve, and the parameters of the Tofts pharmacokinetic model for the representative kinetic curve were compared with conventional curve analysis (maximal enhancement, time to peak, uptake rate and washout rate) for each mass. The results were analyzed with a receiver operating characteristic curve and Student's t test to evaluate the classification performance. Accuracy, sensitivity, specificity, positive predictive value and negative predictive value of the combined model-based parameters of the extracted kinetic curve from FCM clustering were 86.36% (114/132), 85.51% (59/69), 87.30% (55/63), 88.06% (59/67) and 84.62% (55/65), better than those from a conventional curve analysis. The A(Z) value was 0.9154 for Tofts model-based parametric features, better than that for conventional curve analysis (0.8673), for discriminating malignant and benign lesions. In conclusion, model-based analysis of the characteristic kinetic curve of breast mass derived from FCM clustering provides effective lesion classification. This approach has potential in the development of a CAD system for DCE breast MRI. Copyright © 2012 Elsevier Inc
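
    The five performance figures quoted above follow from a single 2x2 confusion matrix. The short check below reconstructs the counts from the fractions in the abstract (treating malignant as the positive class, which is an assumption about the authors' convention) and recomputes each percentage.

```python
# Counts implied by the fractions quoted in the abstract.
tp, fn = 59, 10   # 59 of 69 malignant masses classified as malignant
tn, fp = 55, 8    # 55 of 63 benign masses classified as benign

def diagnostic_metrics(tp, fn, tn, fp):
    """Return standard binary-classification metrics as fractions."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
    }

metrics = diagnostic_metrics(tp, fn, tn, fp)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```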

  2. Identifying Clusters with Mixture Models that Include Radial Velocity Observations

    Science.gov (United States)

    Czarnatowicz, Alexis; Ybarra, Jason E.

    2018-01-01

    The study of stellar clusters plays an integral role in the study of star formation. We present a cluster mixture model that considers radial velocity data in addition to spatial data. Maximum likelihood estimation through the Expectation-Maximization (EM) algorithm is used for parameter estimation. Our mixture model analysis can be used to distinguish adjacent or overlapping clusters, and estimate properties for each cluster. Work supported by awards from the Virginia Foundation for Independent Colleges (VFIC) Undergraduate Science Research Fellowship and The Research Experience @Bridgewater (TREB).
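
    To make the idea concrete, here is a minimal EM sketch for a two-component Gaussian mixture over (position, radial velocity) pairs. It is not the authors' model: the diagonal covariance, the starting means, and the toy stars (which overlap in position but differ in velocity) are all assumptions chosen for illustration.

```python
import math

def gaussian_diag(x, mean, var):
    """Density of an axis-aligned (diagonal-covariance) Gaussian at x."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2 * math.pi * vi)
    return density

def em_mixture(data, init_means, n_iter=50, min_var=1e-6):
    """Fit a diagonal Gaussian mixture by Expectation-Maximization."""
    k, dim = len(init_means), len(data[0])
    means = [list(m) for m in init_means]
    weights = [1.0 / k] * k
    variances = [[1.0] * dim for _ in range(k)]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each star.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_diag(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [
                max(min_var, sum(r[j] * (x[d] - means[j][d]) ** 2
                                 for r, x in zip(resp, data)) / nj)
                for d in range(dim)
            ]
    return weights, means, variances, resp

# Hypothetical stars: (projected position, radial velocity in km/s).
stars = [(0.1, 9.8), (0.4, 10.3), (0.7, 10.1), (0.9, 9.6), (0.5, 10.2),
         (0.6, 14.1), (0.8, 13.7), (1.1, 14.4), (1.3, 13.9), (0.9, 14.0)]
weights, means, variances, resp = em_mixture(stars, [(0.5, 9.0), (0.5, 15.0)])
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
print(labels)
```

    In this toy setup the two groups cannot be separated by position alone, so the velocity dimension is what lets the mixture tell them apart.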

  3. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Frutiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show...

  4. Modeling Uncertainties in EEG Microstates: Analysis of Real and Imagined Motor Movements Using Probabilistic Clustering-Driven Training of Probabilistic Neural Networks

    Directory of Open Access Journals (Sweden)

    Martin Dinov

    2017-11-01

    Part of the process of EEG microstate estimation involves clustering EEG channel data at the global field power (GFP) maxima, very commonly using a modified K-means (KM) approach. Clustering has also been done deterministically, despite there being uncertainties in multiple stages of the microstate analysis, including the GFP peak definition, the clustering itself and in the post-clustering assignment of microstates back onto the EEG timecourse of interest. We perform a fully probabilistic microstate clustering and labeling, to account for these sources of uncertainty using the closest probabilistic analog to KM called Fuzzy C-means (FCM). We train softmax multi-layer perceptrons (MLPs) using the KM and FCM-inferred cluster assignments as target labels, to then allow for probabilistic labeling of the full EEG data instead of the usual correlation-based deterministic microstate label assignment typically used. We assess the merits of the probabilistic analysis vs. the deterministic approaches in EEG data recorded while participants perform real or imagined motor movements from a publicly available data set of 109 subjects. Though FCM group template maps that are almost topographically identical to KM were found, there is considerable uncertainty in the subsequent assignment of microstate labels. In general, imagined motor movements are less predictable on a time point-by-time point basis, possibly reflecting the more exploratory nature of the brain state during imagined, compared to during real motor movements. We find that some relationships may be more evident using FCM than using KM and propose that future microstate analysis should preferably be performed probabilistically rather than deterministically, especially in situations such as with brain computer interfaces, where both training and applying models of microstates need to account for uncertainty.
Probabilistic neural network-driven microstate assignment has a number of advantages that we have

  5. 3D simulation of the Cluster-Cluster Aggregation model

    Science.gov (United States)

    Li, Chao; Xiong, Hailing

    2014-12-01

    We wrote a program that implements the Cluster-Cluster Aggregation (CCA) model in the Java programming language. Using the simulation program, the fractal aggregation growth process can be displayed dynamically as a three-dimensional (3D) figure, and the related kinetics data of the aggregation simulation can be recorded dynamically at the same time. Compared to traditional programs, the program has better real-time performance and makes the fractal growth process easier to observe, which contributes to the scientific study of fractal aggregation. In addition, because it is written in Java, the program has very good cross-platform performance.

  6. FACTOR MODEL ASSESSMENT OF THE COMPETITIVE INNOVATION CLUSTERS ELECTRONICS BASED ON ANALYSIS OF THE STAGES OF THEIR LIFE CYCLE

    Directory of Open Access Journals (Sweden)

    A. V. Brykin

    2013-01-01

    Cluster-based development in the global electronics sector is one of the most effective examples in high-tech industry. The author considers the possibility of using clusters to modernize the Russian economy.

  7. Fuzzy Clustering Methods and their Application to Fuzzy Modeling

    DEFF Research Database (Denmark)

    Kroszynski, Uri; Zhou, Jianjun

    1999-01-01

    Fuzzy modeling techniques based upon the analysis of measured input/output data sets result in a set of rules that allow system outputs to be predicted from given inputs. Fuzzy clustering methods for system modeling and identification result in relatively small rule-bases, allowing fast, yet accurate... A method to obtain an optimized number of clusters is outlined. Based upon the cluster's characteristics, a behavioural model is formulated in terms of a rule-base and an inference engine. The article reviews several variants for the model formulation. Some limitations of the methods are listed...

  8. Complex time series analysis of PM10 and PM2.5 for a coastal site using artificial neural network modelling and k-means clustering

    Science.gov (United States)

    Elangasinghe, M. A.; Singhal, N.; Dirks, K. N.; Salmond, J. A.; Samarasinghe, S.

    2014-09-01

    This paper uses artificial neural networks (ANN), combined with k-means clustering, to understand the complex time series of PM10 and PM2.5 concentrations at a coastal location of New Zealand based on data from a single site. Out of available meteorological parameters from the network (wind speed, wind direction, solar radiation, temperature, relative humidity), key factors governing the pattern of the time series concentrations were identified through input sensitivity analysis performed on the trained neural network model. The transport pathways of particulate matter under these key meteorological parameters were further analysed through bivariate concentration polar plots and k-means clustering techniques. The analysis shows that the external sources such as marine aerosols and local sources such as traffic and biomass burning contribute equally to the particulate matter concentrations at the study site. These results are in agreement with the results of receptor modelling by the Auckland Council based on Positive Matrix Factorization (PMF). Our findings also show that contrasting concentration-wind speed relationships exist between marine aerosols and local traffic sources resulting in very noisy and seemingly large random PM10 concentrations. The inclusion of cluster rankings as an input parameter to the ANN model showed a statistically significant (p < 0.005) improvement in the performance of the ANN time series model and also showed better performance in picking up high concentrations. For the presented case study, the correlation coefficient between observed and predicted concentrations improved from 0.77 to 0.79 for PM2.5 and from 0.63 to 0.69 for PM10 and reduced the root mean squared error (RMSE) from 5.00 to 4.74 for PM2.5 and from 6.77 to 6.34 for PM10. The techniques presented here enable the user to obtain an understanding of potential sources and their transport characteristics prior to the implementation of costly chemical analysis techniques or
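
    The k-means step mentioned in this abstract can be sketched with Lloyd's algorithm: assign each observation to its nearest centre, then move each centre to the mean of its members. The observations, the choice of two clusters, and the starting centres below are hypothetical and are not taken from the study.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, n_iter=20):
    """Plain Lloyd's algorithm with caller-supplied starting centres."""
    centers = list(centers)
    groups = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: squared_distance(p, centers[j]))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers, groups

# Hypothetical hourly observations: (wind speed in m/s, PM10 in ug/m3).
observations = [(1.0, 30.0), (1.5, 28.0), (2.0, 32.0),
                (8.0, 12.0), (9.0, 10.0), (8.5, 14.0)]
centers, groups = kmeans(observations, [(1.0, 30.0), (8.0, 12.0)])
print(centers)
```

    The cluster index of each observation could then be passed to a downstream model as an extra input feature, which is the general idea behind using cluster rankings as an ANN input.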

  9. Analysis of Aspects of Innovation in a Brazilian Cluster

    Directory of Open Access Journals (Sweden)

    Adriana Valélia Saraceni

    2012-09-01

    Innovation through clustering has become very important because of the growing significance of interaction for innovation and for the learning process. This study aims to identify what a case analysis of the innovation process in a cluster reveals about the learning process. The study is developed in two stages. In the first, we used a preliminary case study to verify a cluster innovation analysis and its Innovation Index, and then explored a combined body of theory and practice. The second stage explores the learning process concept. Together, the two stages allowed us to build a theoretical model of learning process development in clusters. The main result of the model development is a mechanism for implementing improvements in clusters when case studies are applied.

  10. [Cluster analysis and its application].

    Science.gov (United States)

    Půlpán, Zdenĕk

    2002-01-01

    The study exploits knowledge-oriented and context-based modification of well-known algorithms of (fuzzy) clustering. Fuzzy sets are also inherently suited to coping with linguistic domain knowledge. We aim to obtain new information about the environment being explored from rich, diverse data and knowledge.

  11. SUPPLY CHAIN ANALYSIS AND PERFORMANCE ASSESSMENT OF SME FISHERIES CLUSTERS

    Directory of Open Access Journals (Sweden)

    Anton Agus Setyawan

    2017-12-01

    Studies of SMEs in Indonesia are related to business networks and performance in these business organizations. In many cases, regional administrations in Indonesia develop SME business networks in the form of clusters. This study analyzes SME fisheries clusters using supply chain analysis. We also develop a performance assessment of the SME fisheries cluster using a multivariate model. The study involves 62 SMEs in Sragen, Central Java, Indonesia, all of which belong to the fisheries cluster in the area. Our findings show that the SME fisheries cluster has an inefficient supply chain. The cluster has problems with profit setting and delivery time, which harm its performance. We measure business performance using sales, profit rate and asset growth. We found that cost structure, manpower and physical production have positive effects on business performance.

  12. From virtual clustering analysis to self-consistent clustering analysis: a mathematical study

    Science.gov (United States)

    Tang, Shaoqiang; Zhang, Lei; Liu, Wing Kam

    2018-03-01

    In this paper, we propose a new homogenization algorithm, virtual clustering analysis (VCA), as well as provide a mathematical framework for the recently proposed self-consistent clustering analysis (SCA) (Liu et al. in Comput Methods Appl Mech Eng 306:319-341, 2016). In the mathematical theory, we clarify the key assumptions and ideas of VCA and SCA, and derive the continuous and discrete Lippmann-Schwinger equations. Based on a key postulation of "once response similarly, always response similarly", clustering is performed in an offline stage by machine learning techniques (k-means and SOM), and facilitates substantial reduction of computational complexity in an online predictive stage. The clear mathematical setup allows for the first time a convergence study of clustering refinement in one space dimension. Convergence is proved rigorously, and found to be of second order from numerical investigations. Furthermore, we propose to suitably enlarge the domain in VCA, such that the boundary terms may be neglected in the Lippmann-Schwinger equation, by virtue of Saint-Venant's principle. In contrast, they were not obtained in the original SCA paper, and we discover these terms may well be responsible for the numerical dependency on the choice of reference material property. Since VCA enhances the accuracy by overcoming the modeling error, and reduces the numerical cost by avoiding an outer loop iteration for attaining the material property consistency in SCA, its efficiency is expected to be even higher than that of the recently proposed SCA algorithm.

  13. Cluster analysis of word frequency dynamics

    Science.gov (United States)

    Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. In a previous study, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  14. Cluster analysis of word frequency dynamics

    International Nuclear Information System (INIS)

    Maslennikova, Yu S; Bochkarev, V V; Belashova, I A

    2015-01-01

    This paper describes the analysis and modelling of word usage frequency time series. In a previous study, an assumption was put forward that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database. This database includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from birth to death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.

  15. Topics in modelling of clustered data

    CERN Document Server

    Aerts, Marc; Ryan, Louise M; Geys, Helena

    2002-01-01

    Many methods for analyzing clustered data exist, all with advantages and limitations in particular applications. Compiled from the contributions of leading specialists in the field, Topics in Modelling of Clustered Data describes the tools and techniques for modelling the clustered data often encountered in medical, biological, environmental, and social science studies. It focuses on providing a comprehensive treatment of marginal, conditional, and random effects models using, among others, likelihood, pseudo-likelihood, and generalized estimating equations methods. The authors motivate and illustrate all aspects of these models in a variety of real applications. They discuss several variations and extensions, including individual-level covariates and combined continuous and discrete outcomes. Flexible modelling with fractional and local polynomials, omnibus lack-of-fit tests, robustification against misspecification, exact, and bootstrap inferential procedures all receive extensive treatment. The application...

  16. ASteCA: Automated Stellar Cluster Analysis

    Science.gov (United States)

    Perren, G. I.; Vázquez, R. A.; Piatti, A. E.

    2015-04-01

    We present the Automated Stellar Cluster Analysis package (ASteCA), a suite of tools designed to fully automate the standard tests applied on stellar clusters to determine their basic parameters. The set of functions included in the code makes use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, as well as characterizing through a statistical estimator its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process based on the generation of synthetic clusters from theoretical isochrones and selection of the best fit through a genetic algorithm is also present, which allows ASteCA to provide accurate estimates for a cluster's metallicity, age, extinction and distance values along with its uncertainties. To validate the code we applied it on a large set of over 400 synthetic MASSCLEAN clusters with varying degrees of field star contamination as well as a smaller set of 20 observed Milky Way open clusters (Berkeley 7, Bochum 11, Czernik 26, Czernik 30, Haffner 11, Haffner 19, NGC 133, NGC 2236, NGC 2264, NGC 2324, NGC 2421, NGC 2627, NGC 6231, NGC 6383, NGC 6705, Ruprecht 1, Tombaugh 1, Trumpler 1, Trumpler 5 and Trumpler 14) studied in the literature. The results show that ASteCA is able to recover cluster parameters with an acceptable precision even for those clusters affected by substantial field star contamination. ASteCA is written in Python and is made available as an open source code which can be downloaded ready to be used from its official site.

  17. Cluster analysis of pharmacists' work attitudes.

    Science.gov (United States)

    Nakagomi, Keiichi; Hayashi, Yukikazu; Komiyama, Takako

    2017-12-01

    Few studies in Japan use clustering to examine the work attitudes of pharmacists. This study conducts an exploratory analysis to classify those attitudes based on previous studies to help staff pharmacists and their management to understand their mutually beneficial requirements. Survey data collected in previous studies from 1,228 community pharmacists and 419 hospital pharmacists were analyzed using Quantification Theory 3 and clustering. Among community pharmacists, two clusters, namely 30- to 34-year-old married males and married males aged over 35 years, reported the highest job satisfaction, intending to remain in their jobs for 5 years or more or until retirement. Conversely, one cluster of 35- to 39-year-old single females reported the lowest job satisfaction and intended to remain for less than 5 years or were undecided. Among hospital pharmacists, one cluster of 22- to 25-year-old single males reported the highest job satisfaction and intended to remain for more than 5 years. Conversely, one cluster of 30- to 34-year-old married males reported the lowest job satisfaction and was undecided about how long to continue working. This study used clustering to explore how pharmacists of different ages, marital statuses, and experience felt regarding their work. Job satisfaction and human relationships are significant in considering future work plans of practicing pharmacists. Pharmacy staff, supervisors, and managers of community or hospital pharmacies must recognize features of pharmacists' work attitudes for offering high-quality service to patients.

  18. Fuzzy clustering analysis of microarray data.

    Science.gov (United States)

    Han, Lixin; Zeng, Xiaoqin; Yan, Hong

    2008-10-01

    Fuzzy clustering is a useful tool for identifying relevant subsets of microarray data. This paper proposes a fuzzy clustering method for microarray data analysis. An advantage of the method is that it uses a combination of fuzzy c-means and principal component analysis to identify the groups of genes that show similar expression patterns. It allows a gene to belong to more than one gene expression pattern with different membership grades. The method is suitable for the analysis of large amounts of noisy microarray data.
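
    The membership grades described above come from the standard fuzzy c-means updates, sketched below. This covers only the fuzzy c-means part and not the paper's combination with principal component analysis; the fuzzifier m = 2, the starting centres, and the toy two-condition expression values are assumptions for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Membership of every point in every cluster; each row sums to 1."""
    rows = []
    for p in points:
        d2 = [max(eps, squared_distance(p, c)) for c in centers]
        # u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), written with squared distances.
        rows.append([
            1.0 / sum((d2[i] / d2[k]) ** (1.0 / (m - 1)) for k in range(len(centers)))
            for i in range(len(centers))
        ])
    return rows

def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Alternate membership and centre updates for a fixed number of rounds."""
    centers = list(centers)
    dim = len(points[0])
    for _ in range(n_iter):
        u = fcm_memberships(points, centers, m)
        # Each centre is the mean of all points weighted by u^m.
        centers = [
            tuple(
                sum(row[i] ** m * p[d] for row, p in zip(u, points))
                / sum(row[i] ** m for row in u)
                for d in range(dim)
            )
            for i in range(len(centers))
        ]
    return centers, fcm_memberships(points, centers, m)

# Hypothetical genes: expression level under two conditions.
genes = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
         (1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
         (0.6, 0.5)]                      # the last gene sits between groups
centers, memberships = fuzzy_c_means(genes, [(0.0, 0.0), (1.0, 1.0)])
for gene, row in zip(genes, memberships):
    print(gene, [round(x, 2) for x in row])
```

    The in-between gene receives a split membership instead of being forced into one group, which is the behaviour the abstract highlights.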

  19. Principal Component Clustering Approach to Teaching Quality Discriminant Analysis

    Science.gov (United States)

    Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan

    2016-01-01

    Teaching quality is the lifeline of higher education. Many universities have made effective progress in evaluating teaching quality. In this paper, we establish the Students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…

  20. Are clusters of dietary patterns and cluster membership stable over time? Results of a longitudinal cluster analysis study.

    Science.gov (United States)

    Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein

    2014-11-01

    Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear if distinctive clusters of dietary patterns can be derived and reproduced over time, if cluster membership is stable, and if it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
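
    The stability figures in this abstract can be checked directly from the reported counts: the three transition groups should add up to the follow-up sample, and each percentage is that group's share of it.

```python
# Counts reported in the abstract for participants measured at follow-up.
transitions = {
    "same cluster": 251,
    "unhealthier cluster": 45,
    "healthier cluster": 83,
}

total = sum(transitions.values())
shares = {name: round(100 * count / total, 1) for name, count in transitions.items()}

print(f"participants at follow-up: {total}")
for name, share in shares.items():
    print(f"{name}: {share}%")
```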

  1. Spherical collapse models with clustered dark energy

    Science.gov (United States)

    Chang, Chia-Chun; Lee, Wolung; Ng, Kin-Wang

    2018-03-01

    We investigate the clustering effect of dark energy (DE) in the formation of galaxy clusters using the spherical collapse model. Assuming a fully clustered DE component, the spherical overdense region is treated as an isolated system which conserves the energy separately for both matter and DE inside the spherical region. Then, by introducing a parameter r to characterize the degree of DE clustering, which is defined by the nonlinear density contrast ratio of matter to DE at turnaround in the recollapsing process, i.e. $r \equiv \delta_{\mathrm{de,ta}}^{\mathrm{NL}} / \delta_{\mathrm{m,ta}}^{\mathrm{NL}}$, we are able to uniquely determine the spherical collapsing process and hence obtain the virialized overdensity $\Delta_{\mathrm{vir}}$ through a proper virialization scheme. Estimation of the virialized overdensities from current observation on galaxy clusters suggests that 0.5 clustered DE with w < -0.9. Also, we compare our method to the linear perturbation theory that deals with the growth of DE perturbation at early times. While both results are consistent with each other, our method is practically simple and it shows that the collapse process is rather independent of initial DE perturbation and its evolution at early times.

  2. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Årup; Frutiger, Sally A.

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing. Hum. Brain Mapping 15...

  3. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li

    2017-01-01

    In the fields of geographic information systems (GIS) and remote sensing (RS), the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited by their computational complexity, noise resistance and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters”) if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the termination condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.

  4. Dielectric spectroscopy platform to measure MCF10A epithelial cell aggregation as a model for spheroidal cell cluster analysis.

    Science.gov (United States)

    Heileman, K L; Tabrizian, M

    2017-05-02

    3-Dimensional cell cultures are more representative of the native environment than traditional cell cultures on flat substrates. As a result, 3-dimensional cell cultures have emerged as a very valuable model environment to study tumorigenesis, organogenesis and tissue regeneration. Many of these models encompass the formation of cell aggregates, which mimic the architecture of tumor and organ tissue. Dielectric impedance spectroscopy is a non-invasive, label-free and real-time technique that overcomes the drawbacks of established techniques for monitoring cell aggregates. Here we introduce a platform to monitor cell aggregation in a 3-dimensional extracellular matrix using dielectric spectroscopy. The MCF10A breast epithelial cell line serves as a model for cell aggregation. The platform maintains sterile conditions during the multi-day assay while allowing continuous dielectric spectroscopy measurements. The platform geometry optimizes dielectric measurements by concentrating cells within the electrode sensing region. The cells show a characteristic dielectric response to aggregation, which is corroborated by finite element analysis computer simulations. By fitting the experimental dielectric spectra to the Cole-Cole equation, we demonstrated that the dispersion intensity Δε and the characteristic frequency f_c are related to cell aggregate growth. In addition, microscopy can be performed directly on the platform, providing information about cell position, density and morphology. This platform could yield many applications for studying the electrophysiological activity of cell aggregates.
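
As a rough illustration of the fitting model named in this abstract, the sketch below evaluates the standard Cole-Cole expression eps*(f) = eps_inf + delta_eps / (1 + (j f / f_c)^(1 - alpha)). The function name, the parameter names and the numerical values are illustrative assumptions and are not taken from the paper, which may use a different parameterization (for example, an added conductivity term).

```python
def cole_cole(f_hz, eps_inf, delta_eps, f_c, alpha=0.0):
    """Complex permittivity from the Cole-Cole model.

    eps*(f) = eps_inf + delta_eps / (1 + (j * f / f_c) ** (1 - alpha))

    eps_inf   : high-frequency permittivity limit
    delta_eps : dispersion intensity (low-frequency minus high-frequency limit)
    f_c       : characteristic frequency in Hz
    alpha     : broadening parameter, 0 <= alpha < 1 (0 gives a Debye relaxation)
    """
    ratio = complex(0.0, f_hz / f_c)  # j * f / f_c, i.e. j * omega * tau
    return eps_inf + delta_eps / (1.0 + ratio ** (1.0 - alpha))


# Hypothetical parameter values, chosen only to make the example concrete.
spectrum = [cole_cole(f, eps_inf=5.0, delta_eps=10.0, f_c=1e5)
            for f in (1e3, 1e4, 1e5, 1e6, 1e7)]
```

In a fit, delta_eps and f_c would be the free parameters adjusted until the model matches the measured spectrum; the abstract reports that both track aggregate growth.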

  5. Topological modeling and classification of mammographic microcalcification clusters.

    Science.gov (United States)

    Chen, Zhili; Strange, Harry; Oliver, Arnau; Denton, Erika R E; Boggis, Caroline; Zwiggelaar, Reyer

    2015-04-01

    The presence of microcalcification clusters is a primary sign of breast cancer; however, it is difficult and time consuming for radiologists to classify microcalcifications as malignant or benign. In this paper, a novel method for the classification of microcalcification clusters in mammograms is proposed. The topology/connectivity of individual microcalcifications is analyzed within a cluster using multiscale morphology. This is distinct from existing approaches that tend to concentrate on the morphology of individual microcalcifications and/or global (statistical) cluster features. A set of microcalcification graphs are generated to represent the topological structure of microcalcification clusters at different scales. Subsequently, graph theoretical features are extracted, which constitute the topological feature space for modeling and classifying microcalcification clusters. k-nearest-neighbors-based classifiers are employed for classifying microcalcification clusters. The validity of the proposed method is evaluated using two well-known digitized datasets (MIAS and DDSM) and a full-field digital dataset. High classification accuracies (up to 96%) and good ROC results (area under the ROC curve up to 0.96) are achieved. A full comparison with related publications is provided, which includes a direct comparison. The results indicate that the proposed approach is able to outperform the current state-of-the-art methods. Significance: This study shows that topology modeling is an important tool for microcalcification analysis not only because of the improved classification accuracy but also because the topological measures can be linked to clinical understanding.

  6. Gravitational lens models of arcs in clusters

    International Nuclear Information System (INIS)

    Bergmann, A.G.; Petrosian, V.; Lynds, R.

    1990-01-01

    It is now well established that the luminous arcs discovered in clusters of galaxies, in particular those in Abell 370 and Cluster 2244-02, are produced by gravitational lensing of background sources. The arcs are modeled and constraints are placed on the distribution of the mass in the clusters and the shape and size of the sources. The models require, as expected, a large amount of dark matter in the clusters and a mass-to-blue-light ratio for the cluster which exceeds 100 solar mass/solar luminosity and could be as high as 1000 solar mass/solar luminosity depending on cosmological parameters and the distribution of the dark matter. Furthermore, it is found that in the case of the arc in A370 the dark matter must have a different distribution than the luminous galaxies, while for the arc in Cl 2244 the dark matter can have a distribution similar to that of the light matter (galaxies) or a separate distribution. 30 refs

  7. Cluster Analysis of Properties of Temperament

    Directory of Open Access Journals (Sweden)

    A I Krupnov

    2014-12-01

    Full Text Available The paper presents a cluster analysis of various properties of temperament, based on the systematic structure of its main components. On the basis of the data obtained, a qualitative psychological characterization of the four types of temperament is given.

  8. On the clustering of climate models in ensemble seasonal forecasting

    Science.gov (United States)

    Yuan, Xing; Wood, Eric F.

    2012-09-01

    Multi-model ensemble seasonal forecasting has expanded in recent years, with a dozen coupled climate models around the world being used to produce hindcasts or real-time forecasts. However, many models share similar atmospheric or oceanic components, which may result in similar forecasts. This raises the questions of whether the ensemble is over-confident if we treat each model equally, and whether we can obtain an effective subset of models that retains predictability and skill. In this study, we use a hierarchical clustering method based on the inverse cosine (arccosine) of the anomaly correlation of pairwise model hindcasts to measure the similarities among twelve American and European seasonal forecast models. Though similarities are found between models sharing the same atmospheric component, different versions of models from the same center sometimes produce quite different temperature forecasts, which indicates that detailed physics packages such as radiation and land surface schemes need to be analyzed in interpreting the clustering result. Uncertainties in clustering for different forecast lead times also make reducing redundant models more complicated. Predictability analysis shows that a multi-model ensemble is not necessarily better than a single model, while the cluster ensemble shows consistent improvement over individual models. The eight-model cluster ensemble forecast shows comparable performance to the full twelve-model ensemble in terms of probabilistic forecast skill for accuracy and discrimination. This study also shows that models developed in the U.S. and Europe are more independent from each other, suggesting the necessity of international collaboration in enhancing multi-model ensemble seasonal forecasting.
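
The similarity measure described in this abstract can be sketched as follows: each pair of models gets a distance equal to the arccosine of the correlation between their hindcast anomalies, and models are then merged hierarchically. The abstract does not state the linkage rule, so average linkage is an assumption here, and the model names and numbers are made up for illustration.

```python
import math


def anomaly_correlation(x, y):
    """Pearson correlation of two series after removing each series' mean."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    ax = [v - mx for v in x]
    ay = [v - my for v in y]
    num = sum(a * b for a, b in zip(ax, ay))
    den = math.sqrt(sum(a * a for a in ax) * sum(b * b for b in ay))
    return num / den


def arccos_distance(x, y):
    """Distance in radians: 0 for identical anomalies, pi for opposite ones."""
    r = max(-1.0, min(1.0, anomaly_correlation(x, y)))  # guard rounding error
    return math.acos(r)


def average_linkage(series, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain.

    Average linkage is an assumed choice, not taken from the abstract.
    """
    names = list(series)
    dist = {(a, b): arccos_distance(series[a], series[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy hindcast anomalies for four hypothetical models (made-up numbers).
hindcasts = {
    "model_a": [0.1, 0.5, -0.3, 0.8, -0.6],
    "model_b": [0.2, 0.4, -0.2, 0.9, -0.5],
    "model_c": [-0.4, 0.1, 0.7, -0.2, 0.3],
    "model_d": [-0.5, 0.2, 0.6, -0.1, 0.4],
}
groups = average_linkage(hindcasts, n_clusters=2)
```

Taking the arccosine turns a correlation into an angle, so highly correlated models sit close together and anticorrelated ones sit far apart.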

  9. Modelling Baryonic Effects on Galaxy Cluster Mass Profiles

    Science.gov (United States)

    Shirasaki, Masato; Lau, Erwin T.; Nagai, Daisuke

    2018-03-01

    Gravitational lensing is a powerful probe of the mass distribution of galaxy clusters and cosmology. However, accurate measurements of the cluster mass profiles are limited by uncertainties in cluster astrophysics. In this work, we present a physically motivated model of baryonic effects on the cluster mass profiles, which self-consistently takes into account the impact of baryons on the concentration as well as mass accretion histories of galaxy clusters. We calibrate this model using the Omega500 hydrodynamical cosmological simulations of galaxy clusters with varying baryonic physics. Our model will enable us to simultaneously constrain cluster mass, concentration, and cosmological parameters using stacked weak lensing measurements from upcoming optical cluster surveys.

  10. Cluster analysis for determining distribution center location

    Science.gov (United States)

    Lestari Widaningrum, Dyah; Andika, Aditya; Murphiyanto, Richard Dimas Julian

    2017-12-01

    Determination of distribution facilities is highly important to survive in the high level of competition in today’s business world. Companies can operate multiple distribution centers to mitigate supply chain risk. Thus, new problems arise, namely how many facilities should be provided and where. This study examines a fast-food restaurant brand located in Greater Jakarta. This brand is among the top 5 fast-food restaurant chains based on retail sales. There were three stages in this study: compiling spatial data, cluster analysis, and network analysis. Cluster analysis results are used to consider the location of the additional distribution center. Network analysis results show a more efficient process, with a shorter distance in the distribution process.

  11. Fuzzy clustering analysis of osteosarcoma related genes.

    Science.gov (United States)

    Chen, Kai; Wu, Dajiang; Bai, Yushu; Zhu, Xiaodong; Chen, Ziqiang; Wang, Chuanfeng; Zhao, Yingchuan; Li, Ming

    2014-07-01

    Osteosarcoma is the most common malignant bone tumor, with a peak manifestation during the second and third decades of life. To explore the influence of genetic factors on the mechanism of osteosarcoma by analyzing the interrelationship between osteosarcoma and its related genes, and to provide potential genetic references for its prevention, diagnosis and treatment, we collected osteosarcoma-related gene sequences from GenBank at the National Center for Biotechnology Information (NCBI) and carried out pairwise local alignment to measure the association among related sequences. A fuzzy clustering method was then used so that unknown genes could be linked to known osteosarcoma-related genes in the same class. From the result of the fuzzy clustering analysis, we could classify the osteosarcoma-related genes into two groups and deduced that the genes clustered into one group had similar function. Based on this knowledge, we found more genes related to the pathogenesis of osteosarcoma, and these genes could exert a similar function to Runx2, a risk factor confirmed in osteosarcoma. This study may help better understand the genetic mechanism and provide new molecular markers and therapies for osteosarcoma.

  12. Joint model-based clustering of nonlinear longitudinal trajectories and associated time-to-event data analysis, linked by latent class membership: with application to AIDS clinical studies.

    Science.gov (United States)

    Huang, Yangxin; Lu, Xiaosun; Chen, Jiaqing; Liang, Juan; Zangmeister, Miriam

    2017-10-27

    Longitudinal and time-to-event data are often observed together. Finite mixture models are currently used to analyze nonlinear heterogeneous longitudinal data, which, by releasing the homogeneity restriction of nonlinear mixed-effects (NLME) models, can cluster individuals into one of the pre-specified classes with class membership probabilities. This clustering may have clinical significance, and be associated with clinically important time-to-event data. This article develops a joint modeling approach to a finite mixture of NLME models for longitudinal data and proportional hazard Cox model for time-to-event data, linked by individual latent class indicators, under a Bayesian framework. The proposed joint models and method are applied to a real AIDS clinical trial data set, followed by simulation studies to assess the performance of the proposed joint model and a naive two-step model, in which finite mixture model and Cox model are fitted separately.

  13. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    Science.gov (United States)

    2014-01-01

    Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Results Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Conclusions Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. Interim recommendations

  14. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations.

    Science.gov (United States)

    Corrigan, Neil; Bankart, Michael J G; Gray, Laura J; Smith, Karen L

    2014-05-24

    There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial) by simulation. To determine the impact on bias and precision of treatment effect estimates, we applied standard methods of analysis to different populations under analysis. Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was at its greatest. Simulations demonstrate that the impact on analysis was minimal when cluster merges were homogeneous, with impact on study power being balanced by a change in observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Examples of cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as having negative impacts on the precision of estimates obtained. Further methodological development is warranted to better determine how to analyse such trials appropriately. 
Interim recommendations include avoidance of cluster merges where
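
A back-of-the-envelope sketch of why merging clusters costs power: with unequal cluster sizes, a common approximation to the effective sample size is the sum of m / (1 + (m - 1) * ICC) over clusters of size m. The cluster sizes and ICC below are hypothetical, and the ICC is assumed to stay fixed after the merge, which the abstract indicates is not always the case in practice.

```python
def effective_sample_size(cluster_sizes, icc):
    """Sum of m / (1 + (m - 1) * icc) over clusters of size m."""
    return sum(m / (1.0 + (m - 1) * icc) for m in cluster_sizes)


def merge_clusters(cluster_sizes, i, j):
    """Return a new list in which clusters i and j are combined into one."""
    merged = [m for k, m in enumerate(cluster_sizes) if k not in (i, j)]
    merged.append(cluster_sizes[i] + cluster_sizes[j])
    return merged


sizes = [20] * 10   # hypothetical trial arm: 10 clusters of 20 participants
icc = 0.05          # assumed intracluster correlation, held fixed after the merge

before = effective_sample_size(sizes, icc)
after = effective_sample_size(merge_clusters(sizes, 0, 1), icc)
```

Under these assumptions the merged arm has the same number of participants but a smaller effective sample size, because one large cluster carries less independent information than two smaller ones.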

  15. Semi-supervised consensus clustering for gene expression data analysis

    OpenAIRE

    Wang, Yunli; Pan, Youlian

    2014-01-01

    Background Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis; but they are unable to deal with noise and high dimensionality associated with the microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...

  16. The Analysis and Clustering of Navy Ratings Based on Social Interaction Characteristics: A Literature Review and Conceptual Model

    Science.gov (United States)

    1988-06-01

    classify and differentiate styles. Wiggins (1979) proposed a circumplex model of social interaction variables composed of four sets of bipolar traits...Social Psychology, 37, 395-412. Wiggins, J. S. (1980). Circumplex models of interpersonal behavior. In Wheeler, L. (Ed.), Review of personality and...social psychology (Vol. 1). Beverly Hills: Sage Publications. Wiggins, J. S. (1982). Circumplex models of interpersonal behavior in clinical psychology

  17. Using ICD for structural analysis of clusters: a case study on NeAr clusters

    Science.gov (United States)

    Fasshauer, E.; Förstel, M.; Pallmann, S.; Pernpointner, M.; Hergenhahn, U.

    2014-10-01

    We present a method to utilize interatomic Coulombic decay (ICD) to retrieve information about the mean geometric structures of heteronuclear clusters. It is based on observation and modelling of competing ICD channels, which involve the same initial vacancy, but energetically different final states with vacancies in different components of the cluster. Using binary rare gas clusters of Ne and Ar as an example, we measure the relative intensity of ICD into (Ne+)2 and Ne+Ar+ final states with spectroscopically well separated ICD peaks. We compare in detail the experimental ratios of the Ne-Ne and Ne-Ar ICD contributions and their positions and widths to values calculated for a diverse set of possible structures. We conclude that NeAr clusters exhibit a core-shell structure with an argon core surrounded by complete neon shells and, possibly, further an incomplete shell of neon atoms for the experimental conditions investigated. Our analysis allows one to differentiate between clusters of similar size and stoichiometric Ar content, but different internal structure. We find evidence for ICD of Ne 2s-1, producing Ar+ vacancies in the second coordination shell of the initial site.

  18. Changing cluster composition in cluster randomised controlled trials: design and analysis considerations

    OpenAIRE

    Corrigan, Neil; Bankart, Michael J G; Gray, Laura J; Smith, Karen L

    2014-01-01

    Background There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is that of post-randomisation changes to cluster composition. To illustrate this, we focus on the issue of cluster merging, considering the impact on the design, analysis and interpretation of trial outcomes. Methods We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed ...

  19. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    Full Text Available The aim of the present article is to show the possibility of using the methods of cluster analysis in the classification of stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The manner of sorting stocks of finished products by clusters is described using a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.

  20. Advanced analysis of forest fire clustering

    Science.gov (United States)

    Kanevski, Mikhail; Pereira, Mario; Golay, Jean

    2017-04-01

    Analysis of point pattern clustering is an important topic in spatial statistics and for many applications: biodiversity, epidemiology, natural hazards, geomarketing, etc. There are several fundamental approaches used to quantify spatial data clustering using topological, statistical and fractal measures. In the present research, the recently introduced multi-point Morisita index (mMI) is applied to study the spatial clustering of forest fires in Portugal. The data set consists of more than 30000 fire events covering the time period from 1975 to 2013. The distribution of forest fires is very complex and highly variable in space. mMI is a multi-point extension of the classical two-point Morisita index. In essence, mMI is estimated by covering the region under study by a grid and by computing how many times more likely it is that m points selected at random will be from the same grid cell than it would be in the case of a complete random Poisson process. By changing the number of grid cells (size of the grid cells), mMI characterizes the scaling properties of spatial clustering. From mMI, the data intrinsic dimension (fractal dimension) of the point distribution can be estimated as well. In this study, the mMI of forest fires is compared with the mMI of random patterns (RPs) generated within the validity domain defined as the forest area of Portugal. It turns out that the forest fires are highly clustered inside the validity domain in comparison with the RPs. Moreover, they demonstrate different scaling properties at different spatial scales. The results obtained from the mMI analysis are also compared with those of fractal measures of clustering - box counting and sand box counting approaches. REFERENCES Golay J., Kanevski M., Vega Orozco C., Leuenberger M., 2014: The multipoint Morisita index for the analysis of spatial patterns. Physica A, 406, 191-202. Golay J., Kanevski M. 2015: A new estimator of intrinsic dimension based on the multipoint Morisita index
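
The grid-based estimate described in this abstract can be sketched as below. For a grid of Q cells holding n_i points each (N points in total), the index is Q^(m-1) times the sum over cells of n_i (n_i - 1) ... (n_i - m + 1), divided by N (N - 1) ... (N - m + 1); m = 2 gives the classical Morisita index. The function name, the square study region and the argument names are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def multipoint_morisita(points, n_cells_per_side, m=2, bounds=(0.0, 1.0)):
    """Multi-point Morisita index on a square grid of n_cells_per_side**2 cells.

    points : list of (x, y) pairs lying inside the square [lo, hi] x [lo, hi]
    m      : number of points drawn together (m = 2 is the classical index)
    """
    lo, hi = bounds
    width = (hi - lo) / n_cells_per_side
    counts = Counter()
    for x, y in points:
        # Clamp so that points on the upper boundary fall in the last cell.
        ix = min(int((x - lo) / width), n_cells_per_side - 1)
        iy = min(int((y - lo) / width), n_cells_per_side - 1)
        counts[(ix, iy)] += 1

    def falling(n, k):
        """n * (n - 1) * ... * (n - k + 1)"""
        out = 1
        for step in range(k):
            out *= n - step
        return out

    q = n_cells_per_side ** 2
    n_total = len(points)
    numerator = sum(falling(n, m) for n in counts.values())
    return q ** (m - 1) * numerator / falling(n_total, m)
```

Values near 1 are expected for a completely random pattern and values above 1 for a clustered one; repeating the calculation for several grid sizes gives the scaling behaviour the abstract refers to.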

  1. Cluster Analysis in Rapeseed (Brassica Napus L.)

    International Nuclear Information System (INIS)

    Mahasi, J.M

    2002-01-01

    With a widening edible oil deficit, Kenya has become increasingly dependent on imported edible oils. Many oilseed crops (e.g. sunflower, soya beans, rapeseed/mustard, sesame and groundnuts) can be grown in Kenya, but oilseed rape is preferred because it is very high yielding (1.5-4.0 tons/ha), with an oil content of 42-46%. Other uses include fitting into various cropping systems as relay/inter crops, rotational crops, trap crops and fodder. It is soft seeded, hence oil extraction is relatively easy. The meal is high in protein and very useful in livestock supplementation. Rapeseed can be straight combined using adjusted wheat combines. The priority is to expand domestic oilseed production, hence the need to introduce improved rapeseed germplasm from other countries. The success of any crop improvement programme depends on the extent of genetic diversity in the material. Hence, it is essential to understand the adaptation of introduced genotypes and the similarities, if any, among them. Evaluation trials were carried out on 17 rapeseed genotypes (nine of Canadian origin and eight of European origin) grown at four locations, namely Endebess, Njoro, Timau and Mau Narok, in three years (1992, 1993 and 1994). Results for 1993 were discarded due to severe drought. An analysis of variance was carried out only on seed yields, and the treatments were found to be significantly different. Cluster analysis was then carried out on mean seed yields; based on this analysis, only one major group exists within the material. In 1992, varieties 2, 3, 8 and 9 did not fall in the same cluster as the rest. Variety 8 was the only one not classified with the rest of the Canadian varieties. Three European varieties (2, 3 and 9) were however not classified with the others. In 1994, varieties 10 and 6 did not fall in the major cluster. Of these two, variety 10 is of Canadian origin. Varieties were more similar in 1994 than in 1992 due to favorable weather.
It is evident that, genotypes from different geographical

  2. Network clustering analysis using mixture exponential-family random graph models and its application in genetic interaction data.

    Science.gov (United States)

    Wang, Yishu; Zhao, Hongyu; Deng, Minghua; Fang, Huaying; Yang, Dejie

    2017-08-24

    Epistatic miniarray profile (EMAP) studies have enabled the mapping of large-scale genetic interaction networks and generated large amounts of data in model organisms. They provide an incredible set of molecular tools and advanced technologies that should help in efficiently understanding the relationship between the genotypes and phenotypes of individuals. However, the network information gained from EMAP cannot be fully exploited using traditional statistical network models, because the genetic network is always heterogeneous: for example, the network structure features for one subset of nodes are different from those of the remaining nodes. Exponential-family random graph models (ERGMs) are a family of statistical models which provide a principled and flexible way to describe the structural features (e.g. the density, centrality and assortativity) of an observed network. However, a single ERGM is not enough to capture this heterogeneity. In this paper, we consider mixture ERGM (MixtureERGM) networks, which model a network with several communities, where each community is described by a single ERGM.

  3. Outcome-Driven Cluster Analysis with Application to Microarray Data.

    Directory of Open Access Journals (Sweden)

    Jessie J Hsu

    Full Text Available One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated to each other than they are to those in other groups. An example is the search for groups of genes whose expression of RNA is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were additionally predictive of the clinical outcome. This issue arose in the context of a study of trauma patients on whom RNA samples were available. The question of interest was whether there were groups of genes that were behaving similarly, and whether each gene in the cluster would have a similar effect on who would recover. For this, we develop an algorithm to simultaneously assign characteristics (genes) into groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model where the genes within each group (cluster) equal the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that is informed by the recovery outcome.
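
One way to write down the random effects model described in this abstract is given below. The notation is assumed here rather than taken from the paper: x_{ij} is the expression of gene j in observation i, c(j) is the cluster to which gene j is assigned, u_{ik} is the random effect for observation i and cluster k, e_{ij} is the independent error term, y_i is the outcome, and K is the number of clusters. The abstract does not say whether the outcome equation also carries an intercept, an error term or a link function, so none is shown.

```latex
\begin{align}
  x_{ij} &= u_{i,c(j)} + e_{ij}, \\
  y_i    &= \sum_{k=1}^{K} \beta_k \, u_{ik}
\end{align}
```

Genes in the same cluster share the same u_{ik} and therefore the same coefficient beta_k, which is what lets the outcome inform the clustering.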

  4. Chaotic map clustering algorithm for EEG analysis

    Science.gov (United States)

    Bellotti, R.; De Carlo, F.; Stramaglia, S.

    2004-03-01

    The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, chaotic map clustering gives natural evidence of the pathological class without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.

  5. Tweets clustering using latent semantic analysis

    Science.gov (United States)

    Rasidi, Norsuhaili Mahamed; Bakar, Sakhinah Abu; Razak, Fatimah Abdul

    2017-04-01

    Social media are becoming overloaded with information due to the increasing number of information feeds. Unlike other social media, Twitter allows users to broadcast a short message called a 'tweet'. In this study, we extract tweets related to MH370 over a certain period of time and present an overview of our approach to tweet clustering for analyzing users' responses to the MH370 tragedy. The tweets were clustered based on the frequency of terms obtained from the classification process. The method used for text classification is Latent Semantic Analysis. As a result, two types of tweets responding to the MH370 tragedy were found: emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.

  6. The Quantitative Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, Ethirajan

    2016-07-01

    Chennai is also called the Detroit of India because its automotive industry produces over 40% of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the quantitative performance of the Chennai Automotive Industry Cluster before (2001-2002) and after (2008-2009) the Cluster Development Approach (CDA). The methodology adopted is the collection of primary data from 100 ACI using a quantitative questionnaire and analysis using Correlation Analysis (CA), Regression Analysis (RA), the Friedman Test (FMT) and the Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals a high degree of relationship between the variables studied. The RA models constructed establish a strong relationship between the dependent variable and a host of independent variables. The models proposed here reveal the approximate relationship in a closer form. The KWT shows that there is no significant difference between the three location clusters with respect to Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports the view that each location has contributed uniformly to the development of the automobile component cluster. The FMT shows that there is no significant difference between industrial units in respect of costs such as Production, Infrastructure, Technology and Marketing, and of Net Profit. To conclude, the automotive industries have fully utilized the Physical Infrastructure and Centralised Facilities by adopting the CDA and are now exporting their products to North America, South America, Europe, Australia, Africa and Asia. The value chain analysis models have been implemented in all the cluster units. This CDA model can be implemented in industries of under developed and developing countries for cost reduction and productivity

  7. New symmetry of the cluster model

    Science.gov (United States)

    Gai, Moshe

    2015-10-01

    A new approach to clustering in the frame of the Algebraic Cluster Model (ACM) has been developed. It predicts rotation-vibration structure with the rotational band of an oblate equilateral triangular spinning top with a 𝒟3h symmetry characterized by the sequence of states: 0+, 2+, 3-, 4±, 5- with almost degenerate 4+ and 4- (parity doublet) states. Our measurement of the new 2₂⁺ state and the measurement of the new 5- state in 12C fit very well to the predicted (ground state) rotational band structure with the same sequence of states. Such a 𝒟3h symmetry was observed in triatomic molecules, and it is observed in 12C for the first time in nuclear physics. We discuss a classification of other rotation-vibration bands in 12C such as the (0+) Hoyle band and the (1-) bending mode band and suggest measurements in search of the predicted ("missing") states that may shed new light on clustering in 12C and light nuclei. In particular, the observation (or non-observation) of the predicted ("missing") states in the Hoyle band will allow us to determine the geometrical arrangement of the three alpha particles composing the Hoyle state at 7.654 MeV in 12C.

  8. Properties of gold clusters and molecule-coated gold clusters as studied by molecular modeling

    OpenAIRE

    Walderhaug, Martin E

    2016-01-01

    The properties of small gold clusters are studied by use of density functional theory (DFT). A method validation study is conducted to choose a suitable DFT method. Geometry optimizations are performed on a number of different clusters, and their cohesive energies are computed. The charge distribution in the Au20 cluster is studied, both in the presence and absence of an electric field. The results are interpreted in terms of a model for the atomic charges in the cluster derived from electron...

  9. Three-Dimensional Modeling of Fracture Clusters in Geothermal Reservoirs

    Energy Technology Data Exchange (ETDEWEB)

    Ghassemi, Ahmad [Univ. of Oklahoma, Norman, OK (United States)

    2017-08-11

    The objective of this project is to develop a 3-D numerical model for simulating mode I, II, and III (tensile, shear, and out-of-plane) propagation of multiple fractures and fracture clusters, in order to accurately predict geothermal reservoir stimulation using the virtual multi-dimensional internal bond (VMIB). Effective development of enhanced geothermal systems can significantly benefit from improved modeling of hydraulic fracturing. In geothermal reservoirs, where the temperature can reach or exceed 350 °C, thermal and poro-mechanical processes play an important role in fracture initiation and propagation. In this project, hydraulic fracturing of hot subsurface rock mass will be numerically modeled by extending the virtual multiple internal bond theory and implementing it in WARP3D, a three-dimensional finite element code for solid mechanics. The new constitutive model, along with the poro-thermoelastic computational algorithms, will allow modeling of the initiation and propagation of clusters of fractures and the extension of pre-existing fractures. The work will enable the industry to realistically model stimulation of geothermal reservoirs. The project addresses the Geothermal Technologies Office objective of accurately predicting geothermal reservoir stimulation (GTO technology priority item). The project goal will be attained by: (i) development of the VMIB method for application to 3D analysis of fracture clusters; (ii) development of poro- and thermoelastic material sub-routines for use in the 3D finite element code WARP3D; (iii) implementation of VMIB and the new material routines in WARP3D to enable simulation of clusters of fractures while accounting for the effects of the pore pressure, thermal stress and inelastic deformation; (iv) simulation of 3D fracture propagation and coalescence and formation of clusters, and comparison with laboratory compression tests; and (v) application of the model to interpretation of injection experiments (planned by our

  10. Examining lower urinary tract symptom constellations using cluster analysis.

    Science.gov (United States)

    Coyne, Karin S; Matza, Louis S; Kopp, Zoe S; Thompson, Christine; Henry, David; Irwin, Debra E; Artibani, Walter; Herschorn, Sender; Milsom, Ian

    2008-05-01

    To gain a better understanding of how patients experience lower urinary tract symptoms (LUTS) and to determine whether particular symptoms cluster together, as LUTS seldom occur alone. A secondary analysis of a cross-sectional, population-based survey of adults in Sweden, Italy, Germany, UK and Canada was undertaken to examine the presence of LUTS groups. Of the 19,165 telephone surveys, 13,519 respondents reported at least one LUTS and were included in the analysis. All respondents were asked about the presence of 14 LUTS (International Prostate Symptom Score plus seven additional LUTS). K-means cluster analysis, a statistical method for sorting objects into groups so that similar objects are grouped together, was used to identify groups of people based on their symptoms. Men and women were analysed separately. A split-half random sample was selected from the dataset so that exploratory analyses could be conducted in one half and confirmed in the second. On model confirmation, the sample was analysed in its entirety. Included in this analysis were 5014 men (mean age 49.8 years; 95% white) and 8505 women (mean age 50.4 years; 96% white). Among both men and women, six distinct symptom cluster groups were identified and the symptom patterns of each cluster were examined. For both, the largest cluster consisted of respondents with minimal symptoms (i.e. reporting essentially one symptom), 56% of men and 57% of women. The remaining five clusters for men and women were labelled based on their predominant symptoms. For men, the clusters were nocturia of twice or more per night (12%); terminal dribble (11%); urgency (10%); multiple symptoms (9%); and postvoid incontinence (5%). For women, the clusters were nocturia of twice or more per night (12%); terminal dribble (10%); urgency (8%); stress incontinence (8%); and multiple symptoms (5%). The multiple-symptom groups had several and varied LUTS, were older, and had more comorbidities. Clusters of terminal dribble and male
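    The split-half procedure described in this record (explore in one random half, confirm in the other, then analyse the full sample) can be sketched as below. This is a minimal illustration on synthetic two-feature data rather than the survey's 14 symptom items; the helper names are hypothetical, and the use of plain Lloyd's algorithm with random seeding is an assumption, since the abstract does not say which k-means variant was used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, rng, n_iter=100):
    """Plain Lloyd's algorithm on a list of tuples; returns (centroids, labels)."""
    centroids = rng.sample(points, k)  # random distinct data points as seeds
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it is
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


def split_half(points, rng):
    """Randomly split the sample into exploratory and confirmatory halves."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


rng = random.Random(0)
# Synthetic "respondents": two groups in a two-feature space (illustration only).
respondents = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
respondents += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]

explore, confirm = split_half(respondents, rng)
explore_centroids, _ = kmeans(explore, 2, rng)
confirm_centroids, _ = kmeans(confirm, 2, rng)
print("exploratory half :", sorted(explore_centroids))
print("confirmatory half:", sorted(confirm_centroids))

# If the two halves agree, the whole sample is clustered in one pass.
full_centroids, full_labels = kmeans(respondents, 2, rng)
```

    Similar centroids in both halves suggest that the cluster structure is not an artefact of one particular subsample.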

  11. A dynamical condition for a relativistic galaxy cluster model

    International Nuclear Information System (INIS)

    Trevese, D.; Vignato, A.

    1976-01-01

    In an attempt to give a coherent interpretation of the secondary maximum in the density distribution of clusters, an approximate metric tensor proposed by other authors is used to build a relativistic generalization of the isothermal models of galaxy clusters. Although such a generalization gives rise to oscillations in the density distribution, the quantitative agreement with the observational data is unsatisfactory. The analysis of the metric tensor used brings out two points: (i) the approximation on which the metric is based is not suitable for describing an actual galaxy and (ii) the dynamical conditions of clusters require inclusion of a cosmological expansion and of an anisotropic distribution function in the phase-space. (Auth.)

  12. EM cluster analysis for categorical data

    Czech Academy of Sciences Publication Activity Database

    Grim, Jiří

    2006-01-01

    Vol. 44, No. 4109 (2006), pp. 640-648 ISSN 0302-9743. [Joint IAPR International Workshops SSPR 2006 and SPR 2006. Hong Kong, 17.08.2006-19.08.2006] R&D Projects: GA AV ČR 1ET400750407; GA MŠk 1M0572 EU Projects: European Commission(XE) 507752 - MUSCLE Institutional research plan: CEZ:AV0Z10750506 Keywords: cluster analysis * categorical data * EM algorithm Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 0.402, year: 2005

  13. Cluster analysis of BI-RADS descriptions of biopsy-proven breast lesions

    Science.gov (United States)

    Markey, Mia K.; Lo, Joseph Y.; Tourassi, Georgia D.; Floyd, Carey E., Jr.

    2002-05-01

    The purpose of this study was to identify and characterize clusters in a heterogeneous breast cancer computer-aided diagnosis database. Identification of subgroups within the database could help elucidate clinical trends and facilitate future model building. Agglomerative hierarchical clustering and k-means clustering were used to identify clusters in a large, heterogeneous computer-aided diagnosis database based on mammographic findings (BI-RADS) and patient age. The clusters were examined in terms of their feature distributions. The clusters showed logical separation of distinct clinical subtypes such as architectural distortions, masses, and calcifications. Moreover, the common subtypes of masses and calcifications were stratified into clusters based on age groupings. The percent of the cases that were malignant was notably different among the clusters. Cluster analysis can provide a powerful tool in discerning the subgroups present in a large, heterogeneous computer-aided diagnosis database.

  14. Feature recognition and clustering for urban modelling

    NARCIS (Netherlands)

    Chaszar, A.; Beirao, J.N.

    2013-01-01

    In urban planning, exploration and analysis assist the generation, measurement, interpretation and management of the modelled urban environments. This frequently involves categorisation of model elements and identification of element types. Such designation of elements can be achieved through

  15. Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.

    Directory of Open Access Journals (Sweden)

    Jeban Ganesalingam

    2009-09-01

    Amyotrophic lateral sclerosis (ALS) is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes. Latent class cluster analysis was applied to a large database consisting of 1467 records of people with ALS, using discrete variables which can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001). Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb) and time from symptom onset to diagnosis (p<0.00001). The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and to generate more homogeneous disease groups for genetic, proteomic and risk factor research.

  16. Latent Clustering Models for Outlier Identification in Telecom Data

    Directory of Open Access Journals (Sweden)

    Ye Ouyang

    2016-01-01

    Collected telecom data traffic has boomed in recent years, due to the development of 4G mobile devices and other similar high-speed machines. The ability to quickly identify unexpected traffic data in this stream is critical for mobile carriers, as it can be caused by either fraudulent intrusion or technical problems. Clustering models can help to identify issues by showing patterns in network data, which can quickly catch anomalies and highlight previously unseen outliers. In this article, we develop and compare clustering models for telecom data, focusing on those that include time-stamp information management. Two main models are introduced, solved in detail, and analyzed: Gaussian Probabilistic Latent Semantic Analysis (GPLSA) and time-dependent Gaussian Mixture Models (time-GMM). These models are then compared with other clustering models, such as the Gaussian model and GMM (which do not contain time-stamp information). We perform computation on both sample and telecom traffic data to show that the efficiency and robustness of GPLSA make it the superior method to detect outliers and provide results automatically, with few tuning parameters and little expertise required.

  17. K-means cluster analysis and seismicity partitioning for Pakistan

    Science.gov (United States)

    Rehman, Khaista; Burton, Paul W.; Weatherill, Graeme A.

    2014-07-01

    an area from 23.00° to 39.00°N and 59.00° to 80.00°E. A threshold magnitude of 5.2 is considered for K-means cluster analysis. The current study uses the traditional metrics of cluster quality, in addition to a seismic hazard contextual metric to attempt to constrain the preferred number of clusters found in the data. The spatial distribution of earthquakes from the catalogue was used to define the seismic clusters for Pakistan, which can be used further in the process of defining seismogenic sources and corresponding earthquake recurrence models for estimates of seismic hazard and risk in Pakistan. Consideration of the different approaches to cluster validation in a seismic hazard context suggests that Pakistan may be divided into K = 19 seismic clusters, including some portions of the neighbouring countries of Afghanistan, Tajikistan and India.

  18. Cluster Analysis in Nursing Research: An Introduction, Historical Perspective, and Future Directions.

    Science.gov (United States)

    Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G

    2017-05-01

    The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to provide an introduction to distance-based, partitioning-based, and model-based cluster analysis methods commonly utilized in the nursing literature, provide a brief historical overview on the use of cluster analysis in nursing literature, and provide suggestions for future research. An electronic search included three bibliographic databases, PubMed, CINAHL and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. The increased use of cluster analysis in the nursing literature is positioning this statistical method to result in insights that have the potential to change clinical practice.

  19. Cluster analysis of rural, urban, and curbside atmospheric particle size data.

    Science.gov (United States)

    Beddows, David C S; Dall'Osto, Manuel; Harrison, Roy M

    2009-07-01

    Particle size is a key determinant of the hazard posed by airborne particles. Continuous multivariate particle size data have been collected using aerosol particle size spectrometers sited at four locations within the UK: Harwell (Oxfordshire); Regents Park (London); British Telecom Tower (London); and Marylebone Road (London). These data have been analyzed using k-means cluster analysis, which was judged the preferred technique among four partitional clustering packages: fuzzy, k-means, k-median, and model-based clustering. Using cluster validation indices, k-means clustering was shown to produce clusters with the smallest size, furthest separation, and, importantly, the highest degree of similarity between the elements within each partition. Using k-means clustering, the complexity of the data set is reduced, allowing characterization of the data according to the temporal and spatial trends of the clusters. At Harwell, the rural background measurement site, the cluster analysis showed that the spectra may be differentiated by their modal diameters and average temporal trends, showing either high counts during the day-time or night-time hours. Likewise for the urban sites, the cluster analysis differentiated the spectra into a small number of size distributions according to their modal diameter, the location of the measurement site, and time of day. The responsible aerosol emission, formation, and dynamic processes can be inferred according to the cluster characteristics and correlation to concurrently measured meteorological, gas phase, and particle phase measurements.

  20. An image segmentation method based on network clustering model

    Science.gov (United States)

    Jiao, Yang; Wu, Jianshe; Jiao, Licheng

    2018-01-01

    Network clustering phenomena are ubiquitous in nature and human society. In this paper, a method involving a network clustering model is proposed for mass segmentation in mammograms. First, the watershed transform is used to divide an image into regions, and features of the image are computed. Then a graph is constructed from the obtained regions and features. The network clustering model is applied to realize clustering of nodes in the graph. Compared with two classic methods, the algorithm based on the network clustering model performs more effectively in experiments.

  1. The Productivity Analysis of Chennai Automotive Industry Cluster

    Science.gov (United States)

    Bhaskaran, E.

    2014-07-01

    Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to the US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates located in Chennai have adopted the Cluster Development Approach called Automotive Component Cluster. The objective is to study the Value Chain, Correlation and Data Envelopment Analysis by determining technical efficiency, peer weights, and input and output slacks of 100 auto component industries in three estates. The methodology adopted is Data Envelopment Analysis using the output-oriented Banker-Charnes-Cooper model, taking net worth, fixed assets and employment as inputs and gross output as the output. The non-zero values represent the weights for efficient clusters. The higher slack obtained reveals the excess net worth, fixed assets and employment, and the shortage in gross output. To conclude, the variables are highly correlated and the inefficient industries should increase their gross output or decrease their fixed assets or employment. Moreover, for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and to increase productivity and efficiency to compete in the indigenous and export market.

  2. Variable Selection in Model-based Clustering: A General Variable Role Modeling

    OpenAIRE

    Maugis, Cathy; Celeux, Gilles; Martin-Magniette, Marie-Laure

    2008-01-01

    The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are all independent or are all linked with the relevant clustering variables. We propose a more versatile variable selection model which describes three possible roles for each variable: The relevant clustering variables, the irrelevant clustering variables dependent on a part of the relevant clustering variables and the irrelevant clustering variables totally indepe...

  3. Influence of cluster mobility on Cu precipitation in α-Fe: A cluster dynamics modeling

    International Nuclear Information System (INIS)

    Jourdan, T.; Soisson, F.; Clouet, E.; Barbu, A.

    2010-01-01

    A cluster dynamics model has been parametrized to quantitatively reproduce results obtained by atomistic kinetic Monte Carlo (AKMC) modeling on the precipitation of Cu in α-Fe under thermal aging. The cluster mobility, highlighted by AKMC, is shown to have a significant effect on the precipitation kinetics and can reconcile the experimentally observed fast kinetics with the relatively low diffusivity of Cu monomers.

  4. Application of cluster analysis for data driven market segmentation ...

    African Journals Online (AJOL)

    This research aims to identify which standards of application of cluster analysis have emerged in the academic marketing literature, to compare those standards of applying methodological knowledge about clustering procedures, and to delineate sudden changes in clustering habits. These goals are achieved by ...

  5. Experimental Tests of the Algebraic Cluster Model

    Science.gov (United States)

    Gai, Moshe

    2018-02-01

    The Algebraic Cluster Model (ACM) of Bijker and Iachello, proposed in 2000, has recently been applied to 12C and 16O with much success. We review the current status in 12C with the outstanding observation of the ground state rotational band composed of the spin-parity states 0+, 2+, 3-, 4± and 5-. The observation of the 4± parity doublet is characteristic of a (tri-atomic) molecular configuration in which the three alpha particles are arranged in an equilateral triangular configuration of a symmetric spinning top. We discuss future measurements with electron scattering, 12C(e,e’), to test the predicted B(Eλ) of the ACM.

  6. Comparing clustering models in bank customers: Based on Fuzzy relational clustering approach

    Directory of Open Access Journals (Sweden)

    Ayad Hendalianpour

    2016-11-01

    Clustering is a useful technique for exploring data structures and has been employed in many fields. It organizes a set of objects into groups called clusters, such that the objects within one cluster are highly similar to each other and dissimilar to the objects in other clusters. The K-mean, C-mean, Fuzzy C-mean and Kernel K-mean algorithms are the most popular clustering algorithms because they are easy to implement and fast, but in some cases these algorithms cannot be used. In this paper, a hybrid model for customer clustering is therefore presented that is applicable in five banks of Fars Province, Shiraz, Iran. The fuzzy relation among customers is defined by using their features described in linguistic and quantitative variables. The customers of the banks are then grouped according to the K-mean, C-mean, Fuzzy C-mean and Kernel K-mean algorithms and the proposed Fuzzy Relation Clustering (FRC) algorithm. The aim of this paper is to show how to choose the best clustering algorithms based on density-based clustering and to present a new clustering algorithm for both crisp and fuzzy variables. Finally, we apply the proposed approach to five datasets of customer segmentation in banks. The results show the accuracy and high performance of FRC compared with other clustering methods.

  7. Fuzzy cluster analysis of high-field functional MRI data.

    Science.gov (United States)

    Windischberger, Christian; Barth, Markus; Lamm, Claus; Schroeder, Lee; Bauer, Herbert; Gur, Ruben C; Moser, Ewald

    2003-11-01

    Functional magnetic resonance imaging (fMRI) based on blood-oxygen level dependent (BOLD) contrast is today an established brain research method and is quickly gaining acceptance for complementary clinical diagnosis. However, the basic mechanisms, such as the coupling between neuronal activation and haemodynamic response, are not known exactly, nor can the various artifacts be predicted or controlled. Thus, modeling functional signal changes is non-trivial and exploratory data analysis (EDA) may be rather useful. In particular, identification and separation of artifacts as well as quantification of expected, i.e. stimulus correlated, and novel information on brain activity is important for both new insights in neuroscience and future developments in functional MRI of the human brain. After an introduction on fuzzy clustering and very high-field fMRI we present several examples where fuzzy cluster analysis (FCA) of fMRI time series helps to identify and locally separate various artifacts. We also present and discuss applications and limitations of fuzzy cluster analysis in very high-field functional MRI: differentiating temporal patterns in MRI using (a) a test object with static and dynamic parts and (b) artifacts due to gross head motion. Using a synthetic fMRI data set we quantitatively examine the influences of relevant FCA parameters on clustering results in terms of receiver-operator characteristics (ROC) and compare them with a commonly used model-based correlation analysis (CA) approach. The application of FCA in analyzing in vivo fMRI data is shown for (a) a motor paradigm, (b) data from multi-echo imaging, and (c) an fMRI study using mental rotation of three-dimensional cubes. We found that differentiation of true "neural" from false "vascular" activation is possible based on echo time dependence and specific activation levels, as well as based on their signal time-course. Exploratory data analysis methods in general and fuzzy cluster analysis in particular may
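    The idea of fuzzy clustering of voxel time courses can be illustrated with a minimal fuzzy c-means sketch. The time courses below are synthetic, the function name and the deterministic initialisation are hypothetical, and the standard fuzzy c-means update with fuzziness exponent m = 2 is an assumption; the abstract does not specify the exact FCA variant or parameters used.

```python
def fuzzy_c_means(series, n_clusters, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy c-means on equal-length sequences.

    Returns (centers, memberships); memberships[i][j] is the degree to which
    series i belongs to cluster j, and each row sums to 1.
    """
    n = len(series)
    dim = len(series[0])
    step = n // n_clusters
    # Assumed initialisation: evenly spaced series act as starting centers.
    centers = [list(series[j * step]) for j in range(n_clusters)]
    memberships = []
    for _ in range(n_iter):
        # Membership update: u_ij proportional to d_ij^(-2 / (m - 1)).
        memberships = []
        for x in series:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) + 1e-12 for c in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            memberships.append([value / total for value in inv])
        # Center update: mean of all series weighted by u_ij^m.
        new_centers = []
        for j in range(n_clusters):
            w = [u[j] ** m for u in memberships]
            w_sum = sum(w)
            new_centers.append([sum(wi * x[d] for wi, x in zip(w, series)) / w_sum
                                for d in range(dim)])
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Synthetic voxel time courses: four follow a box-car stimulus, four stay flat.
boxcar = [0, 0, 1, 1, 0, 0, 1, 1]
activated = [[b + 0.05 * i for b in boxcar] for i in range(4)]
baseline = [[0.05 * i] * len(boxcar) for i in range(4)]
voxels = activated + baseline

fcm_centers, fcm_memberships = fuzzy_c_means(voxels, n_clusters=2)
for index, row in enumerate(fcm_memberships):
    print(index, [round(u, 3) for u in row])
```

    Unlike hard k-means labels, the graded memberships let a voxel with an ambiguous time course be inspected rather than forced into a single cluster.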

  8. Model-based clustering with certainty estimation: implication for clade assignment of influenza viruses.

    Science.gov (United States)

    Zhang, Shunpu; Li, Zhong; Beland, Kevin; Lu, Guoqing

    2016-07-21

    Clustering is a common technique used by molecular biologists to group homologous sequences and study evolution. There remain issues such as how to cluster molecular sequences accurately and, in particular, how to evaluate the certainty of clustering results. We presented a model-based clustering method to analyze molecular sequences, described a subset bootstrap scheme to evaluate the certainty of the clusters, and showed an intuitive way of examining clusters using 3D visualization. We applied the above approach to analyze influenza viral hemagglutinin (HA) sequences. Nine clusters were estimated for highly pathogenic H5N1 avian influenza, which agrees with previous findings. The certainty that a given sequence can be correctly assigned to a cluster was 1.0 in all cases, and the certainty for a given cluster was also very high (0.92-1.0), with an overall clustering certainty of 0.95. For influenza A H7 viruses, ten HA clusters were estimated and the vast majority of sequences could be assigned to a cluster with a certainty of more than 0.99. The certainties for clusters, however, varied from 0.40 to 0.98; such certainty variation is likely attributable to the heterogeneity of sequence data in different clusters. In both cases, the certainty values estimated using the subset bootstrap method are all higher than those calculated based upon the standard bootstrap method, suggesting our bootstrap scheme is applicable for the estimation of clustering certainty. We formulated a clustering analysis approach with the estimation of certainties and 3D visualization of sequence data. We analysed 2 sets of influenza A HA sequences and the results indicate our approach is applicable for clustering analysis of influenza viral sequences.

  9. Cluster Analysis: Unsupervised Learning via Supervised Learning with a Non-convex Penalty.

    Science.gov (United States)

    Pan, Wei; Shen, Xiaotong; Liu, Binghui

    2013-07-01

    Clustering analysis is widely used in many fields. Traditionally, clustering is regarded as unsupervised learning because it lacks a class label or a quantitative response variable, which in contrast is present in supervised learning such as classification and regression. Here we formulate clustering as penalized regression with grouping pursuit. In addition to the novel use of a non-convex group penalty and its associated unique operating characteristics in the proposed clustering method, a main advantage of this formulation is that it allows borrowing some well-established results from classification and regression, such as model selection criteria to select the number of clusters, a difficult problem in clustering analysis. In particular, we propose using the generalized cross-validation (GCV) based on generalized degrees of freedom (GDF) to select the number of clusters. We use a few simple numerical examples to compare our proposed method with some existing approaches, demonstrating our method's promising performance.

  10. Cluster-based exposure variation analysis

    Science.gov (United States)

    2013-01-01

    Background: Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. Methods: For this purpose, we simulated a repeated cyclic exposure varying within each cycle between “low” and “high” exposure levels in a “near” or “far” range, and with “low” or “high” velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a “small” or “large” standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA and a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied on the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The least number of principal components describing more than 90% of variability in each case was selected and the projection of marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of classified realizations was determined. Results: C-EVA classified exposures more correctly than univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate

  11. An analysis of hospital brand mark clusters.

    Science.gov (United States)

    Vollmers, Stacy M; Miller, Darryl W; Kilic, Ozcan

    2010-07-01

    This study analyzed brand mark clusters (i.e., various types of brand marks displayed in combination) used by hospitals in the United States. The brand marks were assessed against several normative criteria for creating brand marks that are memorable and that elicit positive affect. Overall, results show a reasonably high level of adherence to many of these normative criteria. Many of the clusters exhibited pictorial elements that reflected benefits and that were conceptually consistent with the verbal content of the cluster. Also, many clusters featured icons that were balanced and moderately complex. However, only a few contained interactive imagery or taglines communicating benefits.

  12. Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.

    Science.gov (United States)

    Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A

    2018-01-30

    Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
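    For orientation, the ordinary Chinese restaurant process that the phylogenetic version extends can be sketched as a sequential sampler. This shows only the standard process, in which items are exchangeable and no tree is involved; the phylogenetic dependency structure proposed in the article is not reproduced here. The function name and the concentration value are hypothetical choices for illustration.

```python
import random


def sample_crp(n_items, alpha, rng):
    """Draw one partition from a standard Chinese restaurant process.

    Item i (0-based) joins existing cluster k with probability
    size_k / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha).
    Returns (assignments, cluster_sizes).
    """
    assignments = []
    sizes = []
    for i in range(n_items):
        weights = sizes + [alpha]        # existing clusters, then "new cluster"
        threshold = rng.random() * (i + alpha)
        cumulative = 0.0
        choice = len(sizes)              # default to a new cluster
        for k, weight in enumerate(weights):
            cumulative += weight
            if threshold < cumulative:
                choice = k
                break
        if choice == len(sizes):
            sizes.append(1)
        else:
            sizes[choice] += 1
        assignments.append(choice)
    return assignments, sizes


rng = random.Random(42)
strain_clusters, cluster_sizes = sample_crp(n_items=20, alpha=1.0, rng=rng)
print("cluster label per strain:", strain_clusters)
print("cluster sizes:", cluster_sizes)
```

    The number of clusters is not fixed in advance: larger values of `alpha` make new clusters more likely, which is what makes the process attractive as a nonparametric prior over groupings.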

  13. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results

    Directory of Open Access Journals (Sweden)

    Medvedovic Mario

    2011-01-01

    Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web-service, and an open source general purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both clustering structure and functional annotations provide a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages, making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state-of-the-art clustering procedures. For bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/.

  14. WebGimm: An integrated web-based platform for cluster analysis, functional analysis, and interactive visualization of results.

    Science.gov (United States)

    Joshi, Vineet K; Freudenberg, Johannes M; Hu, Zhen; Medvedovic, Mario

    2011-01-17

    Cluster analysis methods have been extensively researched, but the adoption of new methods is often hindered by technical barriers in their implementation and use. WebGimm is a free cluster analysis web-service, and an open source general purpose clustering web-server infrastructure designed to facilitate easy deployment of integrated cluster analysis servers based on clustering and functional annotation algorithms implemented in R. Integrated functional analyses and interactive browsing of both clustering structure and functional annotations provide a complete analytical environment for cluster analysis and interpretation of results. The Java Web Start client-based interface is modeled after the familiar cluster/treeview packages, making its use intuitive to a wide array of biomedical researchers. For biomedical researchers, WebGimm provides an avenue to access state-of-the-art clustering procedures. For bioinformatics methods developers, WebGimm offers a convenient avenue to deploy their newly developed clustering methods. WebGimm server, software and manuals can be freely accessed at http://ClusterAnalysis.org/.

  15. Smartness and Italian Cities. A Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Flavio Boscacci

    2014-05-01

    Smart cities have recently been recognized as the most pleasing and attractive places to live in; because of this, both scholars and policy-makers pay close attention to the topic. Specifically, urban “smartness” has been identified by many characteristics that can be grouped into six dimensions (Giffinger et al. 2007): smart Economy (competitiveness), smart People (social and human capital), smart Governance (participation), smart Mobility (both ICTs and transport), smart Environment (natural resources), and smart Living (quality of life). According to this analytical framework, the present paper investigates the relation between urban attractiveness and the “smart” characteristics in the 103 Italian NUTS3 province capitals in the year 2011. To this aim, descriptive statistics were followed by a regression analysis (OLS), where the dependent variable measuring urban attractiveness was proxied by housing market prices. In addition, a Cluster Analysis (CA) was developed in order to find differences and commonalities among the province capitals. The OLS results indicate that living, people and economy are the key drivers for achieving better urban attractiveness. Environment, instead, keeps on playing a minor role. Besides, the CA groups the province capitals a

  16. Radiobiological analyse based on cell cluster models

    International Nuclear Information System (INIS)

    Lin Hui; Jing Jia; Meng Damin; Xu Yuanying; Xu Liangfeng

    2010-01-01

    The influence of cell cluster dimension on EUD and TCP for targeted radionuclide therapy was studied using the radiobiological method. The radiobiological features of tumor with activity-lack in core were evaluated and analyzed by associating EUD, TCP and SF. The results show that EUD will increase with the increase of tumor dimension under the activity homogeneous distribution. If the extra-cellular activity was taken into consideration, the EUD will increase 47%. Under the activity-lack in tumor center and the requirement of TCP = 0.90, the α cross-fire influence of $^{211}$At could make up the maximum (48 μm)$^3$ activity-lack for Nucleus source, but (72 μm)$^3$ for Cytoplasm, Cell Surface, Cell and Voxel sources. In clinic, the physician could prefer the suggested dose of Cell Surface source in case of the future of local tumor control for under-dose. Generally TCP could well exhibit the effect difference between under-dose and due-dose, but not between due-dose and over-dose, which makes TCP more suitable for the therapy plan choice. EUD could well exhibit the difference between different models and activity distributions, which makes it more suitable for the research work. When the user uses EUD to study the influence of activity inhomogeneous distribution, one should keep the consistency of the configuration and volume of the former and the latter models. (authors)

  17. Detecting Clusters in Atom Probe Data with Gaussian Mixture Models.

    Science.gov (United States)

    Zelenty, Jennifer; Dahl, Andrew; Hyde, Jonathan; Smith, George D W; Moody, Michael P

    2017-04-01

    Accurately identifying and extracting clusters from atom probe tomography (APT) reconstructions is extremely challenging, yet critical to many applications. Currently, the most prevalent approach to detect clusters is the maximum separation method, a heuristic that relies heavily upon parameters manually chosen by the user. In this work, a new clustering algorithm, Gaussian mixture model Expectation Maximization Algorithm (GEMA), was developed. GEMA utilizes a Gaussian mixture model to probabilistically distinguish clusters from random fluctuations in the matrix. This machine learning approach maximizes the data likelihood via expectation maximization: given atomic positions, the algorithm learns the position, size, and width of each cluster. A key advantage of GEMA is that atoms are probabilistically assigned to clusters, thus reflecting scientifically meaningful uncertainty regarding atoms located near precipitate/matrix interfaces. GEMA outperforms the maximum separation method in cluster detection accuracy when applied to several realistically simulated data sets. Lastly, GEMA was successfully applied to real APT data.
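
    The expectation-maximization idea described above can be illustrated with a minimal sketch for a one-dimensional Gaussian mixture. This is not the GEMA implementation: the function names, the 1-D simplification, and the toy data are assumptions made for illustration. The point it shows is the soft assignment, where each data point receives a probability of belonging to each component rather than a hard label.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, means, stds, weights, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given guesses.

    Returns the fitted means, standard deviations, mixing weights, and the
    responsibilities (soft assignments) from the final E-step.
    """
    means, stds, weights = list(means), list(stds), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], stds[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate parameters from the soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), 1e-6)  # guard against collapse
            weights[j] = nj / len(data)
    return means, stds, weights, resp


if __name__ == "__main__":
    # Toy positions: two groups of points along one axis.
    data = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 4.6, 4.8, 5.0, 5.1, 5.3, 5.5]
    means, stds, weights, resp = fit_gmm_1d(data, [1.0, 4.0], [1.0, 1.0], [0.5, 0.5])
    print("means:", means)
    print("stds: ", stds)
```

    A real atom probe data set is three-dimensional and includes a diffuse matrix background, so a practical model would need multivariate components and some representation of that background; both are omitted here.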

  18. A mathematical model for the dynamics of clustering

    Science.gov (United States)

    Aeyels, Dirk; De Smet, Filip

    2008-10-01

    The formation of several clusters, arising from attracting forces between nonidentical entities or agents, is a phenomenon observed in diverse fields. Think of people gathered through a mutual interest, swarm behaviour of animals or clustering of oscillators in brain cells. We introduce a dynamic model of mutually attracting agents for which we prove that the long-term behaviour consists of agents organized into several groups or clusters. We have completely characterized the cluster structure (i.e. the number of clusters and their composition) by means of a set of inequalities in the parameters of the model and have identified the intensity of the attraction as a key parameter governing the transition between different cluster structures. The versatility of the model will be illustrated by discussing its relation to the Kuramoto model and by describing how it applies to a system of interconnected water basins.
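
    The abstract does not state the model equations, so the following is only a sketch under an assumed form: each agent $i$ has its own natural velocity $b_i$ and is pulled toward the others by a bounded, odd attraction function $f$, giving $\dot{x}_i = b_i + \frac{K}{N}\sum_j f(x_j - x_i)$. The saturating choice of $f$, the values of $b_i$, and all function names are illustrative assumptions. Agents that end up with the same long-run velocity are counted as one cluster, which lets the sketch show how the attraction intensity $K$ changes the cluster structure.

```python
def saturate(z):
    """Assumed attraction function: odd, nondecreasing, capped at +/-1."""
    return max(-1.0, min(1.0, z))


def long_run_velocities(b, coupling, dt=0.01, n_steps=20000):
    """Euler-integrate dx_i/dt = b_i + (K/N) * sum_j f(x_j - x_i).

    Returns each agent's average velocity over the second half of the run.
    """
    n = len(b)
    x = [0.0] * n
    half = n_steps // 2
    x_half = list(x)
    for step in range(n_steps):
        if step == half:
            x_half = list(x)  # snapshot used to measure late-time velocity
        dx = [
            b[i] + coupling / n * sum(saturate(x[j] - x[i]) for j in range(n))
            for i in range(n)
        ]
        x = [x[i] + dt * dx[i] for i in range(n)]
    elapsed = (n_steps - half) * dt
    return [(x[i] - x_half[i]) / elapsed for i in range(n)]


def count_clusters(velocities, tol=1e-3):
    """Agents whose long-run velocities agree within tol share a cluster."""
    groups = 0
    previous = None
    for v in sorted(velocities):
        if previous is None or v - previous > tol:
            groups += 1
        previous = v
    return groups


if __name__ == "__main__":
    natural_velocities = [-1.0, 0.0, 1.0]
    for k in (0.3, 10.0):
        v = long_run_velocities(natural_velocities, coupling=k)
        print("K =", k, "-> clusters:", count_clusters(v))
```

    With weak attraction each agent drifts at close to its own natural velocity, while strong attraction locks all agents to a common velocity; intermediate values can produce partial groupings.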

  19. A user credit assessment model based on clustering ensemble for broadband network new media service supervision

    Science.gov (United States)

    Liu, Fang; Cao, San-xing; Lu, Rui

    2012-04-01

    This paper proposes a user credit assessment model based on a clustering ensemble, aimed at the problem of users illegally spreading pirated and pornographic media content on self-service broadband network new media platforms. The idea is to assess new media user credit through an index system built on user credit behaviors, so that illegal users can be identified from the assessment results and the transmission of harmful video and audio on the network can be curbed. The model combines two strengths: swarm intelligence clustering is well suited to user credit behavior analysis, and K-means clustering can eliminate the scattered users left in the swarm intelligence clustering result. Together they classify all users' credit automatically. Validation experiments were carried out on the standard credit application dataset in the UCI machine learning repository. A comparative experiment against a single swarm intelligence clustering model indicates that the clustering ensemble model distinguishes creditworthiness better, especially in finding the user clusters with the best and the worst credit, which helps operators apply incentive or punitive measures accurately. In addition, compared with a logistic regression based model under the same conditions, the clustering ensemble model is robust and has better prediction accuracy.

  20. The Psychology of Yoga Practitioners: A Cluster Analysis.

    Science.gov (United States)

    Genovese, Jeremy E C; Fondran, Kristine M

    2017-11-01

    Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskal-Wallis, p = .0041). Members of Cluster A have the most years of yoga experience and members of Cluster B have more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.

  1. Cluster analysis for DNA methylation profiles having a detection threshold

    Directory of Open Access Journals (Sweden)

    Siegmund Kimberly D

    2006-07-01

    Background: DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while being generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on this data. Results: We compare performance of existing methodology (such as k-means) with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data is collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. Conclusion: The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and should therefore be analyzed differently. Consequently, it is prudent to give thought to what the properties of the data are likely to be, and which analysis method might therefore be likely to best capture those properties.

  2. Simultaneous Two-Way Clustering of Multiple Correspondence Analysis

    Science.gov (United States)

    Hwang, Heungsun; Dillon, William R.

    2010-01-01

    A 2-way clustering approach to multiple correspondence analysis is proposed to account for cluster-level heterogeneity of both respondents and variable categories in multivariate categorical data. Specifically, in the proposed method, multiple correspondence analysis is combined with k-means in a unified framework in which "k"-means is…

  3. Using Cluster Analysis for Data Mining in Educational Technology Research

    Science.gov (United States)

    Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.

    2012-01-01

    Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…

  4. K-MEANS CLUSTERING FOR HIDDEN MARKOV MODEL

    NARCIS (Netherlands)

    Perrone, M.P.; Connell, S.D.

    2004-01-01

    An unsupervised k-means clustering algorithm for hidden Markov models is described and applied to the task of generating subclass models for individual handwritten character classes. The algorithm is compared to a related clustering method and shown to give a relative change in the error rate of as

  5. Approximate Solutions of Interactive Dynamic Influence Diagrams Using Model Clustering

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Doshi, Prashant; Qiongyu, Cheng

    2007-01-01

    of the other agents, which increase exponentially with the number of time steps. We present a method of solving I-DIDs approximately by limiting the number of other agents' candidate models at each time step to a constant. We do this by clustering the models and selecting a representative set from the clusters...

  6. Analysis of Germination Capacity and Germinant Receptor (Sub)clusters of Genome-Sequenced Bacillus cereus Environmental Isolates and Model Strains.

    Science.gov (United States)

    Warda, Alicja K; Xiao, Yinghua; Boekhorst, Jos; Wells-Bennik, Marjon H J; Nierop Groot, Masja N; Abee, Tjakko

    2017-02-15

    Spore germination of 17 Bacillus cereus food isolates and reference strains was evaluated using flow cytometry analysis in combination with fluorescent staining at a single-spore level. This approach allowed for rapid collection of germination data under more than 20 conditions, including heat activation of spores, germination in complex media (brain heart infusion [BHI] and tryptone soy broth [TSB]), and exposure to saturating concentrations of single amino acids and the combination of alanine and inosine. Whole-genome sequence comparison revealed a total of 11 clusters of operons encoding germinant receptors (GRs): GerK, GerI, and GerL were present in all strains, whereas GerR, GerS, GerG, GerQ, GerX, GerF, GerW, and GerZ (sub)clusters showed a more diverse presence/absence in different strains. The spores of tested strains displayed high diversity with regard to their sensitivity and responsiveness to selected germinants and heat activation. The two laboratory strains, B. cereus ATCC 14579 and ATCC 10987, and 11 food isolates showed a good germination response under a range of conditions, whereas four other strains (B. cereus B4085, B4086, B4116, and B4153) belonging to phylogenetic group IIIA showed a very weak germination response even in BHI and TSB media. Germination responses could not be linked to specific (combinations of) GRs, but it was noted that the four group IIIA strains contained pseudogenes or variants of subunit C in their gerL cluster. Additionally, two of those strains (B4086 and B4153) carried pseudogenes in the gerK and gerR I (sub)clusters that possibly affected the functionality of these GRs. Germination of bacterial spores is a critical step before vegetative growth can resume. Food products may contain nutrient germinants that trigger germination and outgrowth of Bacillus species spores, possibly leading to food spoilage or foodborne illness. 
    Prediction of spore germination behavior is, however, very challenging, especially for spores of

  7. Cluster radioactive decay within the preformed cluster model using relativistic mean-field theory densities

    International Nuclear Information System (INIS)

    Singh, BirBikram; Patra, S. K.; Gupta, Raj K.

    2010-01-01

    We have studied the (ground-state) cluster radioactive decays within the preformed cluster model (PCM) of Gupta and collaborators [R. K. Gupta, in Proceedings of the 5th International Conference on Nuclear Reaction Mechanisms, Varenna, edited by E. Gadioli (Ricerca Scientifica ed Educazione Permanente, Milano, 1988), p. 416; S. S. Malik and R. K. Gupta, Phys. Rev. C 39, 1992 (1989)]. The relativistic mean-field (RMF) theory is used to obtain the nuclear matter densities for the double folding procedure used to construct the cluster-daughter potential with M3Y nucleon-nucleon interaction including exchange effects. Following the PCM approach, we have deduced empirically the preformation probability $P_0^{\mathrm{emp}}$ from the experimental data on both the α- and exotic cluster-decays, specifically of parents in the trans-lead region having doubly magic $^{208}$Pb or its neighboring nuclei as daughters. Interestingly, the RMF-densities-based nuclear potential supports the concept of preformation for both the α and heavier clusters in radioactive nuclei. $P_0^{\alpha(\mathrm{emp})}$ for α decays is almost constant ($\sim 10^{-2}$ to $10^{-3}$) for all the parent nuclei considered here, and $P_0^{c(\mathrm{emp})}$ for cluster decays of the same parents decreases with the size of clusters emitted from different parents. The results obtained for $P_0^{c(\mathrm{emp})}$ are reasonable and are within two to three orders of magnitude of the well-accepted phenomenological model of Blendowske-Walliser for light clusters.

  8. A novel model-free data analysis technique based on clustering in a mutual information space: application to resting-state fMRI

    Directory of Open Access Journals (Sweden)

    Simon Benjaminsson

    2010-08-01

    Non-parametric data-driven analysis techniques can be used to study datasets with few assumptions about the data and underlying experiment. Variations of Independent Component Analysis (ICA) have been the methods mostly used on fMRI data, e.g. in finding resting-state networks thought to reflect the connectivity of the brain. Here we present a novel data analysis technique and demonstrate it on resting-state fMRI data. It is a generic method with few underlying assumptions about the data. The results are built from the statistical relations between all input voxels, resulting in a whole-brain analysis on a voxel level. It has good scalability properties and the parallel implementation is capable of handling large datasets and databases. From the mutual information between the activities of the voxels over time, a distance matrix is created for all voxels in the input space. Multidimensional scaling is used to put the voxels in a lower-dimensional space reflecting the dependency relations based on the distance matrix. By performing clustering in this space we can find the strong statistical regularities in the data, which for the resting-state data turn out to be the resting-state networks. The decomposition is performed in the last step of the algorithm and is computationally simple. This opens up for rapid analysis and visualization of the data on different spatial levels, as well as automatically finding a suitable number of decomposition components.
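
    The first step of the pipeline described above, turning mutual information between time series into a distance matrix, can be sketched as follows. The abstract does not give the exact distance definition, so this sketch assumes two things for illustration: each series is discretized by thresholding at its mean, and the distance is the variation of information, $d(A,B) = H(A,B) - I(A;B)$. The later multidimensional scaling and clustering steps are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two equal-length discrete series."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def binarize(series):
    """Assumed discretization: 1 where the value exceeds the series mean."""
    mean = sum(series) / len(series)
    return [1 if v > mean else 0 for v in series]


def distance_matrix(series_list):
    """Pairwise distance d = H(A,B) - I(A;B) between discretized series."""
    coded = [binarize(s) for s in series_list]
    n = len(coded)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            joint = entropy(list(zip(coded[i], coded[j])))
            d[i][j] = d[j][i] = joint - mutual_information(coded[i], coded[j])
    return d


if __name__ == "__main__":
    voxel_a = [0, 1, 0, 1, 0, 1, 0, 1]
    voxel_b = [3, 5, 3, 5, 3, 5, 3, 5]   # same pattern as voxel_a, rescaled
    voxel_c = [0, 0, 1, 1, 0, 0, 1, 1]   # unrelated to voxel_a after coding
    for row in distance_matrix([voxel_a, voxel_b, voxel_c]):
        print([round(v, 3) for v in row])
```

    Because the distance depends only on statistical dependence, two voxels with the same temporal pattern at different amplitudes end up at distance zero, while independent voxels are maximally far apart.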

  9. Artificial neural network modeling and cluster analysis for organic facies and burial history estimation using well log data: A case study of the South Pars Gas Field, Persian Gulf, Iran

    Science.gov (United States)

    Alizadeh, Bahram; Najjari, Saeid; Kadkhodaie-Ilkhchi, Ali

    2012-08-01

    Intelligent and statistical techniques were used to extract the hidden organic facies from well log responses in the Giant South Pars Gas Field, Persian Gulf, Iran. Data from the Mid-Cretaceous Kazhdomi Formation and the Permo-Triassic Kangan-Dalan Formations were used for this purpose. Initially GR, SGR, CGR, THOR, POTA, NPHI and DT logs were applied to model the relationship between wireline logs and Total Organic Carbon (TOC) content using Artificial Neural Networks (ANN). The correlation coefficient (R²) between the measured and ANN-predicted TOC equals 89%. The performance of the model is measured by the Mean Squared Error function, which does not exceed 0.0073. Using the Cluster Analysis technique and creating a binary hierarchical cluster tree, the constructed TOC column of each formation was clustered into 5 organic facies according to their geochemical similarity. Later a second model with an accuracy of 84% was created by ANN to determine the specified clusters (facies) directly from well logs for quick cluster recognition in other wells of the studied field. Each created facies was correlated to its appropriate burial history curve. Hence each facies of a formation could be scrutinized separately and directly from its well logs, demonstrating the time and depth of oil or gas generation. Therefore the potential production zone of the Kazhdomi probable source rock and the Kangan-Dalan reservoir formation could be identified while well logging operations (especially in LWD cases) were in progress. This could reduce uncertainty, save considerable time and cost for the oil industry, and aid in the successful implementation of exploration and exploitation plans.

  10. Cluster analysis of activity-time series in motor learning

    DEFF Research Database (Denmark)

    Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A

    2002-01-01

    Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel...... practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing...

  11. Visual cluster analysis and pattern recognition methods

    Science.gov (United States)

    Osbourn, Gordon Cecil; Martinez, Rubel Francisco

    2001-01-01

    A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  12. Genetic analysis of loose cluster architecture in grapevine

    Directory of Open Access Journals (Sweden)

    Richter Robert

    2017-01-01

    Loose cluster architecture is a well-known trait supporting Botrytis resilience by permitting faster drying of bunches. Furthermore, a loose bunch enables better application of fungicides into the cluster. A population of 150 F1 plants of the superior breeding line GF.GA-47-42 (‘Bacchus' x ‘Seyval blanc') crossed with ‘Villard blanc', segregating for compactness of the cluster, was used for QTL analysis. Plenty of QTLs were identified reproducibly for two years; QTLs stable over three growing seasons were identified for rachis length, peduncle length, and pedicel length. In a second approach, ‘Pinot noir' clones showing variation in cluster architecture were analyzed for differential gene expression. Grown in three different German viticultural areas, loose versus compact clustered ‘Pinot noir' clones showed, in gene expression experiments, a candidate gene expressed fivefold higher in loosely clustered clones between stages BBCH57 and BBCH71.

  13. Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.

    Science.gov (United States)

    Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun

    2017-01-01

    Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
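
    To make the contrast between hard and fuzzy classification concrete, here is a minimal sketch of standard fuzzy k-means (fuzzy c-means) on one-dimensional points. It is not the regularized two-way method proposed in the abstract: the fuzzifier `m`, the 1-D data, and the function names are assumptions for illustration. In this standard version the degree of fuzziness is set by hand through `m`, which is the choice the abstract's regularized variant is said to make automatically.

```python
def fuzzy_memberships(points, centers, m=2.0):
    """Membership of each 1-D point in each cluster; every row sums to 1."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # Point coincides with a center: split full membership among
            # the coinciding centers and give none to the others.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]
        memberships.append(row)
    return memberships


def update_centers(points, memberships, m=2.0):
    """Each center is the mean of the points weighted by membership**m."""
    centers = []
    for k in range(len(memberships[0])):
        w = [u[k] ** m for u in memberships]
        centers.append(sum(wi * x for wi, x in zip(w, points)) / sum(w))
    return centers


def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of steps."""
    for _ in range(n_iter):
        u = fuzzy_memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, fuzzy_memberships(points, centers, m)


if __name__ == "__main__":
    points = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 2.7]
    centers, u = fuzzy_c_means(points, [1.0, 4.0])
    print("centers:", [round(c, 3) for c in centers])
    print("membership of the middle point:", [round(v, 3) for v in u[-1]])
```

    The point halfway between the two groups ends up with roughly equal membership in both clusters, which a hard k-means assignment could not express.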

  14. EM Clustering Analysis of Diabetes Patients Basic Diagnosis Index

    OpenAIRE

    Wu, Cai; Steinbauer, Jeffrey R.; Kuo, Grace M.

    2005-01-01

    Cluster analysis can group similar instances into the same group. Partitioning clustering assigns classes to samples without knowing the classes in advance. The most common algorithms are K-means and Expectation Maximization (EM). The EM clustering algorithm can find the number of distributions generating the data and build “mixture models”. It identifies groups that are either overlapping or of varying sizes and shapes. In this project, by using EM in Machine Learning Algorithm in JAVA (WEKA) syste...

  15. The smart cluster method. Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-07-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with $M_{\min} = 2.0$. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  16. Automatic Prosodic Segmentation by F0 Clustering Using Superpositional Modeling.

    OpenAIRE

    Nakai, Mitsuru; Harald, Singer; Sagisaka, Yoshinori; Shimodaira, Hiroshi

    1995-01-01

    In this paper, we propose an automatic method for detecting accent phrase boundaries in Japanese continuous speech by using F0 information. In the training phase, hand labeled accent patterns are parameterized according to a superpositional model proposed by Fujisaki, and assigned to some clusters by a clustering method, in which accent templates are calculated as centroid of each cluster. In the segmentation phase, automatic N-best extraction of boundaries is performe...

  17. Allergen Sensitization Pattern by Sex: A Cluster Analysis in Korea.

    Science.gov (United States)

    Ohn, Jungyoon; Paik, Seung Hwan; Doh, Eun Jin; Park, Hyun-Sun; Yoon, Hyun-Sun; Cho, Soyun

    2017-12-01

    Allergens tend to sensitize simultaneously. The etiology of this phenomenon has been suggested to be allergen cross-reactivity or concurrent exposure. However, little is known about specific allergen sensitization patterns. This study aimed to investigate allergen sensitization characteristics according to gender. The multiple allergen simultaneous test (MAST) is widely used as a screening tool for detecting allergen sensitization in dermatologic clinics. We retrospectively reviewed the medical records of patients with MAST results between 2008 and 2014 in our Department of Dermatology. A cluster analysis was performed to elucidate the allergen-specific immunoglobulin (Ig)E cluster pattern. The results of MAST (39 allergen-specific IgEs) from 4,360 cases were analyzed. By cluster analysis, the 39 items were grouped into 8 clusters. Each cluster had characteristic features. Compared with the female group, the male group tended to be sensitized more frequently to all tested allergens, except for the fungus allergen cluster. The cluster and comparative analysis results demonstrate that allergen sensitization is clustered, manifesting allergen similarity or co-exposure. Only the fungus cluster allergens tend to sensitize the female group more frequently than the male group.

  18. Higgs Pair Production: Choosing Benchmarks With Cluster Analysis

    CERN Document Server

    Carvalho, Alexandra; Dorigo, Tommaso; Goertz, Florian; Gottardo, Carlo A.; Tosi, Mia

    2016-01-01

    New physics theories often depend on a large number of free parameters. The precise values of those parameters in some cases drastically affect the resulting phenomenology of fundamental physics processes, while in others finite variations can leave it basically invariant at the level of detail experimentally accessible. When designing a strategy for the analysis of experimental data in the search for a signal predicted by a new physics model, it appears advantageous to categorize the parameter space describing the model according to the corresponding kinematical features of the final state. A multi-dimensional test statistic can be used to gauge the degree of similarity in the kinematics of different models; a clustering algorithm using that metric may then allow the division of the space into homogeneous regions, each of which can be successfully represented by a benchmark point. Searches targeting those benchmark points are then guaranteed to be sensitive to a large area of the parameter space. In this doc...

  19. Entropic Approach to Multiscale Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Antonio Insolia

    2012-05-01

    Recently, a novel method has been introduced to estimate the statistical significance of clustering in the direction distribution of objects. The method involves a multiscale procedure, based on the Kullback–Leibler divergence and the Gumbel statistics of extreme values, providing high discrimination power, even in the presence of strong background isotropic contamination. It is shown that the method is: (i) semi-analytical, drastically reducing computation time; (ii) very sensitive to small, medium and large scale clustering; (iii) not biased against the null hypothesis. Applications to the physics of ultra-high energy cosmic rays, as a cosmological probe, are presented and discussed.

  20. Clustering disaggregated load profiles using a Dirichlet process mixture model

    International Nuclear Information System (INIS)

    Granell, Ramon; Axon, Colin J.; Wallom, David C.H.

    2015-01-01

    Highlights: • We show that the Dirichlet process mixture model is scalable. • Our model does not require the number of clusters as an input. • Our model creates clusters only by the features of the demand profiles. • We have used both residential and commercial data sets. - Abstract: The increasing availability of substantial quantities of power-use data in both the residential and commercial sectors raises the possibility of mining the data to the advantage of both consumers and network operations. We present a Bayesian non-parametric model to cluster load profiles from households and business premises. Evaluators show that our model performs as well as other popular clustering methods, but unlike most other methods it does not require the number of clusters to be predetermined by the user. We used the so-called ‘Chinese restaurant process’ method to solve the model, making use of the Dirichlet-multinomial distribution. The number of clusters grew logarithmically with the quantity of data, making the technique suitable for scaling to large data sets. We were able to show that the model could distinguish features such as the nationality, household size, and type of dwelling between the cluster memberships.
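
    The Chinese restaurant process mentioned in this abstract can be illustrated with a short simulation. This is only a sketch of the prior over partitions, not the authors' load-profile model: the function name `crp_assignments`, the concentration parameter `alpha`, and the fixed seed are assumptions made for illustration, and no likelihood over demand profiles is included.

```python
import random


def crp_assignments(n_customers, alpha, seed=0):
    """Simulate table assignments under a Chinese restaurant process.

    Customer i (0-based) joins existing table k with probability
    size_k / (i + alpha) and opens a new table with probability
    alpha / (i + alpha). Names and defaults here are illustrative.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for i in range(n_customers):
        # Draw a point on [0, i + alpha]; the first i units belong to
        # existing tables, the remaining alpha units to a new table.
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        chosen = len(table_sizes)  # default: open a new table
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                chosen = k
                break
        if chosen == len(table_sizes):
            table_sizes.append(0)
        table_sizes[chosen] += 1
        assignments.append(chosen)
    return assignments, table_sizes


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        _, sizes = crp_assignments(n, alpha=1.0)
        print(n, "customers ->", len(sizes), "tables")
```

    Under this prior the expected number of tables grows roughly like alpha times the logarithm of the number of customers, which is in line with the logarithmic growth in cluster count that the abstract reports.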

  1. Marketing Mix Formulation for Higher Education: An Integrated Analysis Employing Analytic Hierarchy Process, Cluster Analysis and Correspondence Analysis

    Science.gov (United States)

    Ho, Hsuan-Fu; Hung, Chia-Chi

    2008-01-01

    Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…

  2. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    Science.gov (United States)

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
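
    As an illustration of one of the stand-alone quality metrics named in this abstract, the following sketch computes Newman modularity for an undirected, unweighted graph. It is not taken from the paper: the function name `modularity`, the edge-list input format, and the toy graph are assumptions chosen for brevity.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the edge count, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


if __name__ == "__main__":
    # Toy graph (hypothetical): two triangles joined by a single bridge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    one_group = {node: "all" for node in range(6)}
    print(modularity(edges, two_groups))
    print(modularity(edges, one_group))
```

    Modularity needs no ground-truth labels, which is what makes it a stand-alone metric; information recovery metrics such as the adjusted Rand score compare a clustering against known labels instead, so the two kinds of score can disagree.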

  3. The Reliability of Inverse Scree Tests for Cluster Analysis.

    Science.gov (United States)

    Lathrop, Richard G.; Williams, Janice E.

    1987-01-01

    A Monte Carlo study, involving 6,000 "computer subjects" and three raters, explored the reliability of the inverse scree test for cluster analysis. Results indicate that the inverse scree may be a useful and reliable cluster analytic technique for determining the number of true groups. (TJH)

  4. Blaeu: Mapping and navigating large tables with cluster analysis

    NARCIS (Netherlands)

    T.H.J. Sellam (Thibault); C.P. Cijvat (Robin); R.A. Koopmanschap (Richard); M.L. Kersten (Martin)

    2016-01-01

    Blaeu is an interactive database exploration tool. Its aim is to guide casual users through large data tables, ultimately triggering insights and serendipity. To do so, it relies on a double cluster analysis mechanism. It clusters the data vertically: it detects themes, groups of

  5. Ab initio calculations and modelling of atomic cluster structure

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Lyalin, Andrey G.; Solov'yov, Andrey V.

    2004-01-01

    framework for modelling the fusion process of noble gas clusters is presented. We report the striking correspondence of the peaks in the experimentally measured abundance mass spectra with the peaks in the size-dependence of the second derivative of the binding energy per atom calculated for the chain...... of the noble gas clusters up to 150 atoms....

  6. Fitting Latent Cluster Models for Networks with latentnet

    Directory of Open Access Journals (Sweden)

    Pavel N. Krivitsky

    2007-12-01

    latentnet is a package to fit and evaluate statistical latent position and cluster models for networks. Hoff, Raftery, and Handcock (2002) suggested an approach to modeling networks based on positing the existence of a latent space of characteristics of the actors. Relationships form as a function of distances between these characteristics as well as functions of observed dyadic level covariates. In latentnet social distances are represented in a Euclidean space. It also includes a variant of the extension of the latent position model to allow for clustering of the positions developed in Handcock, Raftery, and Tantrum (2007). The package implements Bayesian inference for the models based on a Markov chain Monte Carlo algorithm. It can also compute maximum likelihood estimates for the latent position model and a two-stage maximum likelihood method for the latent position cluster model. For latent position cluster models, the package provides a Bayesian way of assessing how many groups there are, and thus whether or not there is any clustering (since if the preferred number of groups is 1, there is little evidence for clustering). It also estimates which cluster each actor belongs to. These estimates are probabilistic, and provide the probability of each actor belonging to each cluster. It computes four types of point estimates for the coefficients and positions: maximum likelihood estimate, posterior mean, posterior mode and the estimator which minimizes Kullback-Leibler divergence from the posterior. You can assess the goodness-of-fit of the model via posterior predictive checks. It has a function to simulate networks from a latent position or latent position cluster model.

  7. Participant intimacy: A cluster analysis of the intranuclear cascade

    International Nuclear Information System (INIS)

    Cugnon, J.; Knoll, J.; Randrup, J.

    1981-01-01

    The intranuclear cascade for relativistic nuclear collisions is analyzed in terms of clusters consisting of groups of nucleons which are dynamically linked to each other by violent interactions. The formation cross sections for the different cluster types as well as their intrinsic dynamics are studied and compared with the predictions of the linear cascade model (rows-on-rows). (orig.)

  8. Participant intimacy: A cluster analysis of the intranuclear cascade

    Science.gov (United States)

    Cugnon, J.; Knoll, J.; Randrup, J.

    1981-05-01

    The intranuclear cascade for relativistic nuclear collisions is analyzed in terms of "clusters" consisting of groups of nucleons which are dynamically linked to each other by violent interactions. The formation cross sections for the different cluster types as well as their intrinsic dynamics are studied and compared with the predictions of the linear cascade model ("rows-on-rows").

  9. Merging Galaxy Clusters: Analysis of Simulated Analogs

    Science.gov (United States)

    Nguyen, Jayke; Wittman, David; Cornell, Hunter

    2018-01-01

    The nature of dark matter can be better constrained by observing merging galaxy clusters. However, uncertainty in the viewing angle leads to uncertainty in dynamical quantities such as 3-d velocities, 3-d separations, and time since pericenter. The classic timing argument links these quantities via equations of motion, but neglects effects of nonzero impact parameter (i.e. it assumes velocities are parallel to the separation vector), dynamical friction, substructure, and larger-scale environment. We present a new approach using n-body cosmological simulations that naturally incorporate these effects. By uniformly sampling viewing angles about simulated cluster analogs, we see projected merger parameters in the many possible configurations of a given cluster. We select comparable simulated analogs and evaluate the likelihood of particular merger parameters as a function of viewing angle. We present viewing angle constraints for a sample of observed mergers including the Bullet cluster and El Gordo, and show that the separation vectors are closer to the plane of the sky than previously reported.

  10. Genotypic stability and clustering analysis of confectionery ...

    African Journals Online (AJOL)

    Nine groundnut genotypes were evaluated in terminal moisture-stress areas of northeastern Ethiopia during 2005 and 2006 cropping seasons with the objective of analyzing genotypic stability and clustering of confectionery groundnut for seed and protein yield. The genotypes were evaluated on a plot size of 15 m2 at Kobo ...

  11. Prognostic value of cluster analysis of severe asthma phenotypes.

    Science.gov (United States)

    Bourdin, Arnaud; Molinari, Nicolas; Vachier, Isabelle; Varrin, Muriel; Marin, Grégory; Gamez, Anne-Sophie; Paganin, Fabrice; Chanez, Pascal

    2014-11-01

    Cross-sectional severe asthma cluster analysis identified different phenotypes. We tested the hypothesis that these clusters will follow different courses. We aimed to identify which asthma outcomes are specific and coherently associated with these different phenotypes in a prospective longitudinal cohort. In a longitudinal cohort of 112 patients with severe asthma, the 5 Severe Asthma Research Program (SARP) clusters were identified by means of algorithm application. Because patients of the present cohort all had severe asthma compared with the SARP cohort, homemade clusters were identified and also tested. At the subsequent visit, we investigated several outcomes related to asthma control at 1 year (6-item Asthma Control Questionnaire [ACQ-6], lung function, and medication requirement) and then recorded the 3-year exacerbations rate and time to first exacerbation. The SARP algorithm discriminated the 5 clusters at entry for age, asthma duration, lung function, blood eosinophil measurement, ACQ-6 scores, and diabetes comorbidity. Four homemade clusters were mostly segregated by best ever achieved FEV1 values and discriminated the groups by a few clinical characteristics. Nonetheless, all these clusters shared similar asthma outcomes related to asthma control as follows. The ACQ-6 score did not change in any cluster. Exacerbation rate and time to first exacerbation were similar, as were treatment requirements. Severe asthma phenotypes identified by using a previously reported cluster analysis or newly homemade clusters do not behave differently concerning asthma control-related outcomes, which are used to assess the response to innovative therapies. This study demonstrates a potential limitation of the cluster analysis approach in the field of severe asthma. Copyright © 2014. Published by Elsevier Inc.

  12. Alloy design as an inverse problem of cluster expansion models

    DEFF Research Database (Denmark)

    Larsen, Peter Mahler; Kalidindi, Arvind R.; Schmidt, Søren

    2017-01-01

    Central to a lattice model of an alloy system is the description of the energy of a given atomic configuration, which can be conveniently developed through a cluster expansion. Given a specific cluster expansion, the ground state of the lattice model at 0 K can be solved by finding...... the inverse problem in terms of energetically distinct configurations, using a constraint satisfaction model to identify constructible configurations, and show that a convex hull can be used to identify ground states. To demonstrate the approach, we solve for all ground states for a binary alloy in a 2D...

  13. COCOA code for creating mock observations of star cluster models

    Science.gov (United States)

    Askar, Abbas; Giersz, Mirek; Pych, Wojciech; Dalessandro, Emanuele

    2018-04-01

    We introduce and present results from the COCOA (Cluster simulatiOn Comparison with ObservAtions) code that has been developed to create idealized mock photometric observations using results from numerical simulations of star cluster evolution. COCOA is able to present the output of realistic numerical simulations of star clusters carried out using Monte Carlo or N-body codes in a way that is useful for direct comparison with photometric observations. In this paper, we describe the COCOA code and demonstrate its different applications by utilizing globular cluster (GC) models simulated with the MOCCA (MOnte Carlo Cluster simulAtor) code. COCOA is used to synthetically observe these different GC models with optical telescopes, perform point spread function photometry, and subsequently produce observed colour-magnitude diagrams. We also use COCOA to compare the results from synthetic observations of a cluster model that has the same age and metallicity as the Galactic GC NGC 2808 with observations of the same cluster carried out with a 2.2 m optical telescope. We find that COCOA can effectively simulate realistic observations and recover photometric data. COCOA has numerous scientific applications that may be helpful for both theoreticians and observers who work on star clusters. Plans for further improving and developing the code are also discussed in this paper.

  14. Statistical cluster analysis of the British Thoracic Society Severe refractory Asthma Registry: clinical outcomes and phenotype stability.

    Directory of Open Access Journals (Sweden)

    Chris Newby

    Severe refractory asthma is a heterogeneous disease. We sought to determine statistical clusters from the British Thoracic Society Severe refractory Asthma Registry and to examine cluster-specific outcomes and stability. Factor analysis and statistical cluster modelling were undertaken to determine the number of clusters and their membership (N = 349). Cluster-specific outcomes were assessed after a median follow-up of 3 years. A classifier was programmed to determine cluster stability and was validated in an independent cohort of new patients recruited to the registry (n = 245). Five clusters were identified. Cluster 1 (34%) were atopic with early onset disease, cluster 2 (21%) were obese with late onset disease, cluster 3 (15%) had the least severe disease, cluster 4 (15%) were eosinophilic with late onset disease and cluster 5 (15%) had significant fixed airflow obstruction. At follow-up, the proportion of subjects treated with oral corticosteroids increased in all groups with an increase in body mass index. Exacerbation frequency decreased significantly in clusters 1, 2 and 4 and was associated with a significant fall in the peripheral blood eosinophil count in clusters 2 and 4. Stability of cluster membership at follow-up was 52% for the whole group with stability being best in cluster 2 (71%) and worst in cluster 4 (25%). In an independent validation cohort, the classifier identified the same 5 clusters with similar patient distribution and characteristics. Statistical cluster analysis can identify distinct phenotypes with specific outcomes. Cluster membership can be determined using a classifier, but when treatment is optimised, cluster stability is poor.

  15. Assessment of Random Assignment in Training and Test Sets using Generalized Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Sorana D. BOLBOACĂ

    2011-06-01

    Aim: The properness of random assignment of compounds in training and validation sets was assessed using the generalized cluster technique. Material and Method: A quantitative Structure-Activity Relationship model using Molecular Descriptors Family on Vertices was evaluated in terms of assignment of carboquinone derivatives in training and test sets during the leave-many-out analysis. Assignment of compounds was investigated using five variables: observed anticancer activity and four structure descriptors. Generalized cluster analysis with the K-means algorithm was applied in order to investigate whether the assignment of compounds was proper. The Euclidean distance and maximization of the initial distance using a cross-validation with a v-fold of 10 were applied. Results: All five variables included in the analysis proved to have a statistically significant contribution to the identification of clusters. Three clusters were identified, each of them containing carboquinone derivatives belonging to both the training and the test sets. The observed activity of carboquinone derivatives proved to be normally distributed in every cluster. The presence of training and test sets in all clusters identified using generalized cluster analysis with the K-means algorithm and the distribution of observed activity within clusters support a proper assignment of compounds in training and test sets. Conclusion: Generalized cluster analysis using the K-means algorithm proved to be a valid method in assessment of random assignment of carboquinone derivatives in training and test sets.
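
    The check described in this abstract (cluster all compounds, then see whether every cluster contains both training and test members) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the farthest-point initialization is one simple reading of "maximization of the initial distance", the v-fold cross-validation is omitted, and the function names and toy data are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means with a deterministic farthest-point start."""
    # Assumed initialization: start from the first point, then repeatedly
    # add the point farthest from the centroids chosen so far.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        centroids.append(max(
            points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def split_counts(labels, splits):
    """Count training and test members inside every cluster."""
    counts = {}
    for lab, split in zip(labels, splits):
        per_cluster = counts.setdefault(lab, {"train": 0, "test": 0})
        per_cluster[split] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical two-descriptor data with a train/test flag per compound.
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    splits = ["train", "test", "train", "train", "test", "train"]
    labels, _ = kmeans(points, k=2)
    print(split_counts(labels, splits))
```

    If some cluster held only training or only test compounds, the split would leave a region of descriptor space unrepresented on one side, which is the situation the abstract's analysis is meant to rule out.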

  16. Describing the homeless mentally ill: cluster analysis results.

    Science.gov (United States)

    Mowbray, C T; Bybee, D; Cohen, E

    1993-02-01

    Presented descriptive data on a group of homeless, mentally ill individuals (N = 108) served by a two-site demonstration project, funded by NIMH. Comparing results with those from other studies of this population produced some differences and some similarities. Cluster analysis techniques were applied to the data, producing a 4-group solution. Data validating the cluster solution are presented. It is suggested that the cluster results provide a more meaningful and useful method of understanding the descriptive data. Results suggest that while the population of individuals served as homeless and mentally ill is quite heterogeneous, many have well-developed functioning skills--only one cluster, making up 35.2% of the sample, fits the stereotype of the aggressive, psychotic individual with skill deficits in many areas. Further discussion is presented concerning the implications of the cluster analysis results for demonstrating contextual effects and thus better interpreting research results from other studies and assisting in future services planning.

  17. Symptom Cluster Research With Biomarkers and Genetics Using Latent Class Analysis.

    Science.gov (United States)

    Conley, Samantha

    2017-12-01

    The purpose of this article is to provide an overview of latent class analysis (LCA) and examples from symptom cluster research that includes biomarkers and genetics. A review of LCA with genetics and biomarkers was conducted using Medline, Embase, PubMed, and Google Scholar. LCA is a robust latent variable model used to cluster categorical data and allows for the determination of empirically determined symptom clusters. Researchers should consider using LCA to link empirically determined symptom clusters to biomarkers and genetics to better understand the underlying etiology of symptom clusters. The full potential of LCA in symptom cluster research has not yet been realized because it has been used in limited populations, and researchers have explored limited biologic pathways.

  18. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K-means and the Ant clustering algorithm for real document clustering.

  19. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    Science.gov (United States)

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot be scaled beyond a single SMP node, whereas programs written in MPI can span more than one SMP node, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are interesting for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various values of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  20. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    Science.gov (United States)

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. OpenMP programs, however, cannot be scaled beyond a single SMP node, whereas programs written in MPI can span more than one SMP node, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are interesting for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various values of input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.
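
    To make the underlying computation concrete, here is a minimal serial sketch of optimal global alignment scoring by dynamic programming, in the style of Needleman-Wunsch. The abstract does not name the alignment algorithm or its scoring scheme, so the choice of algorithm, the function name `global_alignment_score`, and the match, mismatch and gap values are all assumptions; the tile-based parallel scheme itself is not reproduced.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b.

    Uses the standard recurrence
        S[i][j] = max(S[i-1][j-1] + s(a[i-1], b[j-1]),
                      S[i-1][j] + gap,
                      S[i][j-1] + gap)
    while keeping only the previous row in memory.
    Scoring values are illustrative defaults, not the paper's.
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: all gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]                          # column 0: all gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACGT"))
    print(global_alignment_score("AC", "C"))
```

    Each cell depends only on its upper, left and upper-left neighbours, so if the score matrix is cut into rectangular tiles, the tiles lying on the same anti-diagonal have no dependencies on one another and can be computed concurrently. That dependency pattern is what generally makes tiled parallelization of this recurrence possible.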

  1. Reproducibility of Cognitive Profiles in Psychosis Using Cluster Analysis.

    Science.gov (United States)

    Lewandowski, Kathryn E; Baker, Justin T; McCarthy, Julie M; Norris, Lesley A; Öngür, Dost

    2018-04-01

    Cognitive dysfunction is a core symptom dimension that cuts across the psychoses. Recent findings support classification of patients along the cognitive dimension using cluster analysis; however, data-derived groupings may be highly determined by sampling characteristics and the measures used to derive the clusters, and so their interpretability must be established. We examined cognitive clusters in a cross-diagnostic sample of patients with psychosis and associations with clinical and functional outcomes. We then compared our findings to a previous report of cognitive clusters in a separate sample using a different cognitive battery. Participants with affective or non-affective psychosis (n=120) and healthy controls (n=31) were administered the MATRICS Consensus Cognitive Battery, and clinical and community functioning assessments. Cluster analyses were performed on cognitive variables, and clusters were compared on demographic, cognitive, and clinical measures. Results were compared to findings from our previous report. A four-cluster solution provided a good fit to the data; profiles included a neuropsychologically normal cluster, a globally impaired cluster, and two clusters of mixed profiles. Cognitive burden was associated with symptom severity and poorer community functioning. The patterns of cognitive performance by cluster were highly consistent with our previous findings. We found evidence of four cognitive subgroups of patients with psychosis, with cognitive profiles that map closely to those produced in our previous work. Clusters were associated with clinical and community variables and a measure of premorbid functioning, suggesting that they reflect meaningful groupings: replicable, and related to clinical presentation and functional outcomes. (JINS, 2018, 24, 382-390).

  2. X-Ray Morphological Analysis of the Planck ESZ Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Andrade-Santos, Felipe; Randall, Scott; Kraft, Ralph [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138 (United States); Ettori, Stefano [INAF, Osservatorio Astronomico di Bologna, via Ranzani 1, I-40127 Bologna (Italy); Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W. [Laboratoire AIM, IRFU/Service d’Astrophysique—CEA/DRF—CNRS—Université Paris Diderot, Bât. 709, CEA-Saclay, F-91191 Gif-sur-Yvette Cedex (France)

    2017-09-01

    X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are well suited to studying how clusters form and grow and to testing physical models, may potentially complicate cosmological studies because the cluster mass determination becomes more challenging. Thus, we need to understand the cluster properties of our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev–Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev–Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best to distinguish between relaxed and disturbed systems. For each parameter we provide the values that allow selecting the most relaxed or most disturbed objects from a sample. We found that there is no mass dependence on the cluster dynamical state. By comparing our results with what was obtained with REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.

  3. Modeling of correlated data with informative cluster sizes: An evaluation of joint modeling and within-cluster resampling approaches.

    Science.gov (United States)

    Zhang, Bo; Liu, Wei; Zhang, Zhiwei; Qu, Yanping; Chen, Zhen; Albert, Paul S

    2017-08-01

    Joint modeling and within-cluster resampling are two approaches that are used for analyzing correlated data with informative cluster sizes. Motivated by a developmental toxicity study, we examined the performances and validity of these two approaches in testing covariate effects in generalized linear mixed-effects models. We show that the joint modeling approach is robust to the misspecification of cluster size models in terms of Type I and Type II errors when the corresponding covariates are not included in the random effects structure; otherwise, statistical tests may be affected. We also evaluate the performance of the within-cluster resampling procedure and thoroughly investigate the validity of it in modeling correlated data with informative cluster sizes. We show that within-cluster resampling is a valid alternative to joint modeling for cluster-specific covariates, but it is invalid for time-dependent covariates. The two methods are applied to a developmental toxicity study that investigated the effect of exposure to diethylene glycol dimethyl ether.
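
    The within-cluster resampling idea described above can be sketched in a few lines. The example below is a toy illustration only: it estimates a plain mean rather than fitting a generalized linear mixed-effects model, and the function name `wcr_mean`, the data, and the number of resamples are all hypothetical. Each resample keeps one randomly chosen observation per cluster, so large clusters cannot dominate the estimate.

```python
import random
from statistics import mean


def wcr_mean(clusters, n_resamples=2000, seed=0):
    """Within-cluster resampling estimate of a mean (toy illustration).

    Each resample draws exactly one observation from every cluster,
    computes the mean of those draws, and the final estimate is the
    average over all resamples.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        one_per_cluster = [rng.choice(obs) for obs in clusters]
        estimates.append(mean(one_per_cluster))
    return mean(estimates)


# Hypothetical data with informative cluster size:
# the largest cluster also has the largest outcome values.
clusters = [[1.0], [1.0, 1.0], [5.0] * 10]

pooled = mean(x for obs in clusters for x in obs)  # weights every observation equally
wcr = wcr_mean(clusters)                           # weights every cluster equally

print(f"pooled mean = {pooled:.3f}, WCR mean = {wcr:.3f}")
```

    In this made-up data set the pooled mean is pulled toward the large cluster, while the resampling estimate treats each cluster as one unit.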

  4. Old star clusters: Bench tests of low mass stellar models

    Directory of Open Access Journals (Sweden)

    Salaris M.

    2013-03-01

    Old star clusters in the Milky Way and external galaxies have been (and still are) traditionally used to constrain the age of the universe and the timescales of galaxy formation. A parallel avenue of old star cluster research considers these objects as bench tests of low-mass stellar models. This short review will highlight some recent tests of stellar evolution models that make use of photometric and spectroscopic observations of resolved old star clusters. In some cases these tests have pointed to additional physical processes efficient in low-mass stars that are not routinely included in model computations. Moreover, recent results from the Kepler mission about the old open cluster NGC 6791 are adding new tight constraints to the models.

  5. Time series clustering analysis of health-promoting behavior

    Science.gov (United States)

    Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng

    2013-10-01

    Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility, and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The results of the data analysis can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and have been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
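
    A minimal sketch of the general technique named above: each time series is represented by its autocorrelation coefficients, and those feature vectors are then grouped with fuzzy c-means. The abstract does not specify the exact representation scheme or parameters, so the lags, the fuzzifier `m`, the function names, and the four short series below are all assumptions made for illustration.

```python
import random


def autocorr(series, max_lag):
    """Autocorrelation coefficients of a series at lags 1..max_lag."""
    n = len(series)
    mu = sum(series) / n
    var = sum((x - mu) ** 2 for x in series)
    if var == 0:
        return [0.0] * max_lag
    return [
        sum((series[t] - mu) * (series[t + k] - mu) for t in range(n - k)) / var
        for k in range(1, max_lag + 1)
    ]


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means; returns (centers, membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are membership-weighted means of the points.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tot = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / tot for d in range(dim)]
            )
        # Memberships follow from relative distances to the centers.
        for i, p in enumerate(points):
            dist = [
                max(sum((p[d] - ctr[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                for ctr in centers
            ]
            for j in range(c):
                u[i][j] = 1.0 / sum(
                    (dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c)
                )
    return centers, u


# Hypothetical usage series: two steadily rising, two alternating.
series = [
    [1, 2, 3, 4, 5, 6, 7, 8],
    [2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 1, 5, 1, 5, 1, 5],
    [2, 6, 2, 6, 2, 6, 2, 6],
]
features = [autocorr(s, max_lag=2) for s in series]
centers, u = fuzzy_c_means(features, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(labels)
```

    Working on autocorrelation features rather than raw values means that series with the same temporal pattern but different levels end up close together, which is one plausible reason for choosing such a representation.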

  6. Identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard.

    Directory of Open Access Journals (Sweden)

    Xiao-Juan Jiang

    BACKGROUND: The vertebrate protocadherins are a subfamily of cell adhesion molecules that are predominantly expressed in the nervous system and are believed to play an important role in establishing the complex neural network during animal development. Genes encoding these molecules are organized into a cluster in the genome. Comparative analysis of the protocadherin subcluster organization and gene arrangements in different vertebrates has provided interesting insights into the history of vertebrate genome evolution. Among tetrapods, protocadherin clusters have been fully characterized only in mammals. In this study, we report the identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard (Anolis carolinensis). METHODOLOGY/PRINCIPAL FINDINGS: We show that the anole protocadherin cluster spans over a megabase and encodes a total of 71 genes. The number of genes in the anole protocadherin cluster is significantly higher than that in the coelacanth (49 genes) and mammalian (54-59 genes) clusters. The anole protocadherin genes are organized into four subclusters: the delta, alpha, beta and gamma. This subcluster organization is identical to that of the coelacanth protocadherin cluster, but differs from the mammalian clusters which lack the delta subcluster. The gene number expansion in the anole protocadherin cluster is largely due to the extensive gene duplication in the gammab subgroup. Similar to coelacanth and elephant shark protocadherin genes, the anole protocadherin genes have experienced a low frequency of gene conversion. CONCLUSIONS/SIGNIFICANCE: Our results suggest that, similar to the protocadherin clusters in other vertebrates, the evolution of the anole protocadherin cluster is driven mainly by lineage-specific gene duplications and degeneration. Our analysis also shows that loss of the protocadherin delta subcluster in the mammalian lineage occurred after the divergence of mammals and reptiles.

  7. Vertex finding by sparse model-based clustering

    Science.gov (United States)

    Frühwirth, R.; Eckstein, K.; Frühwirth-Schnatter, S.

    2016-10-01

    The application of sparse model-based clustering to the problem of primary vertex finding is discussed. The observed z-positions of the charged primary tracks in a bunch crossing are modeled by a Gaussian mixture. The mixture parameters are estimated via Markov Chain Monte Carlo (MCMC). Sparsity is achieved by an appropriate prior on the mixture weights. The results are shown and compared to clustering by the expectation-maximization (EM) algorithm.
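
    For orientation, the expectation-maximization baseline mentioned above can be sketched for a one-dimensional Gaussian mixture. This is not the sparse MCMC method of the paper, only plain EM; the track positions, vertex locations, starting means, and the function name `em_gmm_1d` are hypothetical.

```python
import math
import random


def em_gmm_1d(z, init_means, iters=200):
    """EM for a 1-D Gaussian mixture, started from the given means."""
    k = len(init_means)
    means = list(init_means)
    sigmas = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each position.
        resp = []
        for x in z:
            dens = [
                weights[j]
                * math.exp(-0.5 * ((x - means[j]) / sigmas[j]) ** 2)
                / (sigmas[j] * math.sqrt(2.0 * math.pi))
                for j in range(k)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and width of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component has lost all its support
            weights[j] = nj / len(z)
            means[j] = sum(r[j] * x for r, x in zip(resp, z)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, z)) / nj
            sigmas[j] = max(math.sqrt(var), 1e-3)
    return weights, means, sigmas


# Hypothetical z-positions of tracks from two vertices (arbitrary units).
rng = random.Random(1)
z = [rng.gauss(-2.0, 0.1) for _ in range(50)] + [rng.gauss(3.0, 0.1) for _ in range(50)]

w, mu, sd = em_gmm_1d(z, init_means=[-1.0, 1.0])
print([round(v, 2) for v in mu])
```

    Plain EM needs the number of components fixed in advance, which is the gap a sparsity-inducing prior on the mixture weights is meant to address.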

  8. Cluster analysis of typhoid cases in Kota Bharu, Kelantan, Malaysia

    Directory of Open Access Journals (Sweden)

    Nazarudin Safian

    2008-09-01

    Typhoid fever is still a major public health problem globally as well as in Malaysia. This study was done to identify the spatial epidemiology of typhoid fever in the Kota Bharu District of Malaysia as a first step to developing more advanced analysis of the whole country. The main characteristic of the epidemiological pattern that interested us was whether typhoid cases occurred in clusters or whether they were evenly distributed throughout the area. We also wanted to know at what spatial distances they were clustered. All confirmed typhoid cases that were reported to the Kota Bharu District Health Department from the year 2001 to June of 2005 were taken as the samples. From the home address of the cases, the location of the house was traced and a coordinate was taken using handheld GPS devices. Spatial statistical analysis was done to determine the distribution of typhoid cases, whether clustered, random or dispersed. The spatial statistical analysis was done using CrimeStat III software to determine whether typhoid cases occur in clusters, and later on to determine at what distances they clustered. Of the 736 cases involved in the study, there was significant clustering for cases occurring in the years 2001, 2002, 2003 and 2005. There was no significant clustering in year 2004. Typhoid clustering also occurred strongly for distances up to 6 km. This study shows that typhoid cases occur in clusters, and this method could be applicable to describe spatial epidemiology for a specific area. (Med J Indones 2008; 17: 175-82) Keywords: typhoid, clustering, spatial epidemiology, GIS
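
    One common way to decide whether a point pattern is clustered, random or dispersed is the nearest-neighbour index. The abstract does not say which CrimeStat III routine was used, so the choice of this statistic, the function name, and the coordinates below are assumptions. The index compares the mean observed nearest-neighbour distance with the value expected for a random pattern of the same density; values below 1 suggest clustering and values above 1 suggest dispersion.

```python
import math


def nearest_neighbor_index(points, area):
    """Ratio of observed to expected mean nearest-neighbour distance."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(
            min(
                math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points)
                if j != i
            )
        )
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)  # expectation under complete spatial randomness
    return observed / expected


# Hypothetical coordinates (km) in a 5 km x 5 km study area.
regular = [(float(x), float(y)) for x in range(5) for y in range(5)]
clumped = [(0.01 * x, 0.01 * y) for x in range(5) for y in range(5)]

print(nearest_neighbor_index(regular, area=25.0))
print(nearest_neighbor_index(clumped, area=25.0))
```

    This sketch ignores edge corrections and significance testing, both of which matter in a real analysis.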

  9. Global classification of human facial healthy skin using PLS discriminant analysis and clustering analysis.

    Science.gov (United States)

    Guinot, C; Latreille, J; Tenenhaus, M; Malvy, D J

    2001-04-01

    Today's classifications of healthy skin are predominantly based on a very limited number of skin characteristics, such as skin oiliness or susceptibility to sun exposure. The aim of the present analysis was to set up a global classification of healthy facial skin, using mathematical models. This classification is based on clinical, biophysical skin characteristics and self-reported information related to the skin, as well as the results of a theoretical skin classification assessed separately for the frontal and the malar zones of the face. In order to maximize the predictive power of the models with a minimum of variables, the Partial Least Squares (PLS) discriminant analysis method was used. The resulting PLS components were subjected to clustering analyses to identify the plausible number of clusters and to group the individuals according to their proximities. Using this approach, four PLS components could be constructed and six clusters were found relevant. Thus, from the 36 hypothetical combinations of the theoretical skin-type classification, we arrived at a more robust six-class proposal. Our data suggest that the association of the PLS discriminant analysis and the clustering methods leads to a valid and simple way to classify healthy human skin and represents a potentially useful tool for cosmetic and dermatological research.

  10. Effects of Group Size and Lack of Sphericity on the Recovery of Clusters in K-Means Cluster Analysis

    Science.gov (United States)

    de Craen, Saskia; Commandeur, Jacques J. F.; Frank, Laurence E.; Heiser, Willem J.

    2006-01-01

    K-means cluster analysis is known for its tendency to produce spherical and equally sized clusters. To assess the magnitude of these effects, a simulation study was conducted, in which populations were created with varying departures from sphericity and group sizes. An analysis of the recovery of clusters in the samples taken from these…
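
    A small simulation in the spirit of the study described above can be sketched as follows. It runs a basic Lloyd-style k-means on two groups of unequal size and shape and reports how well the true groups are recovered. The group sizes, spreads, recovery measure, and function names are hypothetical and are not taken from the paper.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Basic Lloyd-style k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Move every center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


def recovery_two_groups(labels, truth):
    """Fraction of points recovered, allowing for swapped label names."""
    agree = sum(a == b for a, b in zip(labels, truth)) / len(truth)
    return max(agree, 1.0 - agree)


# Hypothetical population: a large elongated group and a small compact one.
rng = random.Random(42)
big = [(rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.3)) for _ in range(200)]
small = [(rng.gauss(0.0, 0.3), rng.gauss(2.0, 0.3)) for _ in range(20)]
points = big + small
truth = [0] * len(big) + [1] * len(small)

labels, centers = kmeans(points, k=2)
print(f"recovery = {recovery_two_groups(labels, truth):.2f}")
```

    Varying the group sizes and the spread along each axis in such a script is one way to see how departures from equal-sized, spherical groups affect recovery.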

  11. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions.

    Science.gov (United States)

    Tokuda, Tomoki; Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.

  12. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions.

    Directory of Open Access Journals (Sweden)

    Tomoki Tokuda

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.

  13. Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions

    Science.gov (United States)

    Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392

  14. Patterns of Brucellosis Infection Symptoms in Azerbaijan: A Latent Class Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Rita Ismayilova

    2014-01-01

    Brucellosis infection is a multisystem disease, with a broad spectrum of symptoms. We investigated the existence of clusters of infected patients according to their clinical presentation. Using national surveillance data from the Electronic-Integrated Disease Surveillance System, we applied a latent class cluster (LCC) analysis on symptoms to determine clusters of brucellosis cases. A total of 454 cases reported between July 2011 and July 2013 were analyzed. LCC identified a two-cluster model and the Vuong-Lo-Mendell-Rubin likelihood ratio supported the cluster model. Brucellosis cases in the second cluster (19%) reported higher percentages of poly-lymphadenopathy, hepatomegaly, arthritis, myositis, and neuritis and changes in liver function tests compared to cases of the first cluster. Patients in the second cluster had a severe brucellosis disease course and were associated with longer delay in seeking medical attention. Moreover, most of them were from Beylagan, a region focused on sheep and goat livestock production in south-central Azerbaijan. Patients in cluster 2 accounted for one-quarter of brucellosis cases and had a more severe clinical presentation. Delay in seeking medical care may explain severe illness. Future work needs to determine the factors that influence brucellosis case seeking and identify brucellosis species, particularly among cases from Beylagan.

  15. Clustering applications in financial and economic analysis of the crop production in the Russian regions

    Directory of Open Access Journals (Sweden)

    Gromov Vladislav Vladimirovich

    2013-08-01

    We used complex mathematical modeling, multivariate statistical analysis, and fuzzy sets to analyze the financial and economic state of crop production in Russian regions. We developed a system of indicators describing the state of the agricultural sector in a region, based on the results of correlation, factor, and cluster analysis and on statistics of the Federal State Statistics Service. We performed clustering analyses to divide the regions of Russia into five groups on selected factors. Qualitative and quantitative characteristics of each cluster were obtained.

  16. Comparative analysis of genomic signal processing for microarray data clustering.

    Science.gov (United States)

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods.

  17. Solvable random-decimation model of cluster scaling

    Science.gov (United States)

    Fraser, Simon J.

    1988-07-01

    A percolation model of critical-cluster scaling is studied. The model allows the generation of configurations of strongly self-similar clusters by stochastic decimation on a tree. Tree traversal is controlled by a probability parameter $p$. At $p=0$ or $1$, the configuration is deterministic, but, for $0<p<1$, [...] decimation algorithm uses the Sierpinski carpet and Vicsek snowflake generators, so that the treelike character (connectedness) of the clusters can be changed continuously. Various dimensions of the (fractal) percolation cluster are calculated using boundary conditions that give correct values at the deterministic limits. The usual cluster distribution law, $n_s \sim s^{-\tau}$ with $\tau = d/D + 1$, is obeyed for stationary $p$ in $(0,1)$, although $\tau = d/D$ is the deterministic value at $p=0$ or $1$. Here $d$ is the space dimension, and $D$ the fractal dimension of the percolation cluster. The sensitivity of $\tau$ to changes in $p$ near $p=0$ or $1$ allows anomalous cluster scaling, so that $\tau$ may be fixed between $d/D$ and $d/D+1$, without affecting $D$. Possible applications of the model are discussed.
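
    As a purely illustrative calculation, assume the textbook Sierpinski carpet value D = ln 8 / ln 3 in d = 2. The abstract does not state which D the model's clusters actually take, so this number is an assumption. The two limiting exponents quoted above, d/D at the deterministic limits and d/D + 1 for stationary p, would then be:

```latex
\[
D = \frac{\ln 8}{\ln 3} \approx 1.893, \qquad
\tau_{\mathrm{det}} = \frac{d}{D} = \frac{2}{1.893} \approx 1.057, \qquad
\tau_{\mathrm{stat}} = \frac{d}{D} + 1 \approx 2.057 .
\]
```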

  18. Autoregressive Model Using Fuzzy C-Regression Model Clustering for Traffic Modeling

    Science.gov (United States)

    Tanaka, Fumiaki; Suzuki, Yukinori; Maeda, Junji

    Robust traffic modeling is required for effective congestion control in broadband digital networks. An autoregressive model using fuzzy c-regression model (FCRM) clustering is proposed for traffic modeling. This is a simpler modeling method than previous methods. The experiments show that the proposed method is more robust for traffic modeling than the previous method.

  19. A Distributed Flocking Approach for Information Stream Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Intelligence analysts are currently overwhelmed by the volume of information streams generated every day. There is a lack of comprehensive tools that can analyze these streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to static document collections because they normally require large amounts of computational resources and a long time to produce accurate results. It is very difficult to cluster dynamically changing text information streams on an individual computer. Our earlier research resulted in a dynamic reactive flock clustering algorithm which can continually refine the clustering result and quickly react to changes in document contents. This characteristic makes the algorithm suitable for cluster analysis of dynamically changing document information, such as text information streams. Because of the decentralized character of this algorithm, a distributed approach is a very natural way to increase its clustering speed. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balancing and status information synchronization in this approach.

  20. Modeling the formation of globular cluster systems in the Virgo cluster

    International Nuclear Information System (INIS)

    Li, Hui; Gnedin, Oleg Y.

    2014-01-01

    The mass distribution and chemical composition of globular cluster (GC) systems preserve a fossil record of the early stages of galaxy formation. The observed distribution of GC colors within massive early-type galaxies in the ACS Virgo Cluster Survey (ACSVCS) reveals a multi-modal shape, which likely corresponds to a multi-modal metallicity distribution. We present a simple model for the formation and disruption of GCs that aims to match the ACSVCS data. This model tests the hypothesis that GCs are formed during major mergers of gas-rich galaxies and inherit the metallicity of their hosts. To trace merger events, we use halo merger trees extracted from a large cosmological N-body simulation. We select 20 halos in the mass range of $2 \times 10^{12}$ to $7 \times 10^{13}\,M_\odot$ and match them to 19 Virgo galaxies with K-band luminosity between $3 \times 10^{10}$ and $3 \times 10^{11}\,L_\odot$. To set the [Fe/H] abundances, we use an empirical galaxy mass-metallicity relation. We find that a minimal merger ratio of 1:3 best matches the observed cluster metallicity distribution. A characteristic bimodal shape appears because metal-rich GCs are produced by late mergers between massive halos, while metal-poor GCs are produced by collective merger activities of less massive hosts at early times. The model outcome is robust to alternative prescriptions for cluster formation rate throughout cosmic time, but a gradual evolution of the mass-metallicity relation with redshift appears to be necessary to match the observed cluster metallicities. We also affirm the age-metallicity relation, predicted by an earlier model, in which metal-rich clusters are systematically several billion years younger than their metal-poor counterparts.

  1. Cluster analysis of clinical data identifies fibromyalgia subgroups.

    Directory of Open Access Journals (Sweden)

    Elisa Docampo

    INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.

  2. Unsupervised ship trajectory modeling and prediction using compression and clustering

    NARCIS (Netherlands)

    de Vries, G.; van Someren, M.; van Erp, M.; Stehouwer, H.; van Zaanen, M.

    2009-01-01

    In this paper we show how to build a model of ship trajectories in a certain maritime region and use this model to predict future ship movements. The presented method is unsupervised and based on existing compression (line-simplification) and clustering techniques. We evaluate the model with a

  3. How Black Holes Shape Globular Clusters: Modeling NGC 3201

    Science.gov (United States)

    Kremer, Kyle; Ye, Claire S.; Chatterjee, Sourav; Rodriguez, Carl L.; Rasio, Frederic A.

    2018-03-01

    Numerical simulations have shown that black holes (BHs) can strongly influence the evolution and present-day observational properties of globular clusters (GCs). Using a Monte Carlo code, we construct GC models that match the Milky Way cluster NGC 3201, the first cluster in which a stellar-mass BH was identified through radial velocity measurements. We predict that NGC 3201 contains ≳200 stellar-mass BHs. Furthermore, we explore the dynamical formation of main-sequence–BH binaries and demonstrate that systems similar to the observed BH binary in NGC 3201 are produced naturally. Additionally, our models predict the existence of bright blue straggler–BH binaries that are unique to core-collapsed clusters, which otherwise retain few BHs.

  4. Quantitative properties of clustering within modern microscopic nuclear models

    International Nuclear Information System (INIS)

    Volya, A.; Tchuvil’sky, Yu. M.

    2016-01-01

    A method for studying cluster spectroscopic properties of nuclear fragmentation, such as spectroscopic amplitudes, cluster form factors, and spectroscopic factors, is developed on the basis of modern precision nuclear models that take into account the mixing of large-scale shell-model configurations. Alpha-cluster channels are considered as an example. A mathematical proof of the need for taking into account the channel-wave-function renormalization generated by exchange terms of the antisymmetrization operator (Fliessbach effect) is given. Examples where this effect is confirmed by a high quality of the description of experimental data are presented. By and large, the method in question extends substantially the possibilities for studying clustering phenomena in nuclei and for improving the quality of their description.

  5. Depth data research of GIS based on clustering analysis algorithm

    Science.gov (United States)

    Xiong, Yan; Xu, Wenli

    2018-03-01

    GIS data are spatially distributed. Geographic data have both spatial and attribute characteristics, and also change over time, so the amount of data is very large. Many industries and departments now use GIS. However, without a proper data analysis and mining scheme, GIS cannot reach its full effectiveness and much of the data goes to waste. In this paper, we take the geographic information demands of a national security department as the experimental object and, combining the characteristics of GIS data and taking into account time, space, and attributes, apply a cluster analysis algorithm. We further study the mining scheme for depth data and obtain the algorithm model. This algorithm can automatically classify sample data and then carry out exploratory analysis. The research shows that the algorithm model and the information mining scheme can quickly find hidden depth information in the surface data of GIS, thus improving the efficiency of the security department. This algorithm can also be extended to other fields.

  6. Development of small scale cluster computer for numerical analysis

    Science.gov (United States)

    Zulkifli, N. H. N.; Sapit, A.; Mohammed, A. N.

    2017-09-01

    In this study, two personal computers were successfully networked together to form a small-scale cluster. Each computer has a multicore processor with four cores, giving the cluster eight processors in total. The cluster runs Ubuntu 14.04 Linux with an MPI implementation (MPICH2). Two main tests were conducted on the cluster: a communication test and a performance test. The communication test was done to make sure that the computers can pass the required information without any problem, using a simple MPI Hello program written in C. In addition, a performance test was done to show that the cluster's calculation performance is much better than that of a single-CPU computer. In this performance test, the same code was run in four configurations: a single node, 2 processors, 4 processors, and 8 processors. The results show that the time required to solve the problem decreases as processors are added; the calculation time is halved when the number of processors is doubled. To conclude, we successfully developed a small-scale cluster computer using common hardware that is capable of higher computing power than a single-CPU computer, and this can be beneficial for research that requires high computing power, especially numerical analysis such as finite element analysis, computational fluid dynamics, and computational physics analysis.
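
    The scaling claim above (run time halves when the processor count doubles) corresponds to ideal speedup and a parallel efficiency of 1. A small sketch of how such figures are usually computed from wall-clock timings follows; the timings are invented to match the stated halving pattern and are not measurements from the study, and the function name is hypothetical.

```python
def speedup_and_efficiency(timings):
    """Map processor count -> (speedup, efficiency).

    `timings` maps a processor count to wall-clock time in seconds and
    must contain an entry for a single processor.
    """
    t1 = timings[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in sorted(timings.items())}


# Hypothetical timings that halve whenever the processor count doubles.
timings = {1: 80.0, 2: 40.0, 4: 20.0, 8: 10.0}

for procs, (speedup, efficiency) in speedup_and_efficiency(timings).items():
    print(f"{procs} processors: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

    With real measurements, communication overhead between the two machines would normally push efficiency below 1 as more processors are used.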

  7. Fuzzy clustering analysis to study geomagnetic coastal effects

    Directory of Open Access Journals (Sweden)

    M. Sridharan

    2005-06-01

    The utility of fuzzy set theory in cluster analysis and pattern recognition has been evolving since the mid 1960s, in conjunction with the emergence and evolution of computer technology. The classification of objects into categories is the subject of cluster analysis. The aim of this paper is to employ a fuzzy-clustering technique to examine the interrelationship of geomagnetic coastal and other effects at Indian observatories. Data from the observatories used for the present studies are from Alibag on the West Coast, Visakhapatnam and Pondicherry on the East Coast, and Hyderabad and Nagpur as central inland stations which are located far from either of the coasts; all the above stations are free from the influence of the daytime equatorial electrojet. It has been found that Alibag and Pondicherry Observatories form a separate cluster showing anomalous variations in the vertical (Z) component. H- and D-components form different clusters. The results are compared with the graphical method. The analytical technique and the results of the fuzzy-clustering analysis are discussed here.

  8. Electromagnetic properties of 6Li in a cluster model with breathing clusters

    International Nuclear Information System (INIS)

    Kruppa, A.T.; Beck, R.; Dickmann, F.

    1987-01-01

    Electromagnetic properties of 6Li are studied using a microscopic (α+d) cluster model. In addition to the ground state of the clusters, their breathing excited states are included in the wave function in order to take into account the distortion of the clusters. The elastic charge form factor is in good agreement with experiment up to a momentum transfer of 8 fm⁻². The ground state magnetic form factor and the inelastic charge form factor are also well described. The effect of the breathing states of α on the form factors proves to be negligible except at high momentum transfer. The ground-state charge density, rms charge radius, the magnetic dipole moment and a reduced transition strength are also obtained in fair agreement with experiment. (author)

  9. Using cluster analysis and a classification and regression tree model to developed cover types in the Sky Islands of southeastern Arizona

    Science.gov (United States)

    Jose M. Iniguez; Joseph L. Ganey; Peter J. Daugherty; John D. Bailey

    2005-01-01

    The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such a system we qualitatively and quantitatively compared a hierarchical (Ward’s) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups represented by...

  10. Using cluster analysis and a classification and regression tree model to developed cover types in the Sky Islands of southeastern Arizona [Abstract]

    Science.gov (United States)

    Jose M. Iniguez; Joseph L. Ganey; Peter J. Daugherty; John D. Bailey

    2005-01-01

    The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such system we qualitatively and quantitatively compared a hierarchical (Ward’s) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups and plots...

  11. Cluster Analytical Method of Fault Risk Analysis in Systems

    Science.gov (United States)

    Michaľčonok, German; Horalová Kalinová, Michaela

    2016-12-01

    In providing safety functions, the design of the safety functions of control systems is an important part of a risk reduction strategy. In specifying security requirements, it is necessary to determine and document the individual characteristics and the desired performance level of each safety function. This article presents the results of a cluster analysis experiment. The results show that cluster analysis methods provide a suitable tool for analyzing the reliability of safety systems. Given the increasing complexity of such systems, the application of these methods in this subject area is a good choice.

  12. GENERALISED MODEL BASED CONFIDENCE INTERVALS IN TWO STAGE CLUSTER SAMPLING

    Directory of Open Access Journals (Sweden)

    Christopher Ouma Onyango

    2010-09-01

    Full Text Available Chambers and Dorfman (2002) constructed bootstrap confidence intervals in model based estimation for finite population totals assuming that auxiliary values are available throughout a target population and that the auxiliary values are independent. They also assumed that the cluster sizes are known throughout the target population. We now extend to two stage sampling in which the cluster sizes are known only for the sampled clusters, and we therefore predict the unobserved part of the population total. Jan and Elinor (2008) have done similar work, but unlike them, we use a general model, in which the auxiliary values are not necessarily independent. We demonstrate that the asymptotic properties of our proposed estimator and its coverage rates are better than those constructed under the model assisted local polynomial regression model.

  13. Molecular dynamics modelling of EGCG clusters on ceramide bilayers

    Energy Technology Data Exchange (ETDEWEB)

    Yeo, Jingjie; Cheng, Yuan; Li, Weifeng; Zhang, Yong-Wei [Institute of High Performance Computing, A*STAR, 138632 (Singapore)

    2015-12-31

    A novel method of atomistic modelling and characterization of both pure ceramide and mixed lipid bilayers is being developed, using only the General Amber ForceField. Lipid bilayers modelled as pure ceramides adopt hexagonal packing after equilibration, and the area per lipid and bilayer thickness are consistent with previously reported theoretical results. Mixed lipid bilayers are modelled as a combination of ceramides, cholesterol, and free fatty acids. This model is shown to be stable after equilibration. Green tea extract, also known as epigallocatechin-3-gallate, is introduced as a spherical cluster on the surface of the mixed lipid bilayer. It is demonstrated that the cluster is able to bind to the bilayers as a cluster without diffusing into the surrounding water.

  14. Dynamic Characteristics Analysis and Stabilization of PV-Based Multiple Microgrid Clusters

    DEFF Research Database (Denmark)

    Zhao, Zhuoli; Yang, Ping; Wang, Yuewu

    2018-01-01

    As the penetration of PV generation increases, there is a growing operational demand on PV systems to participate in microgrid frequency regulation. It is expected that future distribution systems will consist of multiple microgrid clusters. However, interconnecting PV microgrids may lead to system interactions and instability. To date, no research work has been done to analyze the dynamic behavior and enhance the stability of microgrid clusters considering the dynamics of the PV primary sources and dc links. To fill this gap, this paper presents comprehensive modeling, analysis, and stabilization of PV-based multiple microgrid clusters. A detailed small-signal model for PV-based microgrid clusters considering the local adaptive dynamic droop control mechanism of the voltage-source PV system is developed. The complete dynamic model is then used to assess and compare the dynamic characteristics of the single...

  15. Numerical linked-cluster approach to quantum lattice models.

    Science.gov (United States)

    Rigol, Marcos; Bryant, Tyler; Singh, Rajiv R P

    2006-11-03

    We present a novel algorithm that allows one to obtain temperature dependent properties of quantum lattice models in the thermodynamic limit from exact diagonalization of small clusters. Our numerical linked-cluster approach provides a systematic framework to assess finite-size effects and is valid for any quantum lattice model. Unlike high temperature expansions, which have a finite radius of convergence in inverse temperature, these calculations are accurate at all temperatures provided the range of correlations is finite. We illustrate the power of our approach studying spin models on kagomé, triangular, and square lattices.

  16. Ecosystem health pattern analysis of urban clusters based on emergy synthesis: Results and implication for management

    International Nuclear Information System (INIS)

    Su, Meirong; Fath, Brian D.; Yang, Zhifeng; Chen, Bin; Liu, Gengyuan

    2013-01-01

    The evaluation of ecosystem health in urban clusters will help establish effective management that promotes sustainable regional development. To standardize the application of emergy synthesis and set pair analysis (EM–SPA) in ecosystem health assessment, a procedure for using EM–SPA models was established in this paper by combining the ability of emergy synthesis to reflect health status from a biophysical perspective with the ability of set pair analysis to describe extensive relationships among different variables. Based on the EM–SPA model, the relative health levels of selected urban clusters and their related ecosystem health patterns were characterized. The health states of three typical Chinese urban clusters – Jing-Jin-Tang, Yangtze River Delta, and Pearl River Delta – were investigated using the model. The results showed that the health status of the Pearl River Delta was relatively good; the health for the Yangtze River Delta was poor. As for the specific health characteristics, the Pearl River Delta and Yangtze River Delta urban clusters were relatively strong in Vigor, Resilience, and Urban ecosystem service function maintenance, while the Jing-Jin-Tang was relatively strong in organizational structure and environmental impact. Guidelines for managing these different urban clusters were put forward based on the analysis of the results of this study. - Highlights: • The use of integrated emergy synthesis and set pair analysis model was standardized. • The integrated model was applied on the scale of an urban cluster. • Health patterns of different urban clusters were compared. • Policy suggestions were provided based on the health pattern analysis

  17. Proteome Profiling of Vitreoretinal Diseases by Cluster Analysis

    OpenAIRE

    Shitama, Tomomi; Hayashi, Hideyuki; Noge, Sumiyo; Uchio, Eiichi; Oshima, Kenji; Haniu, Hisao; Takemori, Nobuaki; Komori, Naoka; Matsumoto, Hiroyuki

    2008-01-01

    Vitreous samples collected in retinopathic surgeries have diverse properties, making proteomics analysis difficult. We report a cluster analysis to evade this difficulty. Vitreous and subretinal fluid samples were collected from 60 patients during surgical operation of non-proliferative diabetic retinopathy, proliferative diabetic retinopathy, proliferative vitreoretinopathy, and rhegmatogenous retinal detachment. For controls we collected vitreous fluid from patients of idiopathic macular ho...

  18. Pattern recognition in menstrual bleeding diaries by statistical cluster analysis

    Directory of Open Access Journals (Sweden)

    Wessel Jens

    2009-07-01

    Full Text Available Background: The aim of this paper is to empirically identify a treatment-independent statistical method to describe clinically relevant bleeding patterns by using bleeding diaries of clinical studies on various sex hormone containing drugs. Methods: We used four cluster analysis methods (single, average and complete linkage as well as the method of Ward) for pattern recognition in menstrual bleeding diaries. The optimal number of clusters was determined using the semi-partial R², the cubic cluster criterion, the pseudo-F statistic and the pseudo-t² statistic. Finally, the interpretability of the results from a gynecological point of view was assessed. Results: The method of Ward yielded distinct clusters of the bleeding diaries. The other methods successively chained the observations into one cluster. The optimal number of distinctive bleeding patterns was six. We found two desirable and four undesirable bleeding patterns. Cyclic and non-cyclic bleeding patterns were well separated. Conclusion: Using this cluster analysis with the method of Ward, medications and devices having an impact on bleeding can be easily compared and categorized.

  19. Breast cancer clustering in Kanagawa, Japan: a geographic analysis.

    Science.gov (United States)

    Katayama, Kayoko; Yokoyama, Kazuhito; Yako-Suketomo, Hiroko; Okamoto, Naoyuki; Tango, Toshiro; Inaba, Yutaka

    2014-01-01

    The purpose of the present study was to determine geographic clustering of breast cancer incidence in Kanagawa Prefecture, using cancer registry data. The study also aimed at examining the association between socio-economic factors and any identified cluster. Incidence data were collected for women who were first diagnosed with breast cancer during the period from January to December 2006 in Kanagawa. The data consisted of 2,326 incidence cases extracted from the total of 34,323 Kanagawa Cancer Registration data issued in 2011. To adjust for differences in age distribution, the standardized mortality ratio (SMR) and the standardized incidence ratio (SIR) of breast cancer were calculated for each of 56 municipalities (e.g., city, special ward, town, and village) in Kanagawa by an indirect method using Kanagawa female population data. Spatial scan statistics were used to detect any area of elevated risk as a cluster for breast cancer deaths and/ or incidences. The Student t-test was performed to examine differences in socio-economic variables, viz, persons per household, total fertility rate, age at first marriage for women, and marriage rate, between cluster and other regions. There was a statistically significant cluster of breast cancer incidence (p=0.001) composed of 11 municipalities in southeastern area of Kanagawa Prefecture, whose SIR was 35 percent higher than that of the remainder of Kanagawa Prefecture. In this cluster, average value of age at first-marriage for women was significantly higher than in the rest of Kanagawa (p=0.017). No statistically significant clusters of breast cancer deaths were detected (p=0.53). There was a statistically significant cluster of high breast cancer incidence in southeastern area of Kanagawa Prefecture. It was suggested that the cluster region was related to the tendency to marry later. This study methodology will be helpful in the analysis of geographical disparities in cancer deaths and incidence.
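
    The indirect standardization mentioned in this abstract compares the observed case count in a municipality with the count expected if reference age-specific rates applied to its population. The following is a minimal sketch of that arithmetic; the age bands, rates, populations and case count are hypothetical and do not come from the Kanagawa registry.

```python
def expected_cases(reference_rates, local_population):
    """Expected cases if the reference age-specific rates applied locally."""
    return sum(reference_rates[age] * local_population[age]
               for age in local_population)


def standardized_incidence_ratio(observed, reference_rates, local_population):
    """SIR = observed cases / expected cases (indirect standardization)."""
    return observed / expected_cases(reference_rates, local_population)


# Hypothetical reference rates (cases per woman per year) and a hypothetical
# municipality; none of these numbers are taken from the study.
reference_rates = {"30-49": 0.0005, "50-69": 0.0015, "70+": 0.0010}
municipality_population = {"30-49": 20000, "50-69": 10000, "70+": 5000}
observed_cases = 40

sir = standardized_incidence_ratio(
    observed_cases, reference_rates, municipality_population)
print(f"SIR = {sir:.2f}")
```

    A ratio above 1 means more cases were observed than expected from the reference rates; the cluster reported above corresponds to a ratio about 35 percent higher than in the rest of the prefecture.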

  20. Application of fuzzy c-means clustering in data analysis of metabolomics.

    Science.gov (United States)

    Li, Xiang; Lu, Xin; Tian, Jing; Gao, Peng; Kong, Hongwei; Xu, Guowang

    2009-06-01

    Fuzzy c-means (FCM) clustering is an unsupervised method derived from fuzzy logic that is suitable for solving multiclass and ambiguous clustering problems. In this study, FCM clustering is applied to cluster metabolomics data. FCM is performed directly on the data matrix to generate a membership matrix which represents the degree of association the samples have with each cluster. The method is parametrized with the number of clusters (C) and the fuzziness coefficient (m), which denotes the degree of fuzziness in the algorithm. Both have been optimized by combining FCM with partial least-squares (PLS) using the membership matrix as the Y matrix in the PLS model. The quality parameters R(2)Y and Q(2) of the PLS model have been used to monitor and optimize C and m. Data of metabolic profiles from three gene types of Escherichia coli were used to demonstrate the method above. Different multivariable analysis methods have been compared. Principal component analysis failed to model the metabolite data, while partial least-squares discriminant analysis yielded results with overfitting. On the basis of the optimized parameters, the FCM was able to reveal main phenotype changes and individual characters of three gene types of E. coli. Coupled with PLS, FCM provides a powerful research tool for metabolomics with improved visualization, accurate classification, and outlier estimation.
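
    The membership matrix and the roles of C and m described in this abstract can be illustrated with the standard fuzzy c-means update rules. The sketch below is a plain-Python illustration under stated assumptions: the toy points and starting centers are hypothetical, the function names are invented for this example, and the PLS-based tuning of C and m used by the authors is not reproduced.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership matrix: row k holds the degrees of point k in each cluster."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point sits exactly on a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([h / sum(hits) for h in hits])
            continue
        memberships.append(
            [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists])
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Cluster centers as means weighted by membership raised to the power m."""
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)))
    return centers


def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Alternate the two updates for a fixed number of iterations."""
    centers = list(initial_centers)
    for _ in range(iterations):
        memberships = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, memberships, m)
    return centers, fcm_memberships(points, centers, m)


# Hypothetical two-group toy data with C = 2 and m = 2.
toy_points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
              (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
toy_centers, toy_u = fuzzy_c_means(toy_points, [(0.0, 0.0), (5.0, 5.0)])
for point, row in zip(toy_points, toy_u):
    print(point, [round(u, 3) for u in row])
```

    Each row of the membership matrix sums to 1. Larger values of m flatten the memberships toward equal shares, while values close to 1 approach a hard assignment.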

  1. Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis

    Directory of Open Access Journals (Sweden)

    Gabjo Kim

    2016-12-01

    Full Text Available This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors’ levels of sustainable research and development (R&D and establish a sustainable strategy for entering an industry. To achieve this, we first grouped the patent documents with similar technologies by applying affinity propagation (AP clustering, which is effective while grouping large amounts of data. Next, in order to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF weight, which lists the terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.
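
    The TF-IDF weighting used in this abstract to rank terms can be sketched as follows. This is an assumption-laden illustration: the tokens are hypothetical, each cluster is treated as one pooled document, and the exact TF-IDF variant used by the authors is not stated in the abstract.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} map per document (a document is a token list)."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weight_map, k=3):
    """Terms ordered by descending weight, ties broken alphabetically."""
    ranked = sorted(weight_map.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical pooled tokens for three patent clusters (not real patent data).
cluster_tokens = [
    "battery cell battery pack cooling vehicle".split(),
    "motor rotor stator motor vehicle".split(),
    "charging connector charging station vehicle".split(),
]
cluster_weights = tf_idf(cluster_tokens)
for index, weight_map in enumerate(cluster_weights):
    print(index, top_terms(weight_map, k=2))
```

    A term that appears in every cluster, such as "vehicle" here, receives zero weight, so the top-ranked terms are the ones that distinguish a cluster from the others.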

  2. A Collaboration Service Model for a Global Port Cluster

    Directory of Open Access Journals (Sweden)

    Keith K.T. Toh

    2010-03-01

    Full Text Available The importance of port clusters to a global city may be viewed from a number of perspectives. The development of port clusters and economies of agglomeration and their contribution to a regional economy is underpinned by information and physical infrastructure that facilitates collaboration between business entities within the cluster. The maturity of technologies providing portals, web and middleware services provides an opportunity to push the boundaries of contemporary service reference models and service catalogues to what the authors propose to be "collaboration services". Servicing port clusters, portal engineers of the future must consider collaboration services to benefit a region. Particularly, service orchestration through a "public user portal" must gain better utilisation of publicly owned infrastructure, to share knowledge and collaborate among organisations through information systems.

  3. Aerosol cluster impact and break-up: model and implementation

    International Nuclear Information System (INIS)

    Lechman, Jeremy B.

    2010-01-01

    In this report a model for simulating aerosol cluster impact with rigid walls is presented. The model is based on JKR adhesion theory and is implemented as an enhancement to the granular (DEM) package within the LAMMPS code. The theory behind the model is outlined and preliminary results are shown. Modeling the interactions of small particles is relevant to a number of applications (e.g., soils, powders, colloidal suspensions, etc.). Modeling the behavior of aerosol particles during agglomeration and cluster dynamics upon impact with a wall is of particular interest. In this report we describe preliminary efforts to develop and implement physical models for aerosol particle interactions. Future work will consist of deploying these models to simulate aerosol cluster behavior upon impact with a rigid wall for the purpose of developing relationships for impact speed and probability of stick/bounce/break-up as well as to assess the distribution of cluster sizes if break-up occurs. These relationships will be developed consistent with the need for inputs into system-level codes. Section 2 gives background and details on the physical model as well as implementation issues. Section 3 presents some preliminary results which lead to discussion in Section 4 of future plans.

  4. Quark cluster model of nuclei and lepton scattering results

    International Nuclear Information System (INIS)

    Vary, J.P.; Iowa State Univ. of Science and Technology, Ames

    1984-01-01

    A review of the quark cluster model (QCM) of nuclei is presented along with applications to deep inelastic lepton scattering and elastic lepton scattering experiments. In addition a sample comparison is made with high momentum transfer (p, π) data. The QCM prediction for the ratio of nuclear structure functions in the x > 1 domain is discussed as a critical test of the model

  5. clusters

    Indian Academy of Sciences (India)

    2017-09-27

    Sep 27, 2017 ... while CuCoNO, Co3NO, Cu3CoNO, Cu2Co3NO, Cu3Co3NO and Cu6CoNO clusters display stronger chemical stability. Magnetic and electronic properties are also discussed. The magnetic moment is affected by charge transfer and the spd hybridization. Keywords. CumConNO (m + n = 2–7) clusters; ...

  6. Mathematical modelling of complex contagion on clustered networks

    Science.gov (United States)

    O'Sullivan, David J.; O'Keeffe, Gary; Fennell, Peter; Gleeson, James

    2015-09-01

    The spreading of behavior, such as the adoption of a new innovation, is influenced by the structure of social networks that interconnect the population. In the experiments of Centola (Science, 2010), adoption of new behavior was shown to spread further and faster across clustered-lattice networks than across corresponding random networks. This implies that the “complex contagion” effects of social reinforcement are important in such diffusion, in contrast to “simple” contagion models of disease-spread which predict that epidemics would grow more efficiently on random networks than on clustered networks. To accurately model complex contagion on clustered networks remains a challenge because the usual assumptions (e.g. of mean-field theory) regarding tree-like networks are invalidated by the presence of triangles in the network; the triangles are, however, crucial to the social reinforcement mechanism, which posits an increased probability of a person adopting behavior that has been adopted by two or more neighbors. In this paper we modify the analytical approach that was introduced by Hebert-Dufresne et al. (Phys. Rev. E, 2010), to study disease-spread on clustered networks. We show how the approximation method can be adapted to a complex contagion model, and confirm the accuracy of the method with numerical simulations. The analytical results of the model enable us to quantify the level of social reinforcement that is required to observe, as in Centola’s experiments, faster diffusion on clustered topologies than on random networks.
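
    The social reinforcement mechanism described in this abstract (adoption once two or more neighbors have adopted) can be illustrated with a deterministic threshold rule on a clustered ring lattice. This is a simplified sketch only: the paper's model is probabilistic and analytical, and the function names, lattice size and seed choices below are hypothetical.

```python
def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on either side."""
    return {i: {(i + d) % n for d in range(-k, k + 1) if d != 0}
            for i in range(n)}


def spread(neighbors, seeds, threshold):
    """Synchronous threshold contagion: a node adopts once at least
    `threshold` of its neighbors have adopted. Returns the final adopter set."""
    adopted = set(seeds)
    while True:
        newly_adopted = {
            node for node in neighbors
            if node not in adopted
            and len(neighbors[node] & adopted) >= threshold
        }
        if not newly_adopted:
            return adopted
        adopted |= newly_adopted


lattice = ring_lattice(20, 2)

# Two adjacent seeds share neighbors, so reinforcement can occur.
adjacent_seeds = spread(lattice, {0, 1}, threshold=2)
# Two distant seeds share no neighbors, so no node is ever reinforced.
distant_seeds = spread(lattice, {0, 10}, threshold=2)

print(len(adjacent_seeds), len(distant_seeds))
```

    With a threshold of 2, the cascade started from adjacent seeds covers the whole lattice because overlapping neighborhoods supply the second exposure, while the cascade from distant seeds never starts. With a threshold of 1 (simple contagion) both seed sets would spread everywhere.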

  7. Multiple-scattering-cluster model of covalent semiconductors

    International Nuclear Information System (INIS)

    Leite, J.R.

    1983-01-01

    A review is presented of the multiple-scattering-cluster model proposed to study the electronic structure of defects and impurities in semiconductors. Applications of this method are discussed and results for the A center in silicon are shown. Recent results obtained for complex defects in silicon are also presented. The advantage of using a localized description of the electronic structure of solids instead of the conventional band structure description is emphasized. The promising agreement with experimental results leads to the conclusion that the cluster model discussed in this paper is a suitable technique for studying the electronic structure of locally perturbed semiconductors. Perspectives for future work are also analysed. (Author)

  8. Application of microarray analysis on computer cluster and cloud platforms.

    Science.gov (United States)

    Bernau, C; Boulesteix, A-L; Knaus, J

    2013-01-01

    Analysis of recent high-dimensional biological data tends to be computationally intensive as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can be easily parallelized due to the computational independence of the resampling or permutation iterations, which has induced many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provide various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.

  9. Towards a symptom cluster model in chronic kidney disease: A structural equation approach.

    Science.gov (United States)

    Almutary, Hayfa; Douglas, Clint; Bonner, Ann

    2017-10-01

    The aim of this study was to test a symptom cluster model in chronic kidney disease patients based on the Theory of Unpleasant Symptoms, accounting for the relationships between influencing factors, symptom experience and consequences for quality of life. The evaluation of symptom clusters is a new field of scientific inquiry directed towards more focused symptom management. Yet, little is known about relationships between symptom clusters, predictors and the synergistic effect of multiple symptoms on outcomes. Cross-sectional. Data were collected from 436 patients with advanced stages of chronic kidney disease during July 2013-February 2014 using validated measures of symptom burden and quality of life. Analysis involved structural equation modelling. The final model demonstrated good fit with the data and provided strong evidence for the predicted relationships. Psychological distress, stage of chronic kidney disease and age explained most of the variance in symptom experience. Symptom clusters had a strong negative effect on quality of life, with fatigue, sexual symptoms and restless legs being the strongest predictors. Overall, the model explained more than half of the deterioration in quality of life. However, a reciprocal path between quality of life and symptom experience was not found. Interventions targeting symptom clusters could greatly improve quality of life in patients with chronic kidney disease. The symptom cluster model presented has important clinical and heuristic implications, serving as a framework to encourage and guide new lines of intervention research to reduce symptom burden in chronic kidney disease. © 2017 John Wiley & Sons Ltd.

  10. Dynamic analysis of clustered building structures using substructures methods

    International Nuclear Information System (INIS)

    Leimbach, K.R.; Krutzik, N.J.

    1989-01-01

    The dynamic substructure approach to the building cluster on a common base mat starts with the generation of Ritz-vectors for each building on a rigid foundation. The base mat plus the foundation soil is subjected to kinematic constraint modes, for example constant, linear, quadratic or cubic constraints. These constraint modes are also imposed on the buildings. By enforcing kinematic compatibility of the complete structural system on the basis of the constraint modes a reduced Ritz model of the complete cluster is obtained. This reduced model can now be analyzed by modal time history or response spectrum methods

  11. Modeling and clustering users with evolving profiles in usage streams

    KAUST Repository

    Zhang, Chongsheng

    2012-09-01

    Today, there is an increasing need for data stream mining technology to discover important patterns on the fly. Existing data stream models and algorithms commonly assume that users' records or profiles in data streams will not be updated or revised once they arrive. Nevertheless, in various applications such as Web usage, the records/profiles of the users can evolve over time. This kind of streaming data evolves in two forms: the streaming of tuples or transactions, as in the case of traditional data streams, and, more importantly, the evolving of user records/profiles inside the streams. Such data streams bring difficulties in modeling and clustering for exploring users' behaviors. In this paper, we propose three models to summarize this kind of data stream: the batch model, the Evolving Objects (EO) model and the Dynamic Data Stream (DDS) model. Through creating, updating and deleting user profiles, these models summarize the behaviors of each user as a profile object. Based upon these models, clustering algorithms are employed to discover interesting user groups from the profile objects. We have evaluated all the proposed models on a large real-world data set, showing that the DDS model summarizes the data streams with evolving tuples more efficiently and effectively, and provides a better basis for clustering users than the other two models. © 2012 IEEE.

  12. Point Cluster Analysis Using a 3D Voronoi Diagram with Applications in Point Cloud Segmentation

    Directory of Open Access Journals (Sweden)

    Shen Ying

    2015-08-01

    Full Text Available Three-dimensional (3D) point analysis and visualization is one of the most effective methods of point cluster detection and segmentation in geospatial datasets. However, serious scattering and clotting characteristics interfere with the visual detection of 3D point clusters. To overcome this problem, this study proposes the use of 3D Voronoi diagrams to analyze and visualize 3D points instead of the original data item. The proposed algorithm computes the cluster of 3D points by applying a set of 3D Voronoi cells to describe and quantify 3D points. The decompositions of point cloud of 3D models are guided by the 3D Voronoi cell parameters. The parameter values are mapped from the Voronoi cells to 3D points to show the spatial pattern and relationships; thus, a 3D point cluster pattern can be highlighted and easily recognized. To capture different cluster patterns, continuous progressive clusters and segmentations are tested. The 3D spatial relationship is shown to facilitate cluster detection. Furthermore, the generated segmentations of real 3D data cases are exploited to demonstrate the feasibility of our approach in detecting different spatial clusters for continuous point cloud segmentation.

  13. Fuzzy Modeled K-Cluster Quality Mining of Hidden Knowledge for Decision Support

    OpenAIRE

    S. Parkash  Kumar; K. S. Ramaswami

    2011-01-01

    Problem statement: The work presented Fuzzy Modeled K-means Cluster Quality Mining of hidden knowledge for Decision Support. Based on the number of clusters, number of objects in each cluster and its cohesiveness, precision and recall values, the cluster quality metrics is measured. The fuzzy k-means is adapted approach by using heuristic method which iterates the cluster to form an efficient valid cluster. With the obtained data clusters, quality assessment is made by predictive mining using...

  14. Identifying clinical course patterns in SMS data using cluster analysis.

    Science.gov (United States)

    Kent, Peter; Kongsted, Alice

    2012-07-02

    Recently, there has been interest in using the short message service (SMS or text messaging), to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important subgroups in the outcomes of research studies. Two previous studies have investigated detailed clinical course patterns in SMS data obtained from people seeking care for low back pain. One used a visual analysis approach and the other performed a cluster analysis of SMS data that had first been transformed by spline analysis. However, cluster analysis of SMS data in its original untransformed form may be simpler and offer other advantages. Therefore, the aim of this study was to determine whether cluster analysis could be used for identifying clinical course patterns distinct from the pattern of the whole group, by including all SMS time points in their original form. It was a 'proof of concept' study to explore the potential, clinical relevance, strengths and weakness of such an approach. This was a secondary analysis of longitudinal SMS data collected in two randomised controlled trials conducted simultaneously from a single clinical population (n = 322). Fortnightly SMS data collected over a year on 'days of problematic low back pain' and on 'days of sick leave' were analysed using Two-Step (probabilistic) Cluster Analysis. Clinical course patterns were identified that were clinically interpretable and different from those of the whole group. Similar patterns were obtained when the number of SMS time points was reduced to monthly. The advantages and disadvantages of this method were contrasted to that of first transforming SMS data by spline analysis. This study showed that clinical course patterns can be identified by cluster analysis using all SMS time points as cluster variables. This method is simple, intuitive and does not require a high level of statistical skill. However, there

  15. Galaxy Cluster Pressure Profiles as Determined by Sunyaev Zel’dovich Effect Observations with MUSTANG and Bolocam. II. Joint Analysis of 14 Clusters

    Science.gov (United States)

    Romero, Charles E.; Mason, Brian S.; Sayers, Jack; Mroczkowski, Tony; Sarazin, Craig; Donahue, Megan; Baldi, Alessandro; Clarke, Tracy E.; Young, Alexander H.; Sievers, Jonathan; Dicker, Simon R.; Reese, Erik D.; Czakon, Nicole; Devlin, Mark; Korngut, Phillip M.; Golwala, Sunil

    2017-04-01

    We present pressure profiles of galaxy clusters determined from high-resolution Sunyaev-Zel’dovich (SZ) effect observations of 14 clusters, which span the redshift range of 0.25MUSTANG and Bolocam data. In this analysis, we adopt the generalized NFW parameterization of pressure profiles to produce our models. Our constraints on ensemble-average pressure profile parameters, in this study $\gamma$, $C_{500}$, and $P_0$, are consistent with those in previous studies, but for individual clusters we find discrepancies with the X-ray derived pressure profiles from the ACCEPT2 database. We investigate potential sources of these discrepancies, especially cluster geometry, electron temperature of the intracluster medium, and substructure. We find that the ensemble mean profile for all clusters in our sample is described by the parameters $[\gamma, C_{500}, P_0] = [0.3_{-0.1}^{+0.1}, 1.3_{-0.1}^{+0.1}, 8.6_{-2.4}^{+2.4}]$, cool core clusters are described by $[\gamma, C_{500}, P_0] = [0.6_{-0.1}^{+0.1}, 0.9_{-0.1}^{+0.1}, 3.6_{-1.5}^{+1.5}]$, and disturbed clusters are described by $[\gamma, C_{500}, P_0] = [0.0_{-0.0}^{+0.1}, 1.5_{-0.2}^{+0.1}, 13.8_{-1.6}^{+1.6}]$. Of the 14 clusters, 4 have clear substructure in our SZ observations, while an additional 2 clusters exhibit potential substructure.
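    For orientation, the generalized NFW (gNFW) pressure profile is commonly written in the form below. This is a sketch of the standard parameterization, not necessarily the exact convention of the paper: the outer and intermediate slopes $\alpha$ and $\beta$, the characteristic pressure $P_{500}$, and the radius $R_{500}$ are not named in the abstract and are assumptions here.

```latex
P(r) = \frac{P_0 \, P_{500}}
            {\left( C_{500}\, r / R_{500} \right)^{\gamma}
             \left[ 1 + \left( C_{500}\, r / R_{500} \right)^{\alpha} \right]^{(\beta - \gamma)/\alpha}}
```

    In this form $\gamma$ sets the inner logarithmic slope, $C_{500}$ the concentration, and $P_0$ the normalization, which matches the three parameters quoted above.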

  16. Applied Hierarchical Cluster Analysis with Average Linkage Algoritm

    Directory of Open Access Journals (Sweden)

    Cindy Cahyaning Astuti

    2017-11-01

    This research was conducted in Sidoarjo District using secondary data from the book "Kabupaten Sidoarjo Dalam Angka 2016". The authors chose 12 variables that represent sub-district characteristics in Sidoarjo, drawn from four sectors: geography, education, agriculture and industry. To assess how evenly geographical conditions, education, agriculture and industry are distributed across the district, an analysis is needed that classifies sub-districts by these characteristics. Hierarchical cluster analysis is an analytical technique used to classify objects into relatively homogeneous groups, called clusters. The results are expected to provide information about the dominant and non-dominant sub-district characteristics in the four sectors, based on the clusters formed.
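    As an illustration of the average linkage rule named in the title, here is a minimal pure-Python sketch. The function name `average_linkage`, the Euclidean distance, and the toy data are assumptions made for this example; they are not the authors' data or implementation.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Agglomerative clustering with the average linkage merge rule.

    points     -- list of equal-length numeric tuples (hypothetical features)
    n_clusters -- number of clusters at which merging stops
    Returns a sorted list of clusters, each a sorted list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Average of all pairwise Euclidean distances between the two clusters.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average distance and merge it.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)


# Toy data: six made-up observations with two standardized variables each.
toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (9.0, 0.5)]
groups = average_linkage(toy, 3)
print(groups)
```

    With real data, the variables would normally be standardized first so that no single sector dominates the distances.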

  17. Assessment of surface water quality using hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Dheeraj Kumar Dabgerwal

    2016-02-01

    This study was carried out to assess the physicochemical quality of the river Varuna in Varanasi, India. Water samples were collected from 10 sites during January-June 2015. Pearson correlation analysis was used to assess the direction and strength of the relationships between physicochemical parameters. Hierarchical cluster analysis was also performed to determine the sources of pollution in the river Varuna. The results showed quite high values of DO, nitrate, BOD, COD and total alkalinity, above the BIS permissible limit. The correlation analysis identified pH, electrical conductivity, total alkalinity and nitrate as key water parameters that influence the concentration of other water parameters. Cluster analysis grouped the 10 sampling sites into three major clusters according to similarity in water quality. This study illustrated the usefulness of correlation and cluster analysis for getting better information about river water quality. International Journal of Environment Vol. 5 (1) 2016, pp: 32-44
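    The Pearson correlation step can be sketched in a few lines of Python. The function name `pearson_r` and the readings below are invented for illustration and are not measurements from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up readings at five sampling sites (not data from the study).
conductivity = [410.0, 520.0, 480.0, 610.0, 700.0]
alkalinity = [180.0, 230.0, 210.0, 270.0, 310.0]
r = pearson_r(conductivity, alkalinity)
print(round(r, 3))
```

    The sign of `r` gives the direction of the relationship and its magnitude the strength, which is how the abstract uses the statistic.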

  18. Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups

    Science.gov (United States)

    Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José

    2013-01-01

    Introduction: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients, and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results: Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the first two dimensions were considered for the construction of FM subgroups. Resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674

  19. Quasi-free scattering and the cluster model

    International Nuclear Information System (INIS)

    Vasconcellos, C.A.Z.

    1980-01-01

    A study is made of the influence of the nuclear structure on the effective polarization of the knocked-out nucleon in a quasi-free process. The case $^{6}\mathrm{Li} + p \to {}^{5}\mathrm{He} + 2p$ is considered and the predictions of two models are compared. In the first model the $^{6}\mathrm{Li}$ nucleus is represented by the $^{4}\mathrm{He} + {}^{2}\mathrm{D}$ clusters and in the second one by a shell-model wave function. (Author) [pt]

  20. Medical Inpatient Journey Modeling and Clustering: A Bayesian Hidden Markov Model Based Approach.

    Science.gov (United States)

    Huang, Zhengxing; Dong, Wei; Wang, Fei; Duan, Huilong

    2015-01-01

    Modeling and clustering medical inpatient journeys is useful to healthcare organizations for a number of reasons, including reorganizing inpatient journeys in a way that is more convenient for understanding and browsing. In this study, we present a probabilistic model-based approach to model and cluster medical inpatient journeys. Specifically, we exploit a Bayesian Hidden Markov Model based approach to transform medical inpatient journeys into a probabilistic space, which can be seen as a richer representation of inpatient journeys to be clustered. Then, using hierarchical clustering on the matrix of similarities, inpatient journeys can be clustered into different categories with respect to their clinical and temporal characteristics. We evaluated the proposed approach on a real clinical data set pertaining to the unstable angina treatment process. The experimental results reveal that our method can identify and model latent treatment topics underlying personalized inpatient journeys, and yield impressive clustering quality.

  1. Oxide-supported metal clusters: models for heterogeneous catalysts

    International Nuclear Information System (INIS)

    Santra, A K; Goodman, D W

    2003-01-01

    Understanding the size-dependent electronic, structural and chemical properties of metal clusters on oxide supports is an important aspect of heterogeneous catalysis. Recently model oxide-supported metal catalysts have been prepared by vapour deposition of catalytically relevant metals onto ultra-thin oxide films grown on a refractory metal substrate. Reactivity and spectroscopic/microscopic studies have shown that these ultra-thin oxide films are excellent models for the corresponding bulk oxides, yet are sufficiently electrically conductive for use with various modern surface probes including scanning tunnelling microscopy (STM). Measurements on metal clusters have revealed a metal to nonmetal transition as well as changes in the crystal and electronic structures (including lattice parameters, band width, band splitting and core-level binding energy shifts) as a function of cluster size. Size-dependent catalytic reactivity studies have been carried out for several important reactions, and time-dependent catalytic deactivation has been shown to arise from sintering of metal particles under elevated gas pressures and/or reactor temperatures. In situ STM methodologies have been developed to follow the growth and sintering kinetics on a cluster-by-cluster basis. Although several critical issues have been addressed by several groups worldwide, much more remains to be done. This article highlights some of these accomplishments and summarizes the challenges that lie ahead. (topical review)

  2. Regional SAR Image Segmentation Based on Fuzzy Clustering with Gamma Mixture Model

    Science.gov (United States)

    Li, X. L.; Zhao, Q. H.; Li, Y.

    2017-09-01

    Most stochastic fuzzy clustering algorithms are pixel-based and cannot effectively overcome the inherent speckle noise in SAR images. To deal with this problem, a regional SAR image segmentation algorithm based on fuzzy clustering with a Gamma mixture model is proposed in this paper. First, generating points are initialized randomly on the image, and the image domain is divided into many sub-regions using the Voronoi tessellation technique. Each sub-region is regarded as a homogeneous area in which the pixels share the same cluster label. Then, the probability of each pixel is assumed to follow a Gamma mixture model with parameters corresponding to the cluster to which the pixel belongs. The negative logarithm of the probability represents the dissimilarity measure between the pixel and the cluster. The regional dissimilarity measure of a sub-region is defined as the sum of the measures of the pixels in the region. Furthermore, the Markov Random Field (MRF) model is extended from the pixel level to Voronoi sub-regions, and the regional objective function is established under the framework of fuzzy clustering. The optimal segmentation is obtained by solving for the model parameters and generating points. Finally, the effectiveness of the proposed algorithm is demonstrated by qualitative and quantitative analysis of segmentation results on simulated and real SAR images.

  3. Continental scale analysis of bird migration timing: influences of climate and life history traits-a generalized mixture model clustering and discriminant approach.

    Science.gov (United States)

    Chambers, Lynda E; Beaumont, Linda J; Hudson, Irene L

    2014-08-01

    There is substantial evidence of climate-related shifts to the timing of avian migration. Although spring arrival has generally advanced, variable species responses and geographical biases in data collection make it difficult to generalise patterns. We advance previous studies by using novel multivariate statistical techniques to explore complex relationships between phenological trends, climate indices and species traits. Using 145 datasets for 52 bird species, we assess trends in first arrival date (FAD), last departure date (LDD) and timing of peak abundance at multiple Australian locations. Strong seasonal patterns were found, i.e. spring phenological events were more likely to significantly advance, while significant advances and delays occurred in other seasons. However, across all significant trends, the magnitude of delays exceeded that of advances, particularly for FAD (+22.3 and -9.6 days/decade, respectively). Geographic variations were found, with greater advances in FAD and LDD, in south-eastern Australia than in the north and west. We identified four species clusters that differed with respect to species traits and climate drivers. Species within bird clusters responded in similar ways to local climate variables, particularly the number of raindays and rainfall. The strength of phenological trends was more strongly related to local climate variables than to broad-scale drivers (Southern Oscillation Index), highlighting the importance of precipitation as a driver of movement in Australian birds.

  4. Frailty phenotypes in the elderly based on cluster analysis

    DEFF Research Database (Denmark)

    Dato, Serena; Montesanto, Alberto; Lagani, Vincenzo

    2012-01-01

    Frailty is a physiological state characterized by the deregulation of multiple physiologic systems of an aging organism determining the loss of homeostatic capacity, which exposes the elderly to disability, diseases, and finally death. An operative definition of frailty, useful for the classifica...... genetic background on the frailty status is still questioned. We investigated the applicability of a cluster analysis approach based on specific geriatric parameters, previously set up and validated in a southern Italian population, to two large longitudinal Danish samples. In both cohorts, we identified...... groups of subjects homogeneous for their frailty status and characterized by different survival patterns. A subsequent survival analysis availing of Accelerated Failure Time models allowed us to formulate an operative index able to correlate classification variables with survival probability. From...

  5. Cluster analysis as a prediction tool for pregnancy outcomes.

    Science.gov (United States)

    Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L

    2015-03-01

    Considering the specific physiological changes during gestation, and thinking of pregnancy as a "critical window", classification of pregnant women in early pregnancy can be considered crucial. The paper demonstrates the use of cluster analysis, a method based on an approach from intelligent data mining. Cluster analysis is a statistical method that makes it possible to group individuals based on sets of identifying variables. The method was chosen to determine whether pregnant women can be classified in early pregnancy, and to analyze unknown correlations between different variables so that certain outcomes could be predicted. 222 pregnant women from two general obstetric offices were recruited. The main focus was on the characteristics of these pregnant women: their age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis achieved a 94.1% classification accuracy rate, with three branches or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results show that pregnant women of both older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering a baby of higher birth weight, but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have a significantly higher probability of complications during pregnancy (gestosis) and of induced or caesarean delivery. We can conclude that the cluster analysis method can appropriately classify pregnant women in early pregnancy to predict certain outcomes.

  6. Language Learner Motivational Types: A Cluster Analysis Study

    Science.gov (United States)

    Papi, Mostafa; Teimouri, Yasser

    2014-01-01

    The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…

  7. Characterization of population exposure to organochlorines: A cluster analysis application

    NARCIS (Netherlands)

    R.M. Guimarães (Raphael Mendonça); S. Asmus (Sven); A. Burdorf (Alex)

    2013-01-01

    This study aimed to show the results from a cluster analysis application in the characterization of population exposure to organochlorines through variables related to time and exposure dose. Characteristics of 354 subjects in a population exposed to organochlorine pesticide residues

  8. Cluster analysis for validated climatology stations using precipitation in Mexico

    NARCIS (Netherlands)

    Bravo Cabrera, J. L.; Azpra-Romero, E.; Zarraluqui-Such, V.; Gay-García, C.; Estrada Porrúa, F.

    2012-01-01

    Annual average of daily precipitation was used to group climatological stations into clusters using the k-means procedure and principal component analysis with varimax rotation. After a careful selection of the stations deployed in Mexico since 1950, we selected 349 characterized by having 35 to 40

  9. cluster

    Indian Academy of Sciences (India)

    has been investigated electrochemically in positive and negative microenvironments, both in solution and in film. Charge nature around the active centre ... in plants, bacteria and also in mammals. This cluster is also an important constituent of a ..... selection of non-cysteine amino acid in the active centre of Rieske proteins.

  10. Problems with a simple-minded cluster model

    International Nuclear Information System (INIS)

    Adhikari, S.K.

    1980-01-01

    Cluster model approximation for the resolvent operator can reduce many-body Lippmann-Schwinger equations to an effective two-body equation. It is shown that such an approximation may suppress mathematical mechanisms for rearrangement processes. This then leads to highly reduced wave functions and weak effective intercluster potentials. (L.C.) [pt]

  11. Metal cluster fission: jellium model and Molecular dynamics simulations

    DEFF Research Database (Denmark)

    Lyalin, Andrey G.; Obolensky, Oleg I.; Solov'yov, Ilia

    2004-01-01

    Fission of doubly charged sodium clusters is studied using the open-shell two-center deformed jellium model approximation and an ab initio molecular dynamics approach accounting for all electrons in the system. Results of calculations of fission reactions $\mathrm{Na}_{10}^{2+} \to \mathrm{Na}_{7}^{+} + \mathrm{Na}_{3}^{+}$ and Na_18...

  12. Emergence of clustering in an acquaintance model without homophily

    Science.gov (United States)

    Bhat, Uttam; Krapivsky, P. L.; Redner, S.

    2014-11-01

    We introduce an agent-based acquaintance model in which social links are created by processes in which there is no explicit homophily. In spite of the homogeneous nature of the social interactions, highly-clustered social networks can arise. The crucial feature of our model is that of variable transitive interactions. Namely, when an agent introduces two unconnected friends, the rate at which a connection actually occurs between them depends on the number of their mutual acquaintances. As this transitive interaction rate is varied, the social network undergoes a dramatic clustering transition. Close to the transition, the network consists of a collection of well-defined communities. As a function of time, the network can also undergo an incomplete gelation transition, in which the gel, or giant cluster, does not constitute the entire network, even at infinite time. Some of the clustering properties of our model also arise, but in a more gradual manner, in Facebook networks. Finally, we discuss a more realistic variant of our original model in which network realizations can be constructed that quantitatively match Facebook networks.

  13. The dilute random field Ising model by finite cluster approximation

    International Nuclear Information System (INIS)

    Benyoussef, A.; Saber, M.

    1987-09-01

    Using the finite cluster approximation, phase diagrams of bond and site diluted three-dimensional simple cubic Ising models with a random field have been determined. The resulting phase diagrams have the same general features for both bond and site dilution. (author). 7 refs, 4 figs

  14. Emergence of clustering in an acquaintance model without homophily

    International Nuclear Information System (INIS)

    Bhat, Uttam; Krapivsky, P L; Redner, S

    2014-01-01

    We introduce an agent-based acquaintance model in which social links are created by processes in which there is no explicit homophily. In spite of the homogeneous nature of the social interactions, highly-clustered social networks can arise. The crucial feature of our model is that of variable transitive interactions. Namely, when an agent introduces two unconnected friends, the rate at which a connection actually occurs between them depends on the number of their mutual acquaintances. As this transitive interaction rate is varied, the social network undergoes a dramatic clustering transition. Close to the transition, the network consists of a collection of well-defined communities. As a function of time, the network can also undergo an incomplete gelation transition, in which the gel, or giant cluster, does not constitute the entire network, even at infinite time. Some of the clustering properties of our model also arise, but in a more gradual manner, in Facebook networks. Finally, we discuss a more realistic variant of our original model in which network realizations can be constructed that quantitatively match Facebook networks. (paper)

  15. The effect of alkylating agents on model supported metal clusters

    Energy Technology Data Exchange (ETDEWEB)

    Erdem-Senatalar, A.; Blackmond, D.G.; Wender, I. (Pittsburgh Univ., PA (USA). Dept. of Chemical and Petroleum Engineering); Oukaci, R. (CERHYD, Algiers (Algeria))

    1988-01-01

    Interactions between model supported metal clusters and alkylating agents were studied in an effort to understand a novel chemical trapping technique developed for identifying species adsorbed on catalyst surfaces. It was found that these interactions are more complex than had previously been suggested. Studies were completed using deuterium-labeled dimethyl sulfate (DMS), (CH$_3$)$_2$SO$_4$, as a trapping agent to interact with the supported metal cluster ethylidyne tricobalt enneacarbonyl. Results showed that oxygenated products formed during the trapping reaction contained $-$OCD$_3$ groups from the DMS, indicating that the interaction was not a simple alkylation. 18 refs., 1 fig., 3 tabs.

  16. Fault detection of flywheel system based on clustering and principal component analysis

    Directory of Open Access Journals (Sweden)

    Wang Rixin

    2015-12-01

    Considering the nonlinear, multifunctional properties of a double flywheel with closed-loop control, a two-step method combining clustering and principal component analysis is proposed to detect the two faults in the multifunctional flywheels. In the first step of the proposed algorithm, clustering is used as feature recognition to check the instructions of the “integrated power and attitude control” system, such as attitude control, energy storage or energy discharge. These commands ask the flywheel system to work in different operation modes. Therefore, the relationship of parameters in different operations can define the cluster structure of the training data. Ordering points to identify the clustering structure (OPTICS) can automatically identify these clusters from the reachability plot. The K-means algorithm can then divide the training data into the corresponding operations according to the reachability plot. Finally, the last step of the proposed model defines the relationship of parameters in each operation through the principal component analysis (PCA) method. Compared with the PCA model, the proposed approach is capable of identifying new clusters and learning the new behavior of incoming data. The simulation results show that it can effectively detect the faults in the multifunctional flywheel system.

  17. NMR metabolic analysis of samples using fuzzy K-means clustering.

    Science.gov (United States)

    Cuperlović-Culf, Miroslava; Belacel, Nabil; Culf, Adrian S; Chute, Ian C; Ouellette, Rodney J; Burton, Ian W; Karakach, Tobias K; Walter, John A

    2009-12-01

    The global analysis of metabolites can be used to define the phenotypes of cells, tissues or organisms. Classifying groups of samples based on their metabolic profile is one of the main topics of metabolomics research. Crisp clustering methods assign each feature to one cluster, thereby omitting information about the multiplicity of sample subtypes. Here, we present the application of fuzzy K-means clustering method for the classification of samples based on metabolomics 1D $^{1}$H NMR fingerprints. The sample classification was performed on NMR spectra of cancer cell line extracts and of urine samples of type 2 diabetes patients and animal models. The cell line dataset included NMR spectra of lipophilic cell extracts for two normal and three cancer cell lines with cancer cell lines including two invasive and one non-invasive cancers. The second dataset included previously published NMR spectra of urine samples of human type 2 diabetics and healthy controls, mouse wild type and diabetes model and rat obese and lean phenotypes. The fuzzy K-means clustering method allowed more accurate sample classification in both datasets relative to the other tested methods including principal component analysis (PCA), hierarchical clustering (HCL) and K-means clustering. In the cell line samples, fuzzy clustering provided a clear separation of individual cell lines, groups of cancer and normal cell lines as well as non-invasive and invasive tumour cell lines. In the diabetes dataset, clear separation of healthy controls and diabetics in all three models was possible only by using the fuzzy clustering method.
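    The contrast between crisp and fuzzy assignment can be seen in a small sketch of the standard fuzzy c-means update rules. The function name `fuzzy_c_means`, the one-dimensional toy values, the initial centers and the fuzzifier `m = 2` are all assumptions for illustration; the abstract does not specify the authors' implementation.

```python
def fuzzy_c_means(values, centers, m=2.0, n_iter=50):
    """Fuzzy c-means on 1D data.

    values  -- list of floats (e.g. one hypothetical spectral feature per sample)
    centers -- initial cluster centers (list of floats)
    m       -- fuzzifier (> 1); larger values give softer memberships
    Returns (centers, memberships), where memberships[i][k] is the degree to
    which sample i belongs to cluster k and each row sums to 1.
    """
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # A sample sitting exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [u / total for u in row]
            else:
                row = [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]
            memberships.append(row)
        # Update each center as the membership-weighted mean of the samples.
        centers = [
            sum(u[k] ** m * x for u, x in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(len(centers))
        ]
    return centers, memberships


# Made-up feature values: two tight groups plus one ambiguous sample (3.0).
samples = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 3.0]
final_centers, u = fuzzy_c_means(samples, centers=[0.0, 6.0])
print([round(c, 2) for c in final_centers])
print([round(w, 2) for w in u[-1]])
```

    A crisp method would force the ambiguous sample into one group, whereas the fuzzy memberships keep the information that it sits between the two.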

  18. Understanding the Support Needs of People with Intellectual and Related Developmental Disabilities through Cluster Analysis and Factor Analysis of Statewide Data

    Science.gov (United States)

    Viriyangkura, Yuwadee

    2014-01-01

    Through a secondary analysis of statewide data from Colorado, people with intellectual and related developmental disabilities (ID/DD) were classified into five clusters based on their support needs characteristics using cluster analysis techniques. Prior latent factor models of support needs in the field of ID/DD were examined to investigate the…

  19. EFFICIENCY OF SMES IN ROMANIA POST CRISIS. A CLUSTERING ANALYSIS

    Directory of Open Access Journals (Sweden)

    Cristina SUCIU

    2014-06-01

    Small and medium-sized enterprises (SMEs) have made, even during the economic crisis, a major contribution to gross domestic product, to job creation, and to economic efficiency by stimulating competition through their speed in adapting to conditions, adopting new strategies, and responding to market requirements. Although several hundred thousand companies in Romania were suspended or cancelled at the beginning of the economic crisis, a revival of SMEs has been observed since 2012. We could say that the post-crisis period, thanks to measures in support of SMEs, marks the beginning of an economic boost for SMEs in Romania. Cluster analysis is a multivariate analysis technique that includes a number of algorithms for classifying objects into homogeneous groups. Analyzing the efficiency of SMEs in Romania using cluster analysis is a new method of economic analysis that enables a mathematically based analysis of the regional development of SMEs and of their growing competitiveness.

  20. Phenotypic clustering: a novel method for microglial morphology analysis.

    Science.gov (United States)

    Verdonk, Franck; Roux, Pascal; Flamant, Patricia; Fiette, Laurence; Bozza, Fernando A; Simard, Sébastien; Lemaire, Marc; Plaud, Benoit; Shorte, Spencer L; Sharshar, Tarek; Chrétien, Fabrice; Danckaert, Anne

    2016-06-17

    Microglial cells are tissue-resident macrophages of the central nervous system. They are extremely dynamic, sensitive to their microenvironment and present a characteristic complex and heterogeneous morphology and distribution within the brain tissue. Many experimental clues highlight a strong link between their morphology and their function in response to aggression. However, due to their complex "dendritic-like" aspect that constitutes the major pool of murine microglial cells and their dense network, precise and powerful morphological studies are not easy to realize and complicate correlation with molecular or clinical parameters. Using the knock-in mouse model CX3CR1(GFP/+), we developed a 3D automated confocal tissue imaging system coupled with morphological modelling of many thousands of microglial cells revealing precise and quantitative assessment of major cell features: cell density, cell body area, cytoplasm area and number of primary, secondary and tertiary processes. We determined two morphological criteria that are the complexity index (CI) and the covered environment area (CEA) allowing an innovative approach lying in (i) an accurate and objective study of morphological changes in healthy or pathological condition, (ii) an in situ mapping of the microglial distribution in different neuroanatomical regions and (iii) a study of the clustering of numerous cells, allowing us to discriminate different sub-populations. Our results on more than 20,000 cells by condition confirm at baseline a regional heterogeneity of the microglial distribution and phenotype that persists after induction of neuroinflammation by systemic injection of lipopolysaccharide (LPS). Using clustering analysis, we highlight that, at resting state, microglial cells are distributed in four microglial sub-populations defined by their CI and CEA with a regional pattern and a specific behaviour after challenge. Our results counteract the classical view of a homogenous regional resting

  1. Cosmological analysis of galaxy clusters surveys in X-rays

    International Nuclear Information System (INIS)

    Clerc, N.

    2012-01-01

    Clusters of galaxies are the most massive objects in equilibrium in our Universe. Their study allows cosmological scenarios of structure formation to be tested with precision, bringing constraints complementary to those stemming from the cosmological background radiation, supernovae or galaxies. They are identified through the X-ray emission of their heated gas, thus facilitating their mapping at different epochs of the Universe. This report presents two surveys of galaxy clusters detected in X-rays and puts forward a method for their cosmological interpretation. Thanks to its multi-wavelength coverage extending over 10 sq. deg. and after one decade of expertise, the XMM-LSS allows a systematic census of clusters in a large volume of the Universe. In the framework of this survey, the first part of this report describes the techniques developed for characterizing the detected objects. A particular emphasis is placed on the most distant ones (z ≥ 1) through the complementarity of observations in X-ray, optical and infrared bands. Then the X-CLASS survey is fully described. Based on XMM archival data, it provides a new catalogue of 800 clusters detected in X-rays. A cosmological analysis of this survey is performed thanks to 'CR-HR' diagrams. This new method self-consistently includes selection effects and scaling relations and provides a means to bypass the computation of individual cluster masses. Proposals are made for applying this method to future surveys such as XMM-XXL and eRosita. (author) [fr]

  2. Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions

    International Nuclear Information System (INIS)

    Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard

    2014-01-01

    Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space
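
    The Markov state model step described above, a discrete-time random walk between clusters, can be illustrated with a minimal sketch. It assumes each trajectory frame has already been assigned a hard cluster label; the function name, the lag parameter and the toy trajectory are hypothetical, and the sketch does not reproduce the fuzzy C-means or transition-based state assignment used in the paper. Transitions at a fixed lag are counted and each row is normalized to give transition probabilities.

```python
def transition_matrix(labels, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from a label sequence."""
    counts = [[0] * n_states for _ in range(n_states)]
    # Pair every frame with the frame `lag` steps later and count the jump.
    for a, b in zip(labels[:-lag], labels[lag:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows of states that were never left stay at zero (no division by zero).
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical cluster labels for ten consecutive frames.
trajectory = [0, 0, 1, 1, 0, 2, 2, 2, 0, 0]
T = transition_matrix(trajectory, n_states=3)
for row in T:
    print(row)
```

    Increasing the lag is one common way to reduce the short-time memory effects mentioned in the abstract, at the cost of fewer counted transitions.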

  3. Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions

    Science.gov (United States)

    Nedialkova, Lilia V.; Amat, Miguel A.; Kevrekidis, Ioannis G.; Hummer, Gerhard

    2014-09-01

    Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.

  4. Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions.

    Science.gov (United States)

    Nedialkova, Lilia V; Amat, Miguel A; Kevrekidis, Ioannis G; Hummer, Gerhard

    2014-09-21

    Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small--but nontrivial--differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.

  5. Fuzzy cluster analysis of air quality in Beijing district

    Science.gov (United States)

    Liu, Hongkai

    2018-02-01

    In this article, the principle of fuzzy clustering analysis is applied: using the transitive closure method, the main air pollutants in 17 districts of Beijing from 2014 to 2016 are classified. The results of the analysis reflect the changes in the main air pollutants in Beijing over these nearly three years. This can provide scientific and data support for atmospheric governance in the Beijing area.
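
    The transitive closure method named in this abstract can be sketched as follows. This is a minimal illustration, not the article's implementation: the similarity matrix is made up, and the function names are hypothetical. A reflexive, symmetric fuzzy similarity matrix is squared under max-min composition until it stops changing, and a threshold (lambda-cut) on the resulting closure yields the clusters.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations."""
    n = len(r)
    return [[max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Square the relation repeatedly until it no longer changes."""
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r
        r = squared


def lambda_cut_clusters(closure, lam):
    """Group items whose closure similarity is at least `lam`."""
    n = len(closure)
    assigned = [False] * n
    clusters = []
    for i in range(n):
        if not assigned[i]:
            members = [j for j in range(n) if closure[i][j] >= lam]
            for j in members:
                assigned[j] = True
            clusters.append(members)
    return clusters


# Hypothetical similarity matrix for four items (e.g. four districts).
sim = [
    [1.0, 0.8, 0.3, 0.2],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.7],
    [0.2, 0.2, 0.7, 1.0],
]
closure = transitive_closure(sim)
print(lambda_cut_clusters(closure, 0.7))
```

    Lowering the threshold merges clusters and raising it splits them, so sweeping the threshold gives a hierarchy of groupings.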

  6. Model study in chemisorption: atomic hydrogen on beryllium clusters

    International Nuclear Information System (INIS)

    Bauschlicher, C.W. Jr.

    1976-08-01

    The interaction between atomic hydrogen and the (0001) surface of Be metal has been studied by ab initio electronic structure theory. Self-consistent-field (SCF) calculations have been performed using minimum, optimized minimum, double zeta and mixed basis sets for clusters as large as 22 Be atoms. The binding energy and equilibrium geometry (the distance to the surface) were determined for 4 sites. Both spatially restricted (the wavefunction was constrained to transform as one of the irreducible representations of the molecular point group) and unrestricted SCF calculations were performed. Using only the optimized minimum basis set, clusters containing as many as 22 beryllium atoms have been investigated. From a variety of considerations, this cluster is seen to be nearly converged within the model used, providing the most reliable results for chemisorption. The site dependence of the frequency is shown to be a geometrical effect depending on the number and angle of the bonds. The diffusion of atomic hydrogen through a perfect beryllium crystal is predicted to be energetically unfavorable. The cohesive energy, the ionization energy and the singlet-triplet separation were computed for the clusters without hydrogen. These quantities can be seen as a measure of the total amount of edge effects. The chemisorptive properties are not related to the total amount of edge effects, but rather the edge effects felt by the adsorbate bonding berylliums. This lack of correlation with the total edge effects illustrates the local nature of the bonding, further strengthening the cluster model for chemisorption. A detailed discussion of the bonding and electronic structure is included. The remaining edge effects for the Be 22 cluster are discussed

  7. The sine Gordon model perturbation theory and cluster Monte Carlo

    CERN Document Server

    Hasenbusch, M; Pinn, K

    1994-01-01

    We study the expansion of the surface thickness in the 2-dimensional lattice Sine Gordon model in powers of the fugacity z. Using the expansion to order z**2, we derive lines of constant physics in the rough phase. We describe and test a VMR cluster algorithm for the Monte Carlo simulation of the model. The algorithm shows nearly no critical slowing down. We apply the algorithm in a comparison of our perturbative results with Monte Carlo data.

  8. Advances in Bayesian Model Based Clustering Using Particle Learning

    Energy Technology Data Exchange (ETDEWEB)

    Merl, D M

    2009-11-19

    Recent work by Carvalho, Johannes, Lopes and Polson and Carvalho, Lopes, Polson and Taddy introduced a sequential Monte Carlo (SMC) alternative to traditional iterative Monte Carlo strategies (e.g. MCMC and EM) for Bayesian inference for a large class of dynamic models. The basis of SMC techniques involves representing the underlying inference problem as one of state space estimation, thus giving way to inference via particle filtering. The key insight of Carvalho et al. was to construct the sequence of filtering distributions so as to make use of the posterior predictive distribution of the observable, a distribution usually only accessible in certain Bayesian settings. Access to this distribution allows a reversal of the usual propagate and resample steps characteristic of many SMC methods, thereby alleviating to a large extent many problems associated with particle degeneration. Furthermore, Carvalho et al. point out that for many conjugate models the posterior distribution of the static variables can be parametrized in terms of [recursively defined] sufficient statistics of the previously observed data. For models where such sufficient statistics exist, particle learning, as it is being called, is especially well suited for the analysis of streaming data due to the relative invariance of its algorithmic complexity with the number of data observations. Through a particle learning approach, a statistical model can be fit to data as the data is arriving, allowing at any instant during the observation process direct quantification of uncertainty surrounding underlying model parameters. Here we describe the use of a particle learning approach for fitting a standard Bayesian semiparametric mixture model as described in Carvalho, Lopes, Polson and Taddy. In Section 2 we briefly review the previously presented particle learning algorithm for the case of a Dirichlet process mixture of multivariate normals. In Section 3 we describe several novel extensions to the original

  9. Clustering Educational Digital Library Usage Data: A Comparison of Latent Class Analysis and K-Means Algorithms

    Science.gov (United States)

    Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei

    2013-01-01

    This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…

  10. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    International Nuclear Information System (INIS)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L.; Vassiou, K.

    2015-01-01

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists’ segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster; average of minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under receiver operating characteristic curve (Az ± Standard Error) utilizing tenfold cross

  11. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    Energy Technology Data Exchange (ETDEWEB)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L., E-mail: costarid@upatras.gr [Department of Medical Physics, School of Medicine, University of Patras, Patras 26504 (Greece); Vassiou, K. [Department of Anatomy, School of Medicine, University of Thessaly, Larissa 41500 (Greece)

    2015-10-15

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists’ segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster; average of minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under receiver operating characteristic curve (Az ± Standard Error) utilizing

  12. DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab

    Directory of Open Access Journals (Sweden)

    Alexander Chailytko

    2016-05-01

    Domain Generation Algorithms (DGAs) are a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a complex solution to populate a DGA feed using reversed DGAs, third-party feeds, and smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling a cluster-based analysis. Our method also automatically allows analysis of a whole malware family, a specific campaign, etc. We present our system and demonstrate its abilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to utilize the outcome of the analysis to create smarter protections against similar malware.

  13. The Parental Environment Cluster Model of Child Neglect: An Integrative Conceptual Model.

    Science.gov (United States)

    Burke, Judith; Chandy, Joseph; Dannerbeck, Anne; Watt, J. Wilson

    1998-01-01

    Presents Parental Environment Cluster model of child neglect which identifies three clusters of factors involved in parents' neglectful behavior: (1) parenting skills and functions; (2) development and use of positive social support; and (3) resource availability and management skills. Model offers a focal theory for research, structure for…

  14. Visual Analysis and Processing of Clusters Structures in Multidimensional Datasets

    Science.gov (United States)

    Bondarev, A. E.

    2017-05-01

    The article is devoted to problems of visual analysis of cluster structures in multidimensional datasets. For the visual analysis, the elastic maps approach [1,2] is applied. This approach is well suited to processing and visualizing multidimensional datasets. To analyze clusters in the original data volume, elastic maps are used to map the original data points onto manifolds of lower dimensionality embedded in the data space. By decreasing the elasticity parameters, one can design a map surface that approximates the multidimensional dataset in question much better. The points of the dataset are then projected onto the map. Unfolding the designed map onto a flat plane gives an insight into the cluster structure of the multidimensional dataset. The elastic maps approach does not require any a priori information about the data in question and does not depend on the nature or origin of the data. Elastic maps are usually combined with the PCA approach. Presented in the space of the first three principal components, the elastic maps provide quite good results. The article describes the results of applying the elastic maps approach to the visual analysis of clusters in different multidimensional datasets, including medical data.

  15. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm are introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. In constructing a directed relationship network and distribution matrix for the clustering results, it can be noted that overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.
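
    The abstract does not spell out how the Cosine Coefficient is computed, so the following is only a rough sketch under stated assumptions: documents are tokenized by whitespace, represented as term-frequency vectors, and compared pairwise, so that only the terms occurring in the two documents at hand are touched rather than the full vocabulary. The function name and the sample strings are hypothetical.

```python
import math
from collections import Counter


def cosine_coefficient(doc_a, doc_b):
    """Cosine similarity of two texts using raw term frequencies."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Only terms shared by both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


print(cosine_coefficient("gene expression in cancer cells",
                         "expression of a gene in yeast"))
```

    A matrix of such pairwise similarities is the kind of input that affinity-propagation-style algorithms operate on.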

  16. Mobility in Europe: Recent Trends from a Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Ioana Manafi

    2017-08-01

    During the past decade, Europe was confronted with major changes and events offering large opportunities for mobility. The EU enlargement process, the EU policies regarding youth, the economic crisis affecting national economies on different levels, political instabilities in some European countries, high rates of unemployment or the increasing number of refugees are only a few of the factors influencing net migration in Europe. Based on a set of socio-economic indicators for EU/EFTA countries and cluster analysis, the paper provides an overview of regional differences across European countries, related to migration magnitude in the identified clusters. The obtained clusters are in accordance with previous studies in migration, and appear stable during the period of 2005-2013, with only some exceptions. The analysis revealed three country clusters: EU/EFTA center-receiving countries, EU/EFTA periphery-sending countries and EU/EFTA outlier countries, the names suggesting not only the geographical position within Europe, but the trends in net migration flows during the years. Therewith, the results provide evidence for the persistence of a movement from periphery to center countries, which is correlated with recent flows of mobility in Europe.

  17. Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.

    Science.gov (United States)

    Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed

    2015-11-05

    Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (Pgait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Identification of discriminatory variables in proteomics data analysis by clustering of variables.

    Science.gov (United States)

    Karimi, Sadegh; Hemmateenejad, Bahram

    2013-03-12

    This article presents a data analysis method for biomarker discovery in proteomics data analysis. In factor analysis-based discriminate models, the latent variables (LV's) are calculated from the response data measured at all employed instrument channels. Since some channels are irrelevant and their responses do not possess useful information, the extracted LV's possess mixed information from both useful and irrelevant channels. In this work, clustering of variables (CLoVA) based on unsupervised pattern recognition is suggested as an efficient method to identify the most informative spectral region and then it is used to construct a more predictive multivariate classification model. In the suggested method, the instrument channels (m/z value) are clustered into different clusters via self-organization map. Subsequently, the spectral data of each cluster are separately used as the input variables of classification methods such as partial least square-discriminate analysis (PLS-DA) and extended canonical variate analysis (ECVA). The proposed method is evaluated by the analysis of two experimental data sets (ovarian and prostate cancer data set). It is found that our proposed method is able to detect cancerous from healthy samples with much higher sensitivity and selectivity than conventional PLS-DA and ECVA methods. Copyright © 2013 Elsevier B.V. All rights reserved.

  19. Applications of cluster analysis to the creation of perfectionism profiles: a comparison of two clustering approaches.

    Science.gov (United States)

    Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C

    2014-01-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
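
    The difference between hard and fuzzy assignment described above can be made concrete with a small sketch. It uses the standard fuzzy C-means membership formula with fuzzifier m; the function name, the point and the cluster centers are hypothetical, and this is not the software used in the study. A larger m allows more overlap, which spreads a case's membership more evenly across clusters.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy C-means membership of one point in each cluster (m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]


centers = [(0.0, 0.0), (4.0, 0.0)]
case = (1.0, 0.0)

# K-means style hard assignment: index of the nearest center only.
hard = min(range(len(centers)), key=lambda i: math.dist(case, centers[i]))
print(hard)
print(fuzzy_memberships(case, centers, m=2.0))
print(fuzzy_memberships(case, centers, m=3.0))
```

    With these values the case is assigned wholly to the first cluster by the hard rule, while the fuzzy memberships still record its partial affinity to the second cluster.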

  20. Applications of Cluster Analysis to the Creation of Perfectionism Profiles: A Comparison of two Clustering Approaches

    Directory of Open Access Journals (Sweden)

    Jocelyn H Bolin

    2014-04-01

    Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from other traditional clustering methods in that it allows for a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences who are currently relatively unfamiliar with the technique. In order to demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results compared. Results of these analyses reveal that different cluster solutions are found by the two methods, and the similarity between the different clustering solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.

  1. STAR CLUSTER PROPERTIES IN TWO LEGUS GALAXIES COMPUTED WITH STOCHASTIC STELLAR POPULATION SYNTHESIS MODELS

    International Nuclear Information System (INIS)

    Krumholz, Mark R.; Adamo, Angela; Fumagalli, Michele; Wofford, Aida; Calzetti, Daniela; Grasha, Kathryn; Lee, Janice C.; Whitmore, Bradley C.; Bright, Stacey N.; Ubeda, Leonardo; Gouliermis, Dimitrios A.; Kim, Hwihyun; Nair, Preethi; Ryon, Jenna E.; Smith, Linda J.; Thilker, David; Zackrisson, Erik

    2015-01-01

    We investigate a novel Bayesian analysis method, based on the Stochastically Lighting Up Galaxies (slug) code, to derive the masses, ages, and extinctions of star clusters from integrated light photometry. Unlike many analysis methods, slug correctly accounts for incomplete initial mass function (IMF) sampling, and returns full posterior probability distributions rather than simply probability maxima. We apply our technique to 621 visually confirmed clusters in two nearby galaxies, NGC 628 and NGC 7793, that are part of the Legacy Extragalactic UV Survey (LEGUS). LEGUS provides Hubble Space Telescope photometry in the NUV, U, B, V, and I bands. We analyze the sensitivity of the derived cluster properties to choices of prior probability distribution, evolutionary tracks, IMF, metallicity, treatment of nebular emission, and extinction curve. We find that slug's results for individual clusters are insensitive to most of these choices, but that the posterior probability distributions we derive are often quite broad, and sometimes multi-peaked and quite sensitive to the choice of priors. In contrast, the properties of the cluster population as a whole are relatively robust against all of these choices. We also compare our results from slug to those derived with a conventional non-stochastic fitting code, Yggdrasil. We show that slug's stochastic models are generally a better fit to the observations than the deterministic ones used by Yggdrasil. However, the overall properties of the cluster populations recovered by both codes are qualitatively similar

  2. Three-Verb Clusters in Interference Frisian: A Stochastic Model over Sequential Syntactic Input.

    Science.gov (United States)

    Hoekstra, Eric; Versloot, Arjen

    2016-03-01

    Interference Frisian (IF) is a variety of Frisian, spoken by mostly younger speakers, which is heavily influenced by Dutch. IF exhibits all six logically possible word orders in a cluster of three verbs. This phenomenon has been researched by Koeneman and Postma (2006), who argue for a parameter theory, which leaves frequency differences between various orders unexplained. Rejecting Koeneman and Postma's parameter theory, but accepting their conclusion that Dutch (and Frisian) data are input for the grammar of IF, we will argue that the word order preferences of speakers of IF are determined by frequency and similarity. More specifically, three-verb clusters in IF are sensitive to: their linear left-to-right similarity to two-verb clusters and three-verb clusters in Frisian and in Dutch; the (estimated) frequency of two- and three-verb clusters in Frisian and Dutch. The model will be shown to work best if Dutch and Frisian, and two- and three-verb clusters, have equal impact factors. If different impact factors are taken, the model's predictions do not change substantially, testifying to its robustness. This analysis is in line with recent ideas that the sequential nature of human speech is more important to syntactic processes than commonly assumed, and that less burden need be put on the hierarchical dimension of syntactic structure.

  3. Techniques and instruments used for real-time analysis of atmospheric nanoscale molecular clusters: A review

    Directory of Open Access Journals (Sweden)

    Xue Li

    2015-11-01

    The extremely high concentrations of PM2.5 (particulate matter with an aerodynamic diameter ≤ 2.5 μm) during severe and persistent haze events in China have been closely related to the formation of secondary aerosols (SA). New particle formation (NPF) is the critical initial step of SA formation. New particles are commonly formed from gas-phase precursors (e.g., SO2, volatile organic compounds) via nucleation and initial growth, in which molecular clusters with a mobility diameter smaller than 3 nm (hereafter referred to as nanoscale molecular clusters) are involved throughout the whole process. Recently, significant breakthroughs have been achieved in NPF studies, mostly attributable to technical developments in the real-time analysis of the size-resolved number concentration and chemical composition of nanoscale molecular clusters. For the detection of size-resolved number concentrations of nanoscale molecular clusters, both methods and instruments are well established, and practical applications in laboratory-scale experiments and field measurements have been successfully demonstrated. In contrast, real-time analysis of the chemical composition of nanoscale molecular clusters still faces great challenges caused by the complex organic compositions of the clusters, and improvement of present analytical strategies is urgently required. A better understanding of NPF will benefit not only atmospheric modeling and climate predictions but also the source control of SA.

  4. Efficient speaker verification using Gaussian mixture model component clustering.

    Energy Technology Data Exchange (ETDEWEB)

    De Leon, Phillip L. (New Mexico State University, Las Cruces, NM); McClanahan, Richard D.

    2012-04-01

    In speaker verification (SV) systems that employ a support vector machine (SVM) classifier to make decisions on a supervector derived from Gaussian mixture model (GMM) component mean vectors, a significant portion of the computational load is involved in the calculation of the a posteriori probability of the feature vectors of the speaker under test with respect to the individual component densities of the universal background model (UBM). Further, the calculation of the sufficient statistics for the weight, mean, and covariance parameters derived from these same feature vectors also contributes a substantial amount of processing load to the SV system. In this paper, we propose a method that utilizes clusters of GMM-UBM mixture component densities in order to reduce the computational load required. In the adaptation step we score the feature vectors against the clusters and calculate the a posteriori probabilities and update the statistics exclusively for mixture components belonging to appropriate clusters. Each cluster is a grouping of multivariate normal distributions and is modeled by a single multivariate distribution. As such, the set of multivariate normal distributions representing the different clusters also forms a GMM. This GMM is referred to as a hash GMM, which can be considered a lower-resolution representation of the GMM-UBM. The mapping that associates the components of the hash GMM with components of the original GMM-UBM is referred to as a shortlist. This research investigates various methods of clustering the components of the GMM-UBM and forming hash GMMs. Of the five methods presented, one, Gaussian mixture reduction as proposed by Runnalls, easily outperformed the others. This method of Gaussian reduction iteratively reduces the size of a GMM by successively merging pairs of component densities. Pairs are selected for merger by using a Kullback-Leibler based metric. Using Runnalls' method of reduction, we
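
    The pairwise-merging idea described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`merge_pair`, `merge_cost`, `reduce_mixture`) are hypothetical, and the cost function is the commonly cited form of the Kullback-Leibler-based merging bound, which the abstract itself does not spell out. Each merged component keeps a list of the original component indices it absorbed, which plays the role of the shortlist.

```python
import numpy as np


def merge_pair(w1, m1, P1, w2, m2, P2):
    """Merge two weighted Gaussians, preserving total weight, mean and covariance."""
    w = w1 + w2
    a1, a2 = w1 / w, w2 / w
    m = a1 * m1 + a2 * m2
    d = (m1 - m2).reshape(-1, 1)
    P = a1 * P1 + a2 * P2 + a1 * a2 * (d @ d.T)
    return w, m, P


def merge_cost(w1, m1, P1, w2, m2, P2):
    """Assumed KL-based cost of replacing the pair by its moment-matched merge."""
    w, _, P = merge_pair(w1, m1, P1, w2, m2, P2)
    logdet = lambda A: np.linalg.slogdet(A)[1]
    return 0.5 * (w * logdet(P) - w1 * logdet(P1) - w2 * logdet(P2))


def reduce_mixture(weights, means, covs, target):
    """Greedily merge the cheapest pair until `target` components remain.

    Returns a list of (weight, mean, covariance, members), where `members`
    lists the indices of the original components folded into each cluster.
    """
    target = max(1, target)
    comps = [
        (float(w), np.atleast_1d(np.asarray(m, float)),
         np.atleast_2d(np.asarray(P, float)), [k])
        for k, (w, m, P) in enumerate(zip(weights, means, covs))
    ]
    while len(comps) > target:
        best = None
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                cost = merge_cost(*comps[i][:3], *comps[j][:3])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        w, m, P = merge_pair(*comps[i][:3], *comps[j][:3])
        merged = (w, m, P, comps[i][3] + comps[j][3])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps
```

    Calling `reduce_mixture` on a toy one-dimensional mixture with two tight pairs of components should fold each pair into a single cluster, leaving the total weight and the overall mean unchanged. The exhaustive pair search is quadratic per merge, which is acceptable for a sketch but would likely need caching for a UBM with thousands of components.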

  5. Identifying patterns in treatment response profiles in acute bipolar mania: a cluster analysis approach

    Directory of Open Access Journals (Sweden)

    Houston John P

    2008-07-01

    Full Text Available Abstract Background Patients with acute mania respond differentially to treatment and, in many cases, fail to obtain or sustain symptom remission. The objective of this exploratory analysis was to characterize response in bipolar disorder by identifying groups of patients with similar manic symptom response profiles. Methods Patients (n = 222) were selected from a randomized, double-blind study of treatment with olanzapine or divalproex in bipolar I disorder, manic or mixed episode, with or without psychotic features. Hierarchical clustering based on Ward's distance was used to identify groups of patients based on Young-Mania Rating Scale (YMRS) total scores at each of 5 assessments over 7 weeks. Logistic regression was used to identify baseline predictors for clusters of interest. Results Four distinct clusters of patients were identified: Cluster 1 (n = 64): patients did not maintain a response (YMRS total scores ≤ 12); Cluster 2 (n = 92): patients responded rapidly (within less than a week) and response was maintained; Cluster 3 (n = 36): patients responded rapidly but relapsed soon afterwards (YMRS ≥ 15); Cluster 4 (n = 30): patients responded slowly (≥ 2 weeks) and response was maintained. Predictive models using baseline variables found YMRS Item 10 (Appearance), and psychosis to be significant predictors for Clusters 1 and 4 vs. Clusters 2 and 3, but none of the baseline characteristics allowed discriminating between Clusters 1 vs. 4. Experiencing a mixed episode at baseline predicted membership in Clusters 2 and 3 vs. Clusters 1 and 4. Treatment with divalproex, larger number of previous manic episodes, lack of disruptive-aggressive behavior, and more prominent depressive symptoms at baseline were predictors for Cluster 3 vs. 2. Conclusion Distinct treatment response profiles can be predicted by clinical features at baseline. The presence of these features as potential risk factors for relapse in patients who have responded to treatment

  6. Semi-Supervised Generation with Cluster-aware Generative Models

    DEFF Research Database (Denmark)

    Maaløe, Lars; Fraccaro, Marco; Winther, Ole

    2017-01-01

    Deep generative models trained with large amounts of unlabelled data have proven to be powerful within the domain of unsupervised learning. Many real life data sets contain a small amount of labelled data points, that are typically disregarded when training generative models. We propose the Cluster...... a log-likelihood of −79.38 nats on permutation invariant MNIST, while also achieving competitive semi-supervised classification accuracies. The model can also be trained fully unsupervised, and still improve the log-likelihood performance with respect to related methods....

  7. Confronting the outflow-regulated cluster formation model with observations

    Energy Technology Data Exchange (ETDEWEB)

    Nakamura, Fumitaka [National Astronomical Observatory, Mitaka, Tokyo 181-8588 (Japan); Li, Zhi-Yun, E-mail: fumitaka.nakamura@nao.ac.jp, E-mail: zl4h@virginia.edu [Department of Astronomy, University of Virginia, P.O. Box 400325, Charlottesville, VA 22904 (United States)

    2014-03-10

    Protostellar outflows have been shown theoretically to be capable of maintaining supersonic turbulence in cluster-forming clumps and keeping the star formation rate per free-fall time as low as a few percent. We aim to test two basic predictions of this outflow-regulated cluster formation model, namely, (1) the clump should be close to virial equilibrium and (2) the turbulence dissipation rate should be balanced by the outflow momentum injection rate, using recent outflow surveys toward eight nearby cluster-forming clumps (B59, L1551, L1641N, Serpens Main Cloud, Serpens South, ρ Oph, IC 348, and NGC 1333). We find, for almost all sources, that the clumps are close to virial equilibrium and the outflow momentum injection rate exceeds the turbulence momentum dissipation rate. In addition, the outflow kinetic energy is significantly smaller than the clump gravitational energy for intermediate and massive clumps with M_cl ≳ a few × 10^2 M_☉, suggesting that the outflow feedback is not enough to disperse the clump as a whole. The number of observed protostars also indicates that the star formation rate per free-fall time is as small as a few percent for all clumps. These observationally based results strengthen the case for outflow-regulated cluster formation.

  8. Clustering Multivariate Time Series Using Hidden Markov Models

    Directory of Open Access Journals (Sweden)

    Shima Ghassempour

    2014-03-01

    Full Text Available In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are frequent in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs), where we first map each trajectory into an HMM, then define a suitable distance between HMMs and finally proceed to cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated, but realistic, data set of 1,255 trajectories of individuals of age 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab and may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge, and therefore are accessible to a wide range of researchers.
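
    The last step of the pipeline above, clustering from a precomputed distance matrix, can be sketched as follows. The abstract names neither the HMM distance nor the clustering method (and points to R and Matlab packages), so this is only an illustration under assumptions: it uses k-medoids, one of several methods that need nothing but pairwise distances, and the function name `k_medoids` is hypothetical. The matrix `D` stands in for whatever HMM-to-HMM distance has been computed.

```python
import numpy as np


def k_medoids(D, k, n_iter=100):
    """Cluster items given only a symmetric pairwise distance matrix D.

    Initialisation is deterministic: start from the most central item, then
    repeatedly add the item farthest from the medoids chosen so far.
    """
    D = np.asarray(D, float)
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)

    for _ in range(n_iter):
        # Assign every item to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if a cluster empties out
            # New medoid: the member with the smallest total distance to the others.
            within = D[np.ix_(members, members)].sum(axis=1)
            new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new

    labels = np.argmin(D[:, medoids], axis=1)
    return labels, medoids
```

    Because the routine never touches the original trajectories, it works unchanged whether the distances come from HMMs, edit distances or any other dissimilarity, which is the practical appeal of distance-matrix methods for mixed categorical and continuous data.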

  9. Cluster Dynamics Modeling with Bubble Nucleation, Growth and Coalescence

    Energy Technology Data Exchange (ETDEWEB)

    de Almeida, Valmor F. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Blondel, Sophie [Univ. of Tennessee, Knoxville, TN (United States); Bernholdt, David E. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Wirth, Brian D. [Univ. of Tennessee, Knoxville, TN (United States)

    2017-06-01

    The topic of this communication pertains to defect formation in irradiated solids such as plasma-facing tungsten submitted to helium implantation in fusion reactor components, and nuclear fuel (metal and oxides) submitted to volatile fission product generation in nuclear reactors. The purpose of this progress report is to describe efforts towards addressing the prediction of long-time evolution of defects via continuum cluster dynamics simulation. The difficulties are twofold. First, realistic, long-time dynamics in reactor conditions leads to a non-dilute diffusion regime which is not accommodated by the prevailing dilute, stressless cluster dynamics theory. Second, long-time dynamics calls for a large set of species (ideally an infinite set) to capture all possible emerging defects, and this represents a computational bottleneck. Extension beyond the dilute limit is a significant undertaking since no model has been advanced to extend cluster dynamics to non-dilute, deformable conditions. Here our proposed approach to model the non-dilute limit is to monitor the appearance of a spatially localized void volume fraction in the solid matrix with a bell shape profile and insert an explicit geometrical bubble onto the support of the bell function. The newly created internal moving boundary provides the means to account for the interfacial flux of mobile species into the bubble, and the growth of bubbles allows for coalescence phenomena which capture highly non-dilute interactions. We present a preliminary interfacial kinematic model with associated interfacial diffusion transport to follow the evolution of the bubble in any number of spatial dimensions and any number of bubbles, which can be further extended to include a deformation theory. Finally we comment on a computational front-tracking method to be used in conjunction with conventional cluster dynamics simulations in the non-dilute model proposed.

  10. Efficient image duplicated region detection model using sequential block clustering

    Czech Academy of Sciences Publication Activity Database

    Sekeh, M. A.; Maarof, M. A.; Rohani, M. F.; Mahdian, Babak

    2013-01-01

    Vol. 10, No. 1 (2013), pp. 73-84 ISSN 1742-2876 Institutional support: RVO:67985556 Keywords: Image forensic * Copy–paste forgery * Local block matching Subject RIV: IN - Informatics, Computer Science Impact factor: 0.986, year: 2013 http://library.utia.cas.cz/separaty/2013/ZOI/mahdian-efficient image duplicated region detection model using sequential block clustering.pdf

  11. Improving estimation of kinetic parameters in dynamic force spectroscopy using cluster analysis

    Science.gov (United States)

    Yen, Chi-Fu; Sivasankar, Sanjeevi

    2018-03-01

    Dynamic Force Spectroscopy (DFS) is a widely used technique to characterize the dissociation kinetics and interaction energy landscape of receptor-ligand complexes with single-molecule resolution. In an Atomic Force Microscope (AFM)-based DFS experiment, receptor-ligand complexes, sandwiched between an AFM tip and substrate, are ruptured at different stress rates by varying the speed at which the AFM tip and substrate are pulled away from each other. The rupture events are grouped according to their pulling speeds, and the mean force and loading rate of each group are calculated. These data are subsequently fit to established models, and energy landscape parameters such as the intrinsic off-rate (koff) and the width of the potential energy barrier (xβ) are extracted. However, due to large uncertainties in determining mean forces and loading rates of the groups, errors in the estimated koff and xβ can be substantial. Here, we demonstrate that the accuracy of fitted parameters in a DFS experiment can be dramatically improved by sorting rupture events into groups using cluster analysis instead of sorting them according to their pulling speeds. We test different clustering algorithms including Gaussian mixture, logistic regression, and K-means clustering, under conditions that closely mimic DFS experiments. Using Monte Carlo simulations, we benchmark the performance of these clustering algorithms over a wide range of koff and xβ, under different levels of thermal noise, and as a function of both the number of unbinding events and the number of pulling speeds. Our results demonstrate that cluster analysis, particularly K-means clustering, is very effective in improving the accuracy of parameter estimation, particularly when the number of unbinding events is limited and the events are not well separated into distinct groups. Cluster analysis is easy to implement, and our performance benchmarks serve as a guide in choosing an appropriate method for DFS data analysis.
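
    The workflow in this abstract (group rupture events by clustering, average each group, fit a model) can be sketched as below. Several things here are assumptions rather than the authors' choices: the abstract says only "established models", so the Bell-Evans relation F = (kBT/xβ) ln(r xβ / (koff kBT)) is used as a stand-in; events are clustered on the logarithm of the loading rate alone; the thermal energy is taken as 4.11 pN nm; and all function names are hypothetical.

```python
import numpy as np

KBT = 4.11  # thermal energy in pN*nm near room temperature (assumed value)


def kmeans_1d(x, k, n_iter=100):
    """Lloyd's algorithm on a 1-D array, with centres started at spread-out quantiles."""
    x = np.asarray(x, float)
    centres = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        new = np.array([
            x[labels == j].mean() if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


def bell_evans_fit(loading_rates, mean_forces, kbt=KBT):
    """Fit F = a*ln(r) + b and convert (a, b) into (koff, x_beta).

    With a = kBT/x_beta, the model gives b = -a*ln(koff*a).
    """
    a, b = np.polyfit(np.log(loading_rates), mean_forces, 1)
    x_beta = kbt / a
    koff = np.exp(-b / a) / a
    return koff, x_beta


def estimate_parameters(forces, loading_rates, k):
    """Group events by K-means on ln(loading rate), then fit the group means."""
    forces = np.asarray(forces, float)
    loading_rates = np.asarray(loading_rates, float)
    labels, _ = kmeans_1d(np.log(loading_rates), k)
    groups = [j for j in range(k) if np.any(labels == j)]
    r_mean = np.array([loading_rates[labels == j].mean() for j in groups])
    f_mean = np.array([forces[labels == j].mean() for j in groups])
    return bell_evans_fit(r_mean, f_mean)
```

    With forces in pN and loading rates in pN/s, `estimate_parameters` returns koff in 1/s and xβ in nm. On noise-free synthetic data generated from the same relation, the fit should return the generating parameters; with realistic noise, the grouping step is where the choice of clustering method matters.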

  12. Riemannian multi-manifold modeling and clustering in brain networks

    Science.gov (United States)

    Slavakis, Konstantinos; Salsabilian, Shiva; Wack, David S.; Muldoon, Sarah F.; Baidoo-Williams, Henry E.; Vettel, Jean M.; Cieslak, Matthew; Grafton, Scott T.

    2017-08-01

    This paper introduces Riemannian multi-manifold modeling in the context of brain-network analytics: Brain-network time series yield features which are modeled as points lying in or close to a union of a finite number of submanifolds within a known Riemannian manifold. Distinguishing disparate time series amounts thus to clustering multiple Riemannian submanifolds. To this end, two feature-generation schemes for brain-network time series are put forth. The first one is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points in the Grassmann manifold. The second one utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positive-definite matrices. Based on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their Riemannian-geometry properties. Numerical tests on time series, synthetically generated from real brain-network structural connectivity matrices, reveal that the proposed scheme outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.

  13. Performance Analysis of a Cluster-Based MAC Protocol for Wireless Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Kartsakli Elli

    2010-01-01

    Full Text Available An analytical model to evaluate the non-saturated performance of the Distributed Queuing Medium Access Control Protocol for Ad Hoc Networks (DQMAN) in single-hop networks is presented in this paper. DQMAN comprises a spontaneous, temporary, and dynamic clustering mechanism integrated with a near-optimum distributed queuing Medium Access Control (MAC) protocol. Clustering is executed in a distributed manner using a mechanism inspired by the Distributed Coordination Function (DCF) of IEEE 802.11. Once a station seizes the channel, it becomes the temporary clusterhead of a spontaneous cluster and it coordinates the peer-to-peer communications between the cluster members. Within each cluster, a near-optimum distributed queuing MAC protocol is executed. The theoretical performance analysis of DQMAN in single-hop networks under non-saturation conditions is presented in this paper. The approach integrates the analysis of the clustering mechanism into the MAC layer model. To the best of the authors' knowledge, this approach is novel in the literature. In addition, the performance of an ad hoc network using DQMAN is compared to that obtained when using the DCF of IEEE 802.11, as a benchmark reference.

  14. Poisson cluster analysis of cardiac arrest incidence in Columbus, Ohio.

    Science.gov (United States)

    Warden, Craig; Cudnik, Michael T; Sasson, Comilla; Schwartz, Greg; Semple, Hugh

    2012-01-01

    Scarce resources in disease prevention and emergency medical services (EMS) need to be focused on high-risk areas of out-of-hospital cardiac arrest (OHCA). Cluster analysis using geographic information systems (GISs) was used to find these high-risk areas and test potential predictive variables. This was a retrospective cohort analysis of EMS-treated adults with OHCAs occurring in Columbus, Ohio, from April 1, 2004, through March 31, 2009. The OHCAs were aggregated to census tracts and incidence rates were calculated based on their adult populations. Poisson cluster analysis determined significant clusters of high-risk census tracts. Both census tract-level and case-level characteristics were tested for association with high-risk areas by multivariate logistic regression. A total of 2,037 eligible OHCAs occurred within the city limits during the study period. The mean incidence rate was 0.85 OHCAs/1,000 population/year. There were five significant geographic clusters with 76 high-risk census tracts out of the total of 245 census tracts. In the case-level analysis, being in a high-risk cluster was associated with a slightly younger age (-3 years, adjusted odds ratio [OR] 0.99, 95% confidence interval [CI] 0.99-1.00), not being white, non-Hispanic (OR 0.54, 95% CI 0.45-0.64), cardiac arrest occurring at home (OR 1.53, 95% CI 1.23-1.71), and not receiving bystander cardiopulmonary resuscitation (CPR) (OR 0.77, 95% CI 0.62-0.96), but with higher survival to hospital discharge (OR 1.78, 95% CI 1.30-2.46). In the census tract-level analysis, high-risk census tracts were also associated with a slightly lower average age (-0.1 years, OR 1.14, 95% CI 1.06-1.22) and a lower proportion of white, non-Hispanic patients (-0.298, OR 0.04, 95% CI 0.01-0.19), but also a lower proportion of high-school graduates (-0.184, OR 0.00, 95% CI 0.00-0.00). This analysis identified high-risk census tracts and associated census tract-level and case-level characteristics that can be used to

  15. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method

    DEFF Research Database (Denmark)

    Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels

    2014-01-01

    The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models the probability distribution of the Y-STR haplotypes. Creating a consistent statistical model of the haplotypes enables us to perform a wide range of analyses. Previously, haplotype frequency estimation using the discrete Laplace method has been validated. In this paper we investigate how the discrete Laplace method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes similar to results of previous...

  16. Performance Based Clustering for Benchmarking of Container Ports: an Application of DEA and Cluster Analysis Technique

    Directory of Open Access Journals (Sweden)

    Jie Wu

    2010-12-01

    Full Text Available The operational performance of container ports has received increasing attention in both academic and practitioner circles, and the performance evaluation and process improvement of container ports have been the focus of several studies. In this paper, Data Envelopment Analysis (DEA), an effective tool for relative efficiency assessment, is used to measure the performance of, and to benchmark, 77 world container ports in 2007. The approaches used in the current study consider four inputs (Capacity of Cargo Handling Machines, Number of Berths, Terminal Area and Storage Capacity) and a single output (Container Throughput). The results for the efficiency scores are analyzed, and a unique ordering of the ports based on average cross efficiency is provided. Cluster analysis is also used to select more appropriate targets for poorly performing ports to use as benchmarks.

  17. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    Science.gov (United States)

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Owing to advances in sensor technology, the growing volume of medical image data makes it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  18. Analysis of risk factors for cluster behavior of dental implant failures.

    Science.gov (United States)

    Chrcanovic, Bruno Ramos; Kisch, Jenö; Albrektsson, Tomas; Wennerberg, Ann

    2017-08-01

    Some studies indicated that implant failures are commonly concentrated in few patients. To identify and analyze cluster behavior of dental implant failures among subjects of a retrospective study. This retrospective study included patients receiving at least three implants only. Patients presenting at least three implant failures were classified as presenting a cluster behavior. Univariate and multivariate logistic regression models and generalized estimating equations analysis evaluated the effect of explanatory variables on the cluster behavior. There were 1406 patients with three or more implants (8337 implants, 592 failures). Sixty-seven (4.77%) patients presented cluster behavior, with 56.8% of all implant failures. The intake of antidepressants and bruxism were identified as potential negative factors exerting a statistically significant influence on a cluster behavior at the patient-level. The negative factors at the implant-level were turned implants, short implants, poor bone quality, age of the patient, the intake of medicaments to reduce the acid gastric production, smoking, and bruxism. A cluster pattern among patients with implant failure is highly probable. Factors of interest as predictors for implant failures could be a number of systemic and local factors, although a direct causal relationship cannot be ascertained. © 2017 Wiley Periodicals, Inc.

  19. Bilingual Cluster Based Models for Statistical Machine Translation

    Science.gov (United States)

    Yamamoto, Hirofumi; Sumita, Eiichiro

    We propose a domain specific model for statistical machine translation. It is well-known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided; however, in many cases the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The language and translation models specific to the predicted domain (sub-corpus) are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.

  20. Fractal Segmentation and Clustering Analysis for Seismic Time Slices

    Science.gov (United States)

    Ronquillo, G.; Oleschko, K.; Korvin, G.; Arizabalo, R. D.

    2002-05-01

    Fractal analysis has become part of the standard approach for quantifying texture on gray-tone or colored images. In this research we introduce a multi-stage fractal procedure to segment, classify and measure the clustering patterns on seismic time slices from a 3-D seismic survey. Five fractal classifiers (c1)-(c5) were designed to yield standardized, unbiased and precise measures of the clustering of seismic signals. The classifiers were tested on seismic time slices from the AKAL field, Cantarell Oil Complex, Mexico. The generalized lacunarity (c1), fractal signature (c2), heterogeneity (c3), rugosity of boundaries (c4) and continuity resp. tortuosity (c5) of the clusters are shown to be efficient measures of the time-space variability of seismic signals. The Local Fractal Analysis (LFA) of time slices has proved to be a powerful edge detection filter to detect and enhance linear features, like faults or buried meandering rivers. The local fractal dimensions of the time slices were also compared with the self-affinity dimensions of the corresponding parts of porosity-logs. It is speculated that the spectral dimension of the negative-amplitude parts of the time-slice yields a measure of connectivity between the formation's high-porosity zones, and correlates with overall permeability.

  1. Monitoring Customer Satisfaction in Service Industry: A Cluster Analysis Approach

    Directory of Open Access Journals (Sweden)

    Matúš Horváth

    2012-11-01

    Full Text Available One of the key performance indicators of the quality management system of an organization is customer satisfaction. The process of monitoring customer satisfaction is therefore an important part of the measuring processes of the quality management system. This paper deals with new ways to analyse and monitor customer satisfaction by analysing data on how customers use the organisation's services and on customer leaving rates. The article uses cluster analysis in this process to segment customers, with the aim of increasing the accuracy of the results and of the decisions based on them. The application example was created as part of a bachelor thesis.

  2. Monitoring Customer Satisfaction in Service Industry: A Cluster Analysis Approach

    Directory of Open Access Journals (Sweden)

    Matúš Horváth

    2012-10-01

    Full Text Available One of the key performance indicators of the quality management system of an organization is customer satisfaction. The process of monitoring customer satisfaction is therefore an important part of the measuring processes of the quality management system. This paper deals with new ways to analyse and monitor customer satisfaction by analysing data on how customers use the organisation's services and on customer leaving rates. The article uses cluster analysis in this process to segment customers, with the aim of increasing the accuracy of the results and of the decisions based on them. The application example was created as part of a bachelor thesis.

  3. Validation of hierarchical cluster analysis for identification of bacterial species using 42 bacterial isolates

    Science.gov (United States)

    Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.

    2015-03-01

    Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacterial isolates by Gram status and by genus and species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) were implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples was tested using the PLSDA models and subsequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably to, and in some cases better than, the other data types, with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
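
    The two external evaluation criteria named in this abstract, purity and the Rand index, have standard definitions that can be sketched in a few lines. The function names are illustrative, and the example labels in the usage note are invented for demonstration; they are not data from the study.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Fraction of items that belong to the majority true class of their cluster."""
    per_cluster = {}
    for t, c in zip(true_labels, cluster_labels):
        per_cluster.setdefault(c, Counter())[t] += 1
    majority = sum(max(counts.values()) for counts in per_cluster.values())
    return majority / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Fraction of item pairs on which the clustering and the true labels agree.

    A pair agrees if it is together in both partitions or apart in both.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

    For instance, with made-up species labels `["A", "A", "A", "B", "B", "C"]` and cluster assignments `[0, 0, 1, 1, 1, 2]`, purity is 5/6 and the Rand index is 11/15. Purity rewards many small clusters (one item per cluster scores 1.0), which is one reason to report it alongside a pair-counting measure such as the Rand index.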

  4. Data Clustering

    Science.gov (United States)

    Wagstaff, Kiri L.

    2012-03-01

    On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained

  5. Equilibrium Models of Galaxy Clusters with Cooling, Heating, and Conduction

    Science.gov (United States)

    Brüggen, M.

    2003-08-01

    It is generally argued that most clusters of galaxies host cooling flows in which radiative cooling in the center causes a slow inflow. However, recent observations by Chandra and XMM conflict with the predicted cooling flow rates. Among other mechanisms, heating by a central active galactic nucleus and thermal conduction have been invoked in order to account for the small mass deposition rates. Here we present a family of hydrostatic models for the intracluster medium where radiative losses are exactly balanced by thermal conduction and heating by a central source. We describe the features of this simple model and fit its parameters to the density and temperature profiles of Hydra A.

  6. Halo Occupation Distribution Modeling of Clustering of Luminous Red Galaxies

    OpenAIRE

    Zheng, Zheng; Zehavi, Idit; Eisenstein, Daniel J.; Weinberg, David H.; Jing, Y. P.

    2008-01-01

    We perform Halo Occupation Distribution (HOD) modeling to interpret small-scale and intermediate-scale clustering of 35,000 luminous early-type galaxies and their cross-correlation with a reference imaging sample of normal L* galaxies in the Sloan Digital Sky Survey. The modeling results show that most of these luminous red galaxies (LRGs) are central galaxies residing in massive halos of typical mass M ~ a few times 10^13 to 10^14 Msun/h, while a few percent of them have to be satellites wit...

  7. Application of Gene Shaving and Mixture Models to Cluster Microarray Gene Expression Data

    Directory of Open Access Journals (Sweden)

    S. Wen

    2007-01-01

    Full Text Available Researchers are frequently faced with the analysis of microarray data of a relatively large number of genes using a small number of tissue samples. We examine the application of two statistical methods for clustering such microarray expression data: EMMIX-GENE and GeneClust. EMMIX-GENE is a mixture-model based clustering approach, designed primarily to cluster tissue samples on the basis of the genes. GeneClust is an implementation of the gene shaving methodology, motivated by research to identify distinct sets of genes for which variation in expression could be related to a biological property of the tissue samples. We illustrate the use of these two methods in the analysis of Affymetrix oligonucleotide arrays of well-known data sets from colon tissue samples with and without tumors, and of tumor tissue samples from patients with leukemia. Although the two approaches have been developed from different perspectives, the results demonstrate a clear correspondence between gene clusters produced by GeneClust and EMMIX-GENE for the colon tissue data. It is demonstrated, for the case of ribosomal proteins and smooth muscle genes in the colon data set, that both methods can classify genes into co-regulated families. It is further demonstrated that tissue types (tumor and normal) can be separated on the basis of subtle distributed patterns of genes. Application to the leukemia tissue data produces a division of tissues corresponding closely to the external classification, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), for both methods. In addition, we identify genes specific for the subgroup of ALL-T cell samples. Overall, we find that the gene shaving method produces gene clusters at great speed; allows variable cluster sizes and can incorporate partial or full supervision; and finds clusters of genes in which the gene expression varies greatly over the tissue samples while maintaining a high level of coherence between the

  8. Market analysis of Serbia's raspberry sector and cluster development initiatives

    Directory of Open Access Journals (Sweden)

    Paraušić Vesna

    2016-01-01

    Full Text Available The authors analyze the competitive strengths and weaknesses of raspberry producers in Serbia and propose the key prerequisites on which the development of a successful cluster initiative in the Serbian raspberry sector will depend. The research results indicate that Serbian raspberry growers can develop a successful cluster and keep their leading position in the global raspberry market only if a number of conditions are met, such as: (a) a better organized marketing channel through the vertical and horizontal integration of all actors in this sector; (b) strengthening of specialized cooperatives for raspberry production and associations of raspberry growers, and, in the future, the setting up of producer organizations and associations; (c) inclusion of producers of other berries and producers of processed berries; (d) introduction of innovations, scientific knowledge, and research and development in production, processing, packing, logistics, export of raspberries, etc. The analysis is based on a case study in the Šumadija and Western Serbia region, which is the major raspberry-producing region in Serbia.

  9. Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis

    Science.gov (United States)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

    2018-01-01

    To resolve the problem of slow computation and low matching accuracy in image registration, a new image registration algorithm based on a parallax constraint and clustering analysis is proposed. First, the Harris corner detection algorithm is used to extract the feature points of the two images. Second, the Normalized Cross Correlation (NCC) function is used to perform approximate matching of the feature points, yielding the initial feature pairs. Then, according to the parallax constraint condition, the initial feature pairs are preprocessed with the K-means clustering algorithm to remove feature point pairs with obvious errors from the approximate matching step. Finally, the Random Sample Consensus (RANSAC) algorithm is adopted to refine the feature points and obtain the final matching result, achieving fast and accurate image registration. The experimental results show that the proposed algorithm can improve the accuracy of image matching while preserving the real-time performance of the algorithm.
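
    The NCC matching step in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which NCC variant is used, so the common zero-mean form is assumed here, and the function name `ncc` and the toy patches are hypothetical.

```python
import math

def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-size 2D patches."""
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if len(a) != len(b) or not a:
        raise ValueError("patches must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        # A flat patch has no contrast; the score is undefined, so this
        # sketch simply reports "no match" (an assumption, not a standard).
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom

# Hypothetical 3x3 grey-level patches around two candidate corner points.
patch = [[10, 20, 30], [40, 50, 60], [70, 80, 95]]
brighter = [[2 * v + 5 for v in row] for row in patch]   # gain and offset change
inverted = [[255 - v for v in row] for row in patch]     # contrast reversed

score_same = ncc(patch, brighter)      # close to +1: same structure
score_inverted = ncc(patch, inverted)  # close to -1: opposite structure
```

    Because the score is invariant to gain and offset, candidate pairs can be ranked by NCC even when the two images differ in exposure; the pairs kept at this stage would then go on to the K-means and RANSAC filtering named in the abstract.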

  10. CHIMERA: Top-down model for hierarchical, overlapping and directed cluster structures in directed and weighted complex networks

    Science.gov (United States)

    Franke, R.

    2016-11-01

    In many networks discovered in biology, medicine, neuroscience and other disciplines special properties like a certain degree distribution and hierarchical cluster structure (also called communities) can be observed as general organizing principles. Detecting the cluster structure of an unknown network promises to identify functional subdivisions, hierarchy and interactions on a mesoscale. It is not trivial choosing an appropriate detection algorithm because there are multiple network, cluster and algorithmic properties to be considered. Edges can be weighted and/or directed, clusters overlap or build a hierarchy in several ways. Algorithms differ not only in runtime, memory requirements but also in allowed network and cluster properties. They are based on a specific definition of what a cluster is, too. On the one hand, a comprehensive network creation model is needed to build a large variety of benchmark networks with different reasonable structures to compare algorithms. On the other hand, if a cluster structure is already known, it is desirable to separate effects of this structure from other network properties. This can be done with null model networks that mimic an observed cluster structure to improve statistics on other network features. A third important application is the general study of properties in networks with different cluster structures, possibly evolving over time. Currently there are good benchmark and creation models available. But what is left is a precise sandbox model to build hierarchical, overlapping and directed clusters for undirected or directed, binary or weighted complex random networks on basis of a sophisticated blueprint. This gap shall be closed by the model CHIMERA (Cluster Hierarchy Interconnection Model for Evaluation, Research and Analysis) which will be introduced and described here for the first time.

  11. Steady state subchannel analysis of AHWR fuel cluster

    International Nuclear Information System (INIS)

    Dasgupta, A.; Chandraker, D.K.; Vijayan, P.K.; Saha, D.

    2006-09-01

    Subchannel analysis is a technique used to predict the thermal hydraulic behavior of reactor fuel assemblies. The rod cluster is subdivided into a number of parallel interacting flow subchannels. The conservation equations are solved for each of these subchannels, taking into account subchannel interactions. Subchannel analysis of AHWR D-5 fuel cluster has been carried out to determine the variations in thermal hydraulic conditions of coolant and fuel temperatures along the length of the fuel bundle. The hottest regions within the AHWR fuel bundle have been identified. The effect of creep on the fuel performance has also been studied. MCHFR has been calculated using Jansen-Levy correlation. The calculations have been backed by sensitivity analysis for parameters whose values are not known accurately. The sensitivity analysis showed the calculations to have a very low sensitivity to these parameters. Apart from the analysis, the report also includes a brief introduction of a few subchannel codes. A brief description of the equations and solution methodology used in COBRA-IIIC and COBRA-IV-I is also given. (author)

  12. CLUSTERING ANALYSIS OF OFFICER'S BEHAVIOURS IN LONDON POLICE FOOT PATROL ACTIVITIES

    Directory of Open Access Journals (Sweden)

    J. Shen

    2015-07-01

    Full Text Available In this short paper we present a framework for the conceptual representation and clustering analysis of police officers' patrol patterns obtained by mining their raw movement trajectory data. This has been achieved with a model developed to account for the spatio-temporal dynamics of human movement by incorporating both the behavioural features of the travellers and the semantic meaning of the environment they are moving in. Hence, the similarity metric of traveller behaviours is jointly defined according to the stay time allocation in each spatio-temporal region of interest (ST-ROI) to support clustering analysis of patrol behaviours. The proposed framework enables the analysis of behaviour and preferences at a higher level based on raw movement trajectories. The model is first applied to police patrol data provided by the Metropolitan Police and will subsequently be tested on other types of data sets.

  13. Evolutionary-Hierarchical Bases of the Formation of Cluster Model of Innovation Economic Development

    Directory of Open Access Journals (Sweden)

    Yuliya Vladimirovna Dubrovskaya

    2016-10-01

    Full Text Available The functioning of a modern economic system is based on the interaction of objects at different hierarchical levels. Thus, the study of innovation processes that takes into account the mutual influence of the activities of these economic actors becomes important. The paper examines the evolutionary basis for the formation of models of innovation development using micro- and macroeconomic analysis. Most concepts recognize that, despite the large number of diverse models, the coordination of relations between economic agents is of crucial importance for successful innovation development. Based on the results of the evolutionary-hierarchical analysis, the authors identify key phases in the development of forms of cooperation between business, science and government in the domestic economy. This serves as the starting point for characterizing the interaction in cluster models of innovation development of the economy. Considerable expectations for improving the national innovation system are connected with the development of cluster and network structures. The main objective of government authorities is the formation of mechanisms and institutions that will foster cooperation between members of the clusters. The article explains that clusters cannot become factors in the growth of the national economy unless they are an effective tool for interaction between the actors of the regional innovation systems.

  14. Number of Clusters and the Quality of Hybrid Predictive Models in Analytical CRM

    Directory of Open Access Journals (Sweden)

    Łapczyński Mariusz

    2014-08-01

    Full Text Available Making more accurate marketing decisions requires managers to build effective predictive models. Typically, these models specify the probability of a customer belonging to a particular category, group or segment. The analytical CRM categories refer to customers interested in starting cooperation with the company (acquisition models), customers who purchase additional products (cross- and up-sell models) or customers intending to resign from the cooperation (churn models). When building predictive models, researchers use analytical tools from various disciplines with an emphasis on their best performance. This article attempts to build a hybrid predictive model combining decision trees (C&RT algorithm) and cluster analysis (k-means). During the experiments, five different cluster validity indices and eight datasets were used. The performance of the models was evaluated using popular measures such as accuracy, precision, recall, G-mean, F-measure and lift in the first and in the second decile. The authors tried to find a connection between the number of clusters and the models' quality.

  15. [The hierarchical clustering analysis of hyperspectral image based on probabilistic latent semantic analysis].

    Science.gov (United States)

    Yi, Wen-Bin; Shen, Li; Qi, Yin-Feng; Tang, Hong

    2011-09-01

    The paper introduces Probabilistic Latent Semantic Analysis (PLSA) to image clustering and proposes an effective clustering algorithm for hyperspectral images that uses the semantic information from PLSA. First, the ISODATA algorithm is used to obtain an initial clustering of the hyperspectral image, and the clusters of this initial result are treated as the visual words of the PLSA. Second, an object-oriented image segmentation algorithm is used to partition the hyperspectral image, and segments with relatively pure pixels are regarded as documents in PLSA. Third, a variety of identification methods that can estimate the best number of cluster centers are combined to determine the number of latent semantic topics. Then the conditional distributions of visual words in topics and the mixtures of topics in different documents are estimated using PLSA. Finally, the conditional probabilities of the latent semantic topics are distinguished using a statistical pattern recognition method, the topic type for each visual word in each document is assigned, and the clustering result of the hyperspectral image is thereby obtained. Experimental results show that the clusters produced by the proposed algorithm are better than those of K-MEANS and ISODATA in terms of object-oriented property, and that the clustering result is closer to the real spatial distribution of the surface.

  16. A Global Model for Circumgalactic and Cluster-core Precipitation

    Science.gov (United States)

    Voit, G. Mark; Meece, Greg; Li, Yuan; O'Shea, Brian W.; Bryan, Greg L.; Donahue, Megan

    2017-08-01

    We provide an analytic framework for interpreting observations of multiphase circumgalactic gas that is heavily informed by recent numerical simulations of thermal instability and precipitation in cool-core galaxy clusters. We start by considering the local conditions required for the formation of multiphase gas via two different modes: (1) uplift of ambient gas by galactic outflows, and (2) condensation in a stratified stationary medium in which thermal balance is explicitly maintained. Analytic exploration of these two modes provides insights into the relationships between the local ratio of the cooling and freefall timescales (i.e., $t_{\rm cool}/t_{\rm ff}$), the large-scale gradient of specific entropy, and the development of precipitation and multiphase media in circumgalactic gas. We then use these analytic findings to interpret recent simulations of circumgalactic gas in which global thermal balance is maintained. We show that long-lasting configurations of gas with $5 \lesssim \min(t_{\rm cool}/t_{\rm ff}) \lesssim 20$ and radial entropy profiles similar to observations of cool cores in galaxy clusters are a natural outcome of precipitation-regulated feedback. We conclude with some observational predictions that follow from these models. This work focuses primarily on precipitation and AGN feedback in galaxy-cluster cores, because that is where the observations of multiphase gas around galaxies are most complete. However, many of the physical principles that govern condensation in those environments apply to circumgalactic gas around galaxies of all masses.

  17. Segmentation of Residential Gas Consumers Using Clustering Analysis

    Directory of Open Access Journals (Sweden)

    Marta P. Fernandes

    2017-12-01

    Full Text Available The growing environmental concerns and liberalization of energy markets have resulted in an increased competition between utilities and a strong focus on efficiency. To develop new energy efficiency measures and optimize operations, utilities seek new market-related insights and customer engagement strategies. This paper proposes a clustering-based methodology to define the segmentation of residential gas consumers. The segments of gas consumers are obtained through a detailed clustering analysis using smart metering data. Insights are derived from the segmentation, where the segments result from the clustering process and are characterized based on the consumption profiles, as well as according to information regarding consumers’ socio-economic and household key features. The study is based on a sample of approximately one thousand households over one year. The representative load profiles of consumers are essentially characterized by two evident consumption peaks, one in the morning and the other in the evening, and an off-peak consumption. Significant insights can be derived from this methodology regarding typical consumption curves of the different segments of consumers in the population. This knowledge can assist energy utilities and policy makers in the development of consumer engagement strategies, demand forecasting tools and in the design of more sophisticated tariff systems.

  18. IGSA: Individual Gene Sets Analysis, including Enrichment and Clustering.

    Science.gov (United States)

    Wu, Lingxiang; Chen, Xiujie; Zhang, Denan; Zhang, Wubing; Liu, Lei; Ma, Hongzhe; Yang, Jingbo; Xie, Hongbo; Liu, Bo; Jin, Qing

    2016-01-01

    Analysis of gene sets has been widely applied in various high-throughput biological studies. One weakness of traditional methods is that they neglect the heterogeneity of gene expression across samples, which may lead to the omission of some specific and important gene sets. It is also difficult for them to reflect the severity of disease and to provide expression profiles of gene sets for individuals. We developed an application called IGSA that offers powerful analytical capacity for gene set enrichment and sample clustering. IGSA calculates gene set expression scores for each sample and adopts an accumulating clustering strategy in which samples are gathered into sets according to the progression of disease from mild to severe. We evaluate the performance of IGSA on gastric, pancreatic and ovarian cancer data sets. We also compared the results of IGSA in KEGG pathway enrichment with DAVID, GSEA, SPIA and ssGSEA, and analyzed the results of IGSA clustering with different similarity measurement methods. Notably, IGSA proved to be more sensitive and specific in finding significant pathways, and can indicate changes in pathways related to the severity of disease. In addition, IGSA provides a significant gene set profile for each sample.

  19. Analysis of Learning Development With Sugeno Fuzzy Logic And Clustering

    Directory of Open Access Journals (Sweden)

    Maulana Erwin Saputra

    2017-06-01

    Full Text Available In this first paper, I attempt to analyze the factors that affect student achievement, which of course vary from school to school, because student success is one of the goals of educational organizations. Students' mental state, emotions and behavior themselves influence learning performance. Fuzzy logic can be used in various fields, as can clustering for grouping, for example in the analysis of learning development. The analysis is performed on students based on the symptoms they exhibit. This research uses fuzzy logic and clustering. Fuzzy logic deals with uncertainty, but its advantage is that it supports linguistic reasoning, so that its design does not require complicated mathematical equations. The clustering method used is K-means, in which the data are partitioned into k groups (k = 1, 2, 3, ..., k) in order to determine the optimal number of performance groups. In this research, questionnaire results entered into MATLAB produce values that are used to generate graphs. This makes it easier for the school to view student performance in the learning process using certain criteria, so that the system provides the results the school needs for decision-making.

  20. Visualizing dynamical neural assemblies with a fuzzy synchronization clustering analysis.

    Science.gov (United States)

    Zhou, Shu; Wu, Yan; Dos Santos, Claudia C

    2009-12-01

    Phase synchrony has been proposed as a possible communication mechanism between cerebral regions. The participation index method (PIM) may be used to investigate integrating structures within an oscillatory network, based on the eigenvalue decomposition of a matrix of bivariate synchronization indices. However, eigenvector orthogonality between clusters may result in categorization difficulties for hub oscillators and in a pseudoclustering phenomenon. Here, we propose a method of fuzzy synchronization clustering analysis (FSCA) to avoid the constraint of orthogonality by combining the fuzzy c-means algorithm with the phase-locking value. Following mathematical derivation, we cross-validated the FSCA and the PIM using the same multichannel phase time series of event-related EEG from a subject performing a working memory task. Both clustering methods produced consistent findings for the qualitatively salient configuration of the original network, illustrated here by a visualization technique. In contrast to PIM, use of common virtual oscillatory centroids enabled the FSCA to reveal multiple dynamical neural assemblies as well as the unitary phase information within each assembly.
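
    The phase-locking value that FSCA builds on can be sketched in a few lines. This uses the standard definition, the magnitude of the mean of exp(i(φ_a − φ_b)) over samples; the function name and the synthetic phase series are hypothetical, and the fuzzy c-means step of the paper is not reproduced.

```python
import cmath
import math

def phase_locking_value(phases_a, phases_b):
    """Return |mean(exp(i * (a - b)))| for paired phase samples in radians."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need two equal-length, non-empty phase series")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)

# Hypothetical toy data: a 10-cycle oscillation sampled 200 times.
n = 200
base = [2 * math.pi * 10 * k / n for k in range(n)]
# Constant lag of 0.7 rad: the two signals are perfectly phase locked.
locked = [p + 0.7 for p in base]
# Lag that sweeps one full cycle: no consistent phase relation.
drifting = [p + 2 * math.pi * k / n for k, p in enumerate(base)]

plv_locked = phase_locking_value(base, locked)      # close to 1
plv_drifting = phase_locking_value(base, drifting)  # close to 0
```

    A matrix of such pairwise values over all channels is the kind of bivariate synchronization input that both PIM and FSCA operate on.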

  1. Clinical Implications of Cluster Analysis-Based Classification of Acute Decompensated Heart Failure and Correlation with Bedside Hemodynamic Profiles.

    Directory of Open Access Journals (Sweden)

    Tariq Ahmad

    Full Text Available Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. We performed a cluster analysis on baseline clinical variables and PAC measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2, the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernible relationship to hemodynamic profiles, but distinct associations with adverse outcomes. Our analysis suggests that ADHF classification using

  2. Using cluster analysis to examine dietary patterns: nutrient intakes, gender, and weight status differ across food pattern clusters.

    Science.gov (United States)

    Wirfält, A K; Jeffery, R W

    1997-03-01

    This study explored the usefulness of cluster analysis in identifying food choice patterns of three groups of adults in relation to their energy intake. Food frequency data were converted to percentage of total energy from 38 food groups and entered into a cluster analysis procedure. Subjects in the emerging food group patterns were compared in terms of weight status, demographics, and the nutritional composition of their usual diet. Data were collected as part of three studies in two US metropolitan areas using identical protocols. Participants were university employees (103 women and 99 men) who volunteered for a reliability study of health behavior questionnaires and moderately obese volunteers (223 women and 101 men) for two weight-loss studies who were recruited by newspaper advertisements. Subjects were clustered according to food energy sources using the FASTCLUS procedure in the Statistical Analysis System. One-way analysis of variance and chi-square analysis were then performed to compare the weight status, nutrient intakes, and demographics of the food patterns. Six food pattern clusters were identified. Subjects in the two clusters associated with high consumption of pastry and meat had significantly higher fat intakes (P = .0001). Subjects in two other clusters, those associated with high intake of skim milk and a broad distribution of energy sources, had significantly higher micronutrient levels (P = .0001). Body mass index and the distribution of gender were also significantly different across clusters. The success of cluster analysis in identifying dietary exposure categories with unique demographic and nutritional correlates suggests that the approach may be useful in epidemiologic studies that examine conditions such as obesity, and in the design of nutrition interventions.
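
    The two data steps in this abstract, converting intake to percentage of total energy and then clustering subjects, can be sketched as follows. This is only an illustration under stated assumptions: a plain Lloyd-style k-means stands in for the SAS FASTCLUS procedure, three food groups replace the study's 38, and all names and numbers are hypothetical.

```python
import random

def percent_energy(energy_by_group):
    """Convert per-food-group energy (e.g. kcal/day) to % of total energy."""
    total = sum(energy_by_group)
    if total <= 0:
        raise ValueError("total energy must be positive")
    return [100.0 * e / total for e in energy_by_group]

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each subject goes to the nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical kcal/day from (pastry, meat, skim milk) for six subjects.
raw = [[900, 500, 100], [850, 550, 100], [1000, 600, 150],
       [200, 300, 900], [150, 350, 1000], [250, 250, 800]]
profiles = [percent_energy(row) for row in raw]
labels, centers = kmeans(profiles, k=2)
```

    Expressing intake as a share of total energy puts subjects with different overall consumption on a common scale, so the clusters reflect the pattern of food choice rather than the amount eaten.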

  3. Clustering, advection, and patterns in a model of population dynamics with neighborhood-dependent rates

    Science.gov (United States)

    Hernández-García, Emilio; López, Cristóbal

    2004-07-01

    We introduce a simple model of population dynamics which considers reproducing individuals or particles with birth and death rates depending on the number of other individuals in their neighborhood. The model shows an inhomogeneous quasistationary pattern with many different clusters of particles arranged periodically in space. We derive the equation for the macroscopic density of particles, perform a linear stability analysis on it, and show that there is a finite-wavelength instability leading to pattern formation. This is responsible for the approximate periodicity with which the clusters of particles arrange in the microscopic model. In addition, we consider the population when immersed in a fluid medium and analyze the influence of advection on global properties of the model, such as the average number of individuals.

  4. Contour Cluster Shape Analysis for Building Damage Detection from Post-earthquake Airborne LiDAR

    Directory of Open Access Journals (Sweden)

    HE Meizhang

    2015-04-01

    Full Text Available Detecting damaged buildings is an obligatory step before evaluating earthquake casualties and economic losses. It is very difficult to detect damaged buildings accurately based on the assumption that intact roofs appear in laser data as large planar segments whereas collapsed roofs are characterized by many small segments. This paper presents a contour cluster shape similarity analysis algorithm for reliable building damage detection from post-earthquake airborne LiDAR point clouds. First, we evaluate the entropies of shape similarities between all combinations of two contour lines within a building cluster, which quantitatively describe the shape diversity. Then the maximum entropy model is employed to divide all the clusters into intact and damaged classes. Tests on LiDAR data from the El Mayor-Cucapah earthquake rupture demonstrate the accuracy and reliability of the proposed method.

  5. Feasibility Study of Parallel Finite Element Analysis on Cluster-of-Clusters

    Science.gov (United States)

    Muraoka, Masae; Okuda, Hiroshi

    With the rapid growth of WAN infrastructure and the development of Grid middleware, it has become a realistic and attractive methodology to connect cluster machines over a wide-area network for the execution of computation-demanding applications. Many existing parallel finite element (FE) applications have, however, been designed and developed with a single computing resource in mind, since such applications require frequent synchronization and communication among processes. So far there have been few FE applications that can exploit the distributed environment. In this study, we explore the feasibility of FE applications on the cluster-of-clusters. First, we classify FE applications into two types, tightly coupled applications (TCA) and loosely coupled applications (LCA), based on their communication pattern. A prototype of each application is implemented on the cluster-of-clusters. We perform numerical experiments executing TCA and LCA on both the cluster-of-clusters and a single cluster. Through these experiments, by comparing the performance and communication cost in each case, we evaluate the feasibility of FEA on the cluster-of-clusters.

  6. Coupled dynamic analysis of a single gimbal control moment gyro cluster integrated with an isolation system

    Science.gov (United States)

    Luo, Qing; Li, Dongxu; Jiang, Jianping

    2014-01-01

    Control moment gyros (CMGs) are widely used as actuators for attitude control in spacecraft. However, micro-vibrations produced by CMGs will degrade the pointing performance of high-sensitivity instruments on-board the spacecraft. This paper addresses dynamic modelling and performs an analysis on the micro-vibration isolation for a single gimbal CMG (SGCMG) cluster. First, an analytical model was developed to describe both the coupled SGCMG cluster and the multi-axis isolation system that can express the dynamic outputs. This analytical model accurately reflects the mass and inertia properties, the gyroscopic effects and flexible modes of the coupled system, which can be generalized for isolation applications of SGCMG clusters. Second, the analytical model was validated using MSC.NASTRAN software based on the finite element technique. The dynamic characteristics of the coupled system are affected by the mass distribution and the gyroscopic effects of the SGCMGs. The gyroscopic effects produced by the rotary flywheel will stiffen or soften several of the structural modes of the coupled system. In addition, the gyroscopic effect of each SGCMG can interact with or counteract that of others, which induce vibration modes coupled together. Finally, the performance of the passive isolation was analysed. It was demonstrated that the gyroscopic effects should be considered in isolation studies on SGCMG clusters; otherwise, the isolation performance will be underestimated if they are ignored.

  7. Cluster analysis applied to localized dispersion curves in East Asia: the limits of surface wave resolution

    Science.gov (United States)

    Witek, M.; van der Lee, S.; Kang, T. S.; Chang, S. J.; Ning, J.; Ning, S.

    2017-12-01

    We have measured Rayleigh wave group velocity dispersion curves from one year of station-pair cross-correlations of continuous vertical-component broadband data from 1082 seismic stations in regional networks across China, Korea, Taiwan, and Japan for the year 2011. We use the measurements to map local dispersion anomalies for periods in the range 6-40 s. We combined our ambient noise data set with the earthquake group velocity data set of Ma et al. (2014), and then applied agglomerative hierarchical clustering to the localized dispersion curves. We find that the dispersion curves naturally organize themselves into distinct tectonic regions. For our distribution of interstation distances, only 8 distinct regions need to be defined. Additional clusters reduce the overall data misfit by increasingly smaller amounts. The size and number of clusters needed to suitably predict the data may give an indication of the resolving power of the data set. The regions that emerge from the cluster analysis include Tibet, the Sea of Japan, the South China Block and the Korean peninsula, the Ordos and Yangtze cratons, and Mesozoic rift basins such as the Songliao, Bohai Bay and Ulleung basins. We also performed a traditional inversion for 3D S-velocity structure, and the resulting model fits the data as well as the 8-cluster model, while both models fit the earthquake data and ambient noise data better than the LITHO1.0 model of Pasyanos et al. (2014). Our 3D model of the crust and upper mantle confirms many of the features seen in previous studies of the region, most notably the lithospheric thinning going from west to east and low velocity zones in the crust on the Tibetan periphery. We conclude that cluster analysis is able to greatly reduce the dimensionality of surface wave dispersion data, in the sense that a data set of over half a million dispersion curves is sufficiently predicted by appropriately averaging over a relatively small set of distinct tectonic regions. The

  8. Minimum Information Loss Cluster Analysis for Categorical Data

    Czech Academy of Sciences Publication Activity Database

    Grim, Jiří; Hora, Jan

    2007-01-01

    Vol. 2007, No. 4571 (2007), pp. 233-247 ISSN 0302-9743. [International Conference on Machine Learning and Data Mining MLDM 2007 /5./. Leipzig, 18.07.2007-20.07.2007] R&D Projects: GA MŠk 1M0572; GA ČR GA102/07/1594 Grant - others:GA MŠk(CZ) 2C06019 Institutional research plan: CEZ:AV0Z10750506 Keywords : Cluster Analysis * Categorical Data * EM algorithm Subject RIV: BD - Theory of Information Impact factor: 0.402, year: 2005

  9. Analysis of clusterization and networking processes in developing intermodal transportation

    Directory of Open Access Journals (Sweden)

    Sinkevičius Gintaras

    2016-06-01

    Full Text Available Analysis of the processes of clusterization and networking draws attention to the necessity of integration of railway transport into the intermodal or multimodal transport chain. One of the most widespread methods of combined transport is interoperability of railway and road transport. The objective is to create an uninterrupted transport chain in combining several modes of transport. The aim of this is to save energy resources, to form an effective, competitive, attractive to the client and safe and environmentally friendly transport system.

  10. A first packet processing subdomain cluster model based on SDN

    Science.gov (United States)

    Chen, Mingyong; Wu, Weimin

    2017-08-01

    To address the packet processing performance bottlenecks and controller downtime problems of current controller clusters, an SDN controller is proposed to allocate the priority of each device in the SDN (Software Defined Network) network. Each domain contains several network devices and a controller; the controller is responsible for managing the network equipment within the domain, and the switch performs data delivery based on the load of the controller, which processes the network equipment data. The experimental results show that the model can effectively mitigate the risk of single point failure of the controller and can relieve the performance bottleneck of first packet processing.

  11. A cluster analysis on road traffic accidents using genetic algorithms

    Science.gov (United States)

    Saharan, Sabariah; Baragona, Roberto

    2017-04-01

    The analysis of road traffic accidents is increasingly important because of the cost of accidents and public road safety. The availability of large data sets makes the study of factors that affect the frequency and severity of accidents viable. However, the data are often highly unbalanced and overlapped. We deal with the data set of the road traffic accidents recorded in Christchurch, New Zealand, from 2000-2009 with a total of 26440 accidents. The data are binary and comprise 50 road traffic accident factors with four levels of severity. We used a genetic algorithm for the analysis because we are in the presence of a large unbalanced data set and standard clustering such as the k-means algorithm may not be suitable for the task. The genetic algorithm based on clustering for unknown K (GCUK) has been used to identify the factors associated with accidents of different levels of severity. The results provided us with an interesting insight into the relationship between factors and accident severity level and suggest that the two main factors that contribute to fatal accidents are "Speed greater than 60 km h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and the independent component analysis is performed to validate the results.

  12. Multiscale deep drawing analysis of dual-phase steels using grain cluster-based RGC scheme

    International Nuclear Information System (INIS)

    Tjahjanto, D D; Eisenlohr, P; Roters, F

    2015-01-01

    Multiscale modelling and simulation play an important role in sheet metal forming analysis, since the overall material responses at macroscopic engineering scales, e.g. formability and anisotropy, are strongly influenced by microstructural properties, such as grain size and crystal orientations (texture). In the present report, multiscale analysis on deep drawing of dual-phase steels is performed using an efficient grain cluster-based homogenization scheme. The homogenization scheme, called relaxed grain cluster (RGC), is based on a generalization of the grain cluster concept, where a (representative) volume element consists of p × q × r (hexahedral) grains. In this scheme, variation of the strain or deformation of individual grains is taken into account through the so-called interface relaxation, which is formulated within an energy minimization framework. An interfacial penalty term is introduced into the energy minimization framework in order to account for the effects of grain boundaries. The grain cluster-based homogenization scheme has been implemented and incorporated into the advanced material simulation platform DAMASK, which aims to bridge the macroscale boundary value problems associated with deep drawing analysis to the micromechanical constitutive law, e.g. crystal plasticity model. Standard Lankford anisotropy tests are performed to validate the model parameters prior to the deep drawing analysis. Model predictions for the deep drawing simulations are analyzed and compared to the corresponding experimental data. The results show that the predictions of the model are in very good agreement with the experimental measurements. (paper)

  13. Model selection for semiparametric marginal mean regression accounting for within-cluster subsampling variability and informative cluster size.

    Science.gov (United States)

    Shen, Chung-Wei; Chen, Yi-Hau

    2018-03-13

    We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.

  14. A Distributed Agent Implementation of Multiple Species Flocking Model for Document Partitioning Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    The Flocking model, first proposed by Craig Reynolds, is one of the first bio-inspired computational collective behavior models and has many popular applications, such as animation. Our early research has resulted in a flock clustering algorithm that can achieve better performance than the K-means or the Ant clustering algorithms for data clustering. This algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for efficient clustering result retrieval and visualization. In this paper, we propose a bio-inspired clustering model, the Multiple Species Flocking clustering model (MSF), and present a distributed multi-agent MSF approach for document clustering.

  15. Physicochemical properties of different corn varieties by principal components analysis and cluster analysis

    International Nuclear Information System (INIS)

    Zeng, J.; Li, G.; Sun, J.

    2013-01-01

    Principal components analysis and cluster analysis were used to investigate the properties of different corn varieties. The chemical compositions and some properties of corn flour processed by dry milling were determined. The results showed that the chemical compositions and physicochemical properties were significantly different among twenty-six corn varieties. The quality of corn flour was related to five principal components from principal component analysis, and the contribution rate of starch pasting properties was important, accounting for 48.90%. The twenty-six corn varieties could be classified into four groups by cluster analysis. The consistency between principal components analysis and cluster analysis indicated that multivariate analyses were feasible in the study of corn variety properties. (author)

  16. Cluster Analysis of the Wind Events and Seasonal Wind Circulation Patterns in the Mexico City Region

    Directory of Open Access Journals (Sweden)

    Susana Carreón-Sierra

    2015-07-01

    Full Text Available The residents of Mexico City face serious problems of air pollution. Identifying the most representative scenarios for the transport and dispersion of air pollutants requires knowledge of the main wind circulation patterns. In this paper, a simple method to recognize and characterize the wind circulation patterns in a given region is proposed and applied to the Mexico City winds (2001–2006). This method uses a lattice wind approach to model the local wind events at the meso-β scale, and hierarchical cluster analysis to recognize their agglomerations in their phase space. Data from the meteorological network of Mexico City were used as input for the lattice wind model. Ward's clustering algorithm with Euclidean distance was applied to organize the model wind events in seasonal clusters for each year of the period. Comparison of the hourly population trends of these clusters permitted the recognition and detailed description of seven circulation patterns. These patterns resemble the qualitative descriptions of the Mexico City wind circulation modes reported by other authors. Our method, however, also permitted their quantitative characterization in terms of the wind attributes of velocity, divergence and vorticity, and an estimation of their seasonal and annual occurrence probabilities, which had never before been quantified.
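
    The clustering step named in this abstract (Ward's algorithm with Euclidean distance) can be sketched as follows. This is only a minimal illustration under stated assumptions: the two-component "wind events" are invented values, the function names are hypothetical, and the naive pairwise search is not the authors' implementation or their lattice wind model.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(points, k):
    """Agglomerate points until k clusters remain, using Ward's criterion."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical wind events as (u, v) velocity components; illustrative only.
events = [(1.0, 0.1), (1.1, 0.0), (0.9, -0.1),
          (-1.0, 2.0), (-1.1, 2.1), (0.0, -3.0)]
groups = ward_clusters(events, 3)
for g in groups:
    print(len(g), g)
```

    The merge cost used here is the standard Ward criterion, the increase in within-cluster sum of squares; a production analysis would more likely rely on an established library routine than on this quadratic search.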

  17. Shape Analysis of HII Regions - I. Statistical Clustering

    Science.gov (United States)

    Campbell-White, Justyn; Froebrich, Dirk; Kume, Alfred

    2018-04-01

    We present here our shape analysis method for a sample of 76 Galactic HII regions from MAGPIS 1.4 GHz data. The main goal is to determine whether physical properties and initial conditions of massive star cluster formation are linked to the shape of the regions. We outline a systematic procedure for extracting region shapes and perform hierarchical clustering on the shape data. We identified six groups that categorise HII regions by common morphologies. We confirmed the validity of these groupings by bootstrap re-sampling and the ordination technique multidimensional scaling. We then investigated associations between physical parameters and the assigned groups. Location is mostly independent of group, with a small preference for regions of similar longitudes to share common morphologies. The shapes are homogeneously distributed across Galactocentric distance and latitude. One group contains regions that are all younger than 0.5 Myr and ionised by low- to intermediate-mass sources. Those in another group are all driven by intermediate- to high-mass sources. One group was distinctly separated from the other five and contained regions at the surface brightness detection limit for the survey. We find that our hierarchical procedure is most sensitive to the spatial sampling resolution used, which is determined for each region from its distance. We discuss how these errors can be further quantified and reduced in future work by utilising synthetic observations from numerical simulations of HII regions. We also outline how this shape analysis has further applications to other diffuse astronomical objects.

  18. Cluster analysis of autoantibodies in 852 patients with systemic lupus erythematosus from a single center.

    Science.gov (United States)

    Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat

    2014-07-01

    Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified. One cluster consisted of patients with only anti-dsDNA antibodies, one of anti-Sm and anti-RNP, one of aCL IgG/M and LAC, and one of anti-Ro and anti-La antibodies. Analysis revealed 1 more cluster that consisted of patients who did not belong to any of the clusters formed by antibodies chosen for cluster analysis. The Sm/RNP cluster had a significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. The dsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10- and 20-year survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may exhibit differences according to the clinical setting or population.

  19. [Typologies of Madrid's citizens (Spain) at the end-of-life: cluster analysis].

    Science.gov (United States)

    Ortiz-Gonçalves, Belén; Perea-Pérez, Bernardo; Labajo González, Elena; Albarrán Juan, Elena; Santiago-Sáez, Andrés

    2018-03-06

    To establish typologies within Madrid's citizens (Spain) with regard to end-of-life by cluster analysis. The SPAD 8 programme was implemented in a sample from a health care centre in the autonomous region of Madrid (Spain). A multiple correspondence analysis technique was used, followed by a cluster analysis to create a dendrogram. A cross-sectional study was made beforehand with the results of the questionnaire. Five clusters stand out. Cluster 1: a group who preferred not to answer numerous questions (5%). Cluster 2: in favour of receiving palliative care and euthanasia (40%). Cluster 3: would oppose assisted suicide and would not ask for spiritual assistance (15%). Cluster 4: would like to receive palliative care and assisted suicide (16%). Cluster 5: would oppose assisted suicide and would ask for spiritual assistance (24%). The following four clusters stood out. Clusters 2 and 4 would like to receive palliative care, euthanasia (2) and assisted suicide (4). Clusters 4 and 5 regularly practiced their faith and their family members did not receive palliative care. Clusters 3 and 5 would be opposed to euthanasia and assisted suicide in particular. Clusters 2, 4 and 5 had not completed an advance directive document. Clusters 2 and 3 seldom practiced their faith. This study could be taken into consideration to improve the quality of end-of-life care choices. Copyright © 2017 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.

  20. Sparsity enabled cluster reduced-order models for control

    Science.gov (United States)

    Kaiser, Eurika; Morzyński, Marek; Daviller, Guillaume; Kutz, J. Nathan; Brunton, Bingni W.; Brunton, Steven L.

    2018-01-01

    Characterizing and controlling nonlinear, multi-scale phenomena are central goals in science and engineering. Cluster-based reduced-order modeling (CROM) was introduced to exploit the underlying low-dimensional dynamics of complex systems. CROM builds a data-driven discretization of the Perron-Frobenius operator, resulting in a probabilistic model for ensembles of trajectories. A key advantage of CROM is that it embeds nonlinear dynamics in a linear framework, which enables the application of standard linear techniques to the nonlinear system. CROM is typically computed on high-dimensional data; however, access to and computations on this full-state data limit the online implementation of CROM for prediction and control. Here, we address this key challenge by identifying a small subset of critical measurements to learn an efficient CROM, referred to as sparsity-enabled CROM. In particular, we leverage compressive measurements to faithfully embed the cluster geometry and preserve the probabilistic dynamics. Further, we show how to identify fewer optimized sensor locations tailored to a specific problem that outperform random measurements. Both of these sparsity-enabled sensing strategies significantly reduce the burden of data acquisition and processing for low-latency in-time estimation and control. We illustrate this unsupervised learning approach on three different high-dimensional nonlinear dynamical systems from fluids with increasing complexity, with one application in flow control. Sparsity-enabled CROM is a critical facilitator for real-time implementation on high-dimensional systems where full-state information may be inaccessible.
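
    The probabilistic model that this abstract describes can be illustrated with a small sketch: given a time-ordered sequence of cluster labels for the snapshots, count the observed jumps between clusters and normalize them into transition probabilities. The label sequence, the function names and the row-stochastic convention below are assumptions made for illustration; they are not taken from the paper, and the clustering step that would produce the labels is omitted.

```python
def transition_matrix(labels, n_clusters):
    """Estimate P[i][j] = probability of moving from cluster i to cluster j
    between consecutive snapshots (row-stochastic convention, assumed here)."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for src, dst in zip(labels, labels[1:]):
        counts[src][dst] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Clusters that are never left keep an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def propagate(prob, matrix):
    """Advance a cluster probability vector by one time step."""
    n = len(matrix)
    return [sum(prob[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical cluster labels of nine consecutive snapshots.
labels = [0, 0, 1, 2, 0, 1, 1, 2, 0]
P = transition_matrix(labels, 3)
p1 = propagate([1.0, 0.0, 0.0], P)  # start with certainty in cluster 0
print(P)
print(p1)
```

    Because the evolution of the probability vector is a matrix-vector product, linear tools apply to it even when the underlying dynamics are nonlinear, which is the advantage the abstract attributes to CROM.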

  1. Variable selection based on clustering analysis for improvement of polyphenols prediction in green tea using synchronous fluorescence spectra

    Science.gov (United States)

    Shan, Jiajia; Wang, Xue; Zhou, Hao; Han, Shuqing; Riza, Dimas Firmanda Al; Kondo, Naoshi

    2018-04-01

    Synchronous fluorescence spectra combined with multivariate analysis were used to predict flavonoids content in green tea rapidly and nondestructively. This paper presents a new and efficient spectral interval selection method called clustering-based partial least squares (CL-PLS), which selects informative wavelengths by combining the clustering concept with partial least squares (PLS) methods to improve model performance on synchronous fluorescence spectra. The fluorescence spectra of tea samples were obtained, k-means and Kohonen self-organizing map clustering algorithms were used to group the full spectra into several clusters, and a sub-PLS regression model was developed on each cluster. Finally, CL-PLS models consisting of gradually selected clusters were built. The correlation coefficient (R) was used to evaluate the effect on prediction performance of the PLS models. In addition, variable influence on projection partial least squares (VIP-PLS), selectivity ratio partial least squares (SR-PLS), interval partial least squares (iPLS) models and the full-spectrum PLS model were investigated and the results were compared. The results showed that CL-PLS gave the best result for flavonoids prediction using synchronous fluorescence spectra.

  2. Self-consistent clustering analysis: an efficient multiscale scheme for inelastic heterogeneous materials

    Energy Technology Data Exchange (ETDEWEB)

    Liu, Z.; Bessa, M. A.; Liu, W.K.

    2017-10-25

    A predictive computational theory is shown for modeling complex, hierarchical materials ranging from metal alloys to polymer nanocomposites. The theory can capture complex mechanisms such as plasticity and failure that span across multiple length scales. This general multiscale material modeling theory relies on sound principles of mathematics and mechanics, and a cutting-edge reduced order modeling method named self-consistent clustering analysis (SCA) [Zeliang Liu, M.A. Bessa, Wing Kam Liu, “Self-consistent clustering analysis: An efficient multi-scale scheme for inelastic heterogeneous materials,” Comput. Methods Appl. Mech. Engrg. 306 (2016) 319–341]. SCA reduces by several orders of magnitude the computational cost of micromechanical and concurrent multiscale simulations, while retaining the microstructure information. This remarkable increase in efficiency is achieved with a data-driven clustering method. Computationally expensive operations are performed in the so-called offline stage, where degrees of freedom (DOFs) are agglomerated into clusters. The interaction tensor of these clusters is computed. In the online or predictive stage, the Lippmann-Schwinger integral equation is solved cluster-wise using a self-consistent scheme to ensure solution accuracy and avoid path dependence. To construct a concurrent multiscale model, this scheme is applied at each material point in a macroscale structure, replacing a conventional constitutive model with the average response computed from the microscale model using just the SCA online stage. A regularized damage theory is incorporated in the microscale that avoids the mesh and RVE size dependence that commonly plagues microscale damage calculations. The SCA method is illustrated with two cases: a carbon fiber reinforced polymer (CFRP) structure with the concurrent multiscale model and an application to fatigue prediction for additively manufactured metals. For the CFRP problem, a speed up estimated to be about

  3. A Variational Level Set Model Combined with FCMS for Image Clustering Segmentation

    Directory of Open Access Journals (Sweden)

    Liming Tang

    2014-01-01

    Full Text Available The fuzzy C means clustering algorithm with spatial constraint (FCMS) is effective for image segmentation. However, it lacks essential smoothing constraints on the cluster boundaries and sufficient robustness to noise. Samson et al. proposed a variational level set model for image clustering segmentation, which can obtain smooth cluster boundaries and closed cluster regions due to the use of a level set scheme. However, it is very sensitive to noise since it is actually a hard C means clustering model. In this paper, based on Samson's work, we propose a new variational level set model combined with FCMS for image clustering segmentation. Compared with FCMS clustering, the proposed model can obtain smooth cluster boundaries and closed cluster regions due to the use of a level set scheme. In addition, a block-based energy is incorporated into the energy functional, which enables the proposed model to be more robust to noise than FCMS clustering and Samson's model. Some experiments on synthetic and real images are performed to assess the performance of the proposed model. Compared with some classical image segmentation models, the proposed model performs better on images contaminated by different noise levels.

  4. Sensitization trajectories in childhood revealed by using a cluster analysis

    DEFF Research Database (Denmark)

    Schoos, Ann-Marie M.; Chawes, Bo L.; Melen, Erik

    2017-01-01

    BACKGROUND: Assessment of sensitization at a single time point during childhood provides limited clinical information. We hypothesized that sensitization develops as specific patterns with respect to age at debut, development over time, and involved allergens and that such patterns might be more...... biologically and clinically relevant. OBJECTIVE: We sought to explore latent patterns of sensitization during the first 6 years of life and investigate whether such patterns associate with the development of asthma, rhinitis, and eczema. METHODS: We investigated 398 children from the at-risk Copenhagen...... Prospective Studies on Asthma in Childhood 2000 (COPSAC2000) birth cohort with specific IgE against 13 common food and inhalant allergens at the ages of ½, 1½, 4, and 6 years. An unsupervised cluster analysis for 3-dimensional data (nonnegative sparse parallel factor analysis) was used to extract latent...

  5. Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.

    Science.gov (United States)

    Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D

    2017-06-01

    Bronchial asthma is a heterogeneous disease that includes various subtypes. They may share similar clinical characteristics, but probably have different pathological mechanisms. To identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare differences in clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma out of exacerbation were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophilic count, serum total IgE and periostin levels were determined. A two-step cluster approach, a hierarchical clustering method and k-means analysis were used for identification of the clusters. We have identified four clusters. Cluster 1 (n=14) - late-onset, non-atopic asthma with impaired lung function, Cluster 2 (n=13) - late-onset, atopic asthma, Cluster 3 (n=6) - late-onset, aspirin sensitivity, eosinophilic asthma, and Cluster 4 (n=7) - early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. We identified four clusters. The variables with greatest force for differentiation in our study were: age of asthma onset, duration of disease, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drugs hypersensitivity, baseline FEV1/FVC and symptoms severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping of disease and a personalized approach to the treatment of patients.

  6. Stochastic cluster algorithms for discrete Gaussian (SOS) models

    International Nuclear Information System (INIS)

    Evertz, H.G.; Hamburg Univ.; Hasenbusch, M.; Marcu, M.; Tel Aviv Univ.; Pinn, K.; Muenster Univ.; Solomon, S.

    1990-10-01

    We present new Monte Carlo cluster algorithms which eliminate critical slowing down in the simulation of solid-on-solid models. In this letter we focus on the two-dimensional discrete Gaussian model. The algorithms are based on reflecting the integer valued spin variables with respect to appropriately chosen reflection planes. The proper choice of the reflection plane turns out to be crucial in order to obtain a small dynamical exponent z. Actually, the successful versions of our algorithm are a mixture of two different procedures for choosing the reflection plane, one of them ergodic but slow, the other one non-ergodic and also slow when combined with a Metropolis algorithm. (orig.)

  7. Determining wood chip size: image analysis and clustering methods

    Directory of Open Access Journals (Sweden)

    Paolo Febbi

    2013-09-01

    Full Text Available One of the standard methods for the determination of the size distribution of wood chips is the oscillating screen method (EN 15149-1:2010). Recent literature demonstrated how image analysis could return highly accurate measures of the dimensions defined for each individual particle, and could promote a new method depending on the geometrical shape to determine the chip size in a more accurate way. A sample of wood chips (8 litres) was sieved through horizontally oscillating sieves, using five different screen hole diameters (3.15, 8, 16, 45, 63 mm); the wood chips were sorted in decreasing size classes and the mass of all fractions was used to determine the size distribution of the particles. Since the chip shape and size influence the sieving results, Wang's theory, which concerns the geometric forms, was considered. A cluster analysis on the shape descriptors (Fourier descriptors) and size descriptors (area, perimeter, Feret diameters, eccentricity) was applied to observe the chips distribution. The UPGMA algorithm was applied on Euclidean distance. The obtained dendrogram shows a group separation in accordance with the original three sieving fractions. A comparison has been made between the traditional sieve and clustering results. This preliminary result shows how the image analysis-based method has a high potential for the characterization of wood chip size distribution and could be further investigated. Moreover, this method could be implemented in an online detection machine for chip size characterization. An improvement of the results is expected by using supervised multivariate methods that utilize known class memberships. The main objective of the future activities will be to shift the analysis from a 2-dimensional method to a 3-dimensional acquisition process.
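
    For readers unfamiliar with UPGMA (average linkage), the sketch below shows the idea on Euclidean distance and records the merge heights that a dendrogram would display. The descriptor values and function names are hypothetical and the features are left unscaled for brevity; this is an illustration, not the procedure used in the study.

```python
import math
from itertools import product


def average_linkage(a, b):
    """UPGMA distance: mean Euclidean distance over all cross-cluster pairs."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def upgma(points, k):
    """Merge the closest pair of clusters until k remain.

    Returns the clusters and the merge heights in the order they occurred.
    """
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs,
                   key=lambda ij: average_linkage(clusters[ij[0]],
                                                  clusters[ij[1]]))
        heights.append(average_linkage(clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, heights


# Hypothetical chip descriptors: (area, maximum Feret diameter).
chips = [(10, 4), (12, 5), (60, 14), (64, 15), (200, 40), (210, 42)]
fractions, heights = upgma(chips, 3)
for group in fractions:
    print(group)
print(heights)
```

    In practice the descriptors would normally be standardized first, since an unscaled feature with a large range (area in this toy data) dominates the Euclidean distance.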

  8. Integrating PROOF Analysis in Cloud and Batch Clusters

    International Nuclear Information System (INIS)

    Rodríguez-Marrero, Ana Y; Fernández-del-Castillo, Enol; López García, Álvaro; Marco de Lucas, Jesús; Matorras Weinig, Francisco; González Caballero, Isidro; Cuesta Noriega, Alberto

    2012-01-01

    High Energy Physics (HEP) analyses are becoming more complex and demanding due to the large amount of data collected by the current experiments. The Parallel ROOT Facility (PROOF) provides researchers with an interactive tool to speed up the analysis of huge volumes of data by exploiting parallel processing on both multicore machines and computing clusters. The typical PROOF deployment scenario is a permanent set of cores configured to run the PROOF daemons. However, this approach is incapable of adapting to the dynamic nature of interactive usage. Several initiatives seek to improve the use of computing resources by integrating PROOF with a batch system, such as Proof on Demand (PoD) or PROOF Cluster. These solutions are currently in production at Universidad de Oviedo and IFCA and are positively evaluated by users. Although they are able to adapt to the computing needs of users, they must comply with the specific configuration, OS and software installed at the batch nodes. Furthermore, they share the machines with other workloads, which may cause disruptions in the interactive service for users. These limitations make PROOF a typical use-case for cloud computing. In this work we take advantage of the cloud infrastructure at IFCA in order to provide a dynamic PROOF environment where users can control the software configuration of the machines. The Proof Analysis Framework (PAF) facilitates the development of new analyses and offers transparent access to PROOF resources. Several performance measurements are presented for the different scenarios (PoD, SGE and Cloud), showing a speed improvement closely correlated with the number of cores used.

  9. Testing the Bose-Einstein Condensate dark matter model at galactic cluster scale

    Energy Technology Data Exchange (ETDEWEB)

    Harko, Tiberiu [Department of Mathematics, University College London, Gower Street, London, WC1E 6BT (United Kingdom); Liang, Pengxiang; Liang, Shi-Dong [State Key Laboratory of Optoelectronic Material and Technology, and Guangdong Province Key Laboratory of Display Material and Technology, School of Physics and Engineering, Sun Yat-Sen University, Guangzhou 510275 (China); Mocanu, Gabriela, E-mail: t.harko@ucl.ac.uk, E-mail: lpengx@mail2.sysu.edu.cn2, E-mail: stslsd@mail.sysu.edu.cn, E-mail: gabriela.mocanu@ubbcluj.ro [Astronomical Institute, Astronomical Observatory Cluj-Napoca, Romanian Academy, 15 Cireşilor Street, 400487 Cluj-Napoca (Romania)

    2015-11-01

    The possibility that dark matter may be in the form of a Bose-Einstein Condensate (BEC) has been extensively explored at galactic scale. In particular, good fits for the galactic rotation curves have been obtained, and upper limits for the dark matter particle mass and scattering length have been estimated. In the present paper we extend the investigation of the properties of the BEC dark matter to the galactic cluster scale, involving dark matter dominated astrophysical systems formed of thousands of galaxies each. By considering that one of the major components of a galactic cluster, the intra-cluster hot gas, is described by King's β-model, and that both intra-cluster gas and dark matter are in hydrostatic equilibrium, bound by the same total mass profile, we derive the mass and density profiles of the BEC dark matter. In our analysis we consider several theoretical models, corresponding to isothermal hot gas and zero temperature BEC dark matter, non-isothermal gas and zero temperature dark matter, and isothermal gas and finite temperature BEC, respectively. The properties of the finite temperature BEC dark matter cluster are investigated in detail numerically. We compare our theoretical results with the observational data of 106 galactic clusters. Using a least-squares fitting, as well as the observational results for the dark matter self-interaction cross section, we obtain some upper bounds for the mass and scattering length of the dark matter particle. Our results suggest that the mass of the dark matter particle is of the order of μeV, while the scattering length has values in the range of 10^−7 fm.
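
    For orientation, the isothermal case mentioned in this abstract can be written out using the standard textbook form of the β-model and of hydrostatic equilibrium. This is a sketch, not the paper's own derivation: the symbols ρ_0, r_c, T, m_p and the mean molecular weight μ are assumptions introduced here, and this μ is unrelated to the "μeV" unit in the abstract.

```latex
% Beta-model gas density profile (central density \rho_0, core radius r_c)
\rho_g(r) = \rho_0 \left( 1 + \frac{r^2}{r_c^2} \right)^{-3\beta/2}

% Hydrostatic equilibrium with an ideal-gas pressure
\frac{dP_g}{dr} = -\frac{G\,M(r)\,\rho_g(r)}{r^2},
\qquad
P_g = \frac{k_B T}{\mu m_p}\,\rho_g

% Isothermal gas (T = const): total mass enclosed within radius r
M(r) = -\frac{k_B T\, r^2}{G \mu m_p}\,\frac{d \ln \rho_g}{dr}
     = \frac{3 \beta k_B T}{G \mu m_p}\,\frac{r^3}{r_c^2 + r^2}
```

    The last step uses d ln ρ_g/dr = −3βr/(r_c² + r²), which follows directly from the first line.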

  10. Testing the Bose-Einstein Condensate dark matter model at galactic cluster scale

    International Nuclear Information System (INIS)

    Harko, Tiberiu; Liang, Pengxiang; Liang, Shi-Dong; Mocanu, Gabriela

    2015-01-01

    The possibility that dark matter may be in the form of a Bose-Einstein Condensate (BEC) has been extensively explored at galactic scale. In particular, good fits for the galactic rotation curves have been obtained, and upper limits for the dark matter particle mass and scattering length have been estimated. In the present paper we extend the investigation of the properties of the BEC dark matter to the galactic cluster scale, involving dark matter dominated astrophysical systems formed of thousands of galaxies each. By considering that one of the major components of a galactic cluster, the intra-cluster hot gas, is described by King's β-model, and that both intra-cluster gas and dark matter are in hydrostatic equilibrium, bound by the same total mass profile, we derive the mass and density profiles of the BEC dark matter. In our analysis we consider several theoretical models, corresponding to isothermal hot gas and zero temperature BEC dark matter, non-isothermal gas and zero temperature dark matter, and isothermal gas and finite temperature BEC, respectively. The properties of the finite temperature BEC dark matter cluster are investigated in detail numerically. We compare our theoretical results with the observational data of 106 galactic clusters. Using a least-squares fitting, as well as the observational results for the dark matter self-interaction cross section, we obtain some upper bounds for the mass and scattering length of the dark matter particle. Our results suggest that the mass of the dark matter particle is of the order of μeV, while the scattering length has values in the range of 10^−7 fm.

  11. CLUSTER ANALYSIS OF NATURAL DISASTER LOSSES IN POLISH AGRICULTURE

    Directory of Open Access Journals (Sweden)

    Grzegorz STRUPCZEWSKI

    2015-04-01

    Agricultural production risk is of a special nature because of the large number of hazards, the relative weakness of production entities on the market, and a level of uncertainty greater than in industrial production. Natural disasters occur very frequently while only a low percentage of farmers are insured, so they cause damage on a scale that forces the state to organise current financial aid (for instance in the form of preferential natural disaster loans). This aid is usually not sufficient. On the other hand, regional diversity in the level of risk does not favour the development of insurance. From the perspective of insurance companies and policymakers, it becomes highly important to investigate the spatial structure of losses in agriculture caused by natural disasters. The purpose of the research is to classify the 16 Polish voivodeships into clusters in order to show the differences between them according to the level of damage to farms caused by natural disasters. The cluster analysis demonstrated that 11 voivodeships form quite a homogeneous group in terms of the size of damage in agriculture (the value of damage to crops and the acreage of destroyed crops are the two most important factors determining cluster membership); however, the loss profile of each of the other five voivodeships is highly individual and requires separate handling in the actuarial sense. It was also shown that a high absolute value of agricultural losses in a given voivodeship does not necessarily mean that farms in that voivodeship are highly vulnerable to natural risks.

  12. Practice-related changes in neural activation patterns investigated via wavelet-based clustering analysis

    Science.gov (United States)

    Lee, Jinae; Park, Cheolwoo; Dyckman, Kara A.; Lazar, Nicole A.; Austin, Benjamin P.; Li, Qingyang; McDowell, Jennifer E.

    2012-01-01

    Objectives: To evaluate brain activation using functional magnetic resonance imaging (fMRI) and specifically, activation changes across time associated with practice-related cognitive control during eye movement tasks. Experimental design: Participants were engaged in antisaccade performance (generating a glance away from a cue) while fMR images were acquired during two separate time points: 1) at pre-test before any exposure to the task, and 2) at post-test, after one week of daily practice on antisaccades, prosaccades (glancing towards a target) or fixation (maintaining gaze on a target). Principal observations: The three practice groups were compared across the two time points, and analyses were conducted via the application of a model-free clustering technique based on wavelet analysis. This series of procedures was developed to avoid analysis problems inherent in fMRI data and was composed of several steps: detrending, data aggregation, wavelet transform and thresholding, no trend test, principal component analysis and K-means clustering. The main clustering algorithm was built in the wavelet domain to account for temporal correlation. We applied a no trend test based on wavelets to significantly reduce the high dimension of the data. We clustered the thresholded wavelet coefficients of the remaining voxels using principal component analysis and K-means clustering. Conclusion: Over the series of analyses, we found that the antisaccade practice group was the only group to show decreased activation from pre- to post-test in saccadic circuitry, particularly evident in supplementary eye field, frontal eye fields, superior parietal lobe, and cuneus. PMID:22505290

  13. The Analysis of a Simple k-Means Clustering Algorithm

    National Research Council Canada - National Science Library

    Kanungo, T; Mount, D. M; Netanyahu, N. S; Piatko, C; Silverman, R; Wu, A. Y

    2000-01-01

    .... A popular heuristic for k-means clustering is Lloyd's algorithm. In this paper, we present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm...
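
    As a point of reference, the basic Lloyd iteration that this record builds on can be sketched as below. This is the plain assign-and-update loop only, not the paper's filtering algorithm (which the abstract says is a more efficient implementation); the function name `lloyd_kmeans` and the sample data are hypothetical.

```python
import math


def lloyd_kmeans(points, centers, max_iter=100):
    """Plain Lloyd iteration: assign points to the nearest center, then recompute means."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center unchanged
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    return centers, labels


# Hypothetical 2-D data with two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
final_centers, final_labels = lloyd_kmeans(data, centers=[(0.0, 0.0), (10.0, 10.0)])
```

    Each assignment step here costs one distance evaluation per point per center, which is the cost that faster implementations aim to reduce.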

  14. Tigers on trails: occupancy modeling for cluster sampling.

    Science.gov (United States)

    Hines, J E; Nichols, J D; Royle, J A; MacKenzie, D I; Gopalaswamy, A M; Kumar, N Samba; Karanth, K U

    2010-07-01

    estimation in conservation monitoring. More generally, this work represents a contribution to the topic of cluster sampling for situations in which there is a need for specific modeling (e.g., reflecting dependence) for the distribution of the variable(s) of interest among subunits.

  15. Alpha-cluster preformation factor within cluster-formation model for odd-A and odd-odd heavy nuclei

    Science.gov (United States)

    Saleh Ahmed, Saad M.

    2017-06-01

    The alpha-cluster probability that represents the preformation of an alpha particle in alpha-decay nuclei was determined for high-intensity alpha-decay mode odd-A and odd-odd heavy nuclei, 82 CSR) and the hypothesised cluster-formation model (CFM) as in our previous work. Our previous successful determination of phenomenological values of alpha-cluster preformation factors for even-even nuclei motivated us to expand the work to cover other types of nuclei. The formation energy of the interior alpha cluster had to be derived for the different nuclear systems, taking the unpaired-nucleon effect into account. The results showed the phenomenological value of the alpha preformation probability and reflected the unpaired-nucleon effect and the magic and sub-magic effects in nuclei. These results and their analyses are very useful for future work concerning the calculation of alpha decay constants and the progress of alpha-decay theory.

  16. Deuterium cluster model for low energy nuclear reactions (LENR)

    Science.gov (United States)

    Miley, George; Hora, Heinrich

    2007-11-01

    For studying the possible reactions of high density deuterons on the background of a degenerate electron gas, a summary of experimental observations resulted in the possibility of reactions in pm distance and more than ksec duration similar to the K-shell electron capture [1]. The essential reason was the screening of the deuterons by a factor of 14 based on the observations. Using the bosonic properties for a cluster formation of the deuterons and a model of compound nuclear reactions [2], the measured distribution of the resulting nuclei may be explained as known from the Maruhn-Greiner theory for fission. The local maximum of the distribution at the main minimum indicates the excited states of the compound nuclei during their intermediary state. This measured local maximum may be an independent proof for the deuteron clusters at LENR. [1] H. Hora, G.H. Miley et al. Physics Letters A175, 138 (1993) [2] H. Hora and G.H. Miley, APS March Meeting 2007, Program p. 116

  17. An approximate analytic model of a star cluster with potential escapers

    Science.gov (United States)

    Daniel, Kathryne J.; Heggie, Douglas C.; Varri, Anna Lisa

    2017-06-01

    In the context of a star cluster moving on a circular galactic orbit, a 'potential escaper' is a cluster star that has orbital energy greater than the escape energy, and yet is confined within the Jacobi radius of the stellar system. On the other hand, analytic models of stellar clusters typically have a truncation energy equal to the cluster escape energy, and therefore explicitly exclude these energetically unbound stars. Starting from the landmark analysis performed by Hénon of periodic orbits of the circular Hill equations, we present a numerical exploration of the population of 'non-escapers', defined here as those stars that remain within two Jacobi radii for several galactic periods, with energy above the escape energy. We show that they can be characterized by the Jacobi integral and two further approximate integrals, which are based on perturbation theory and ideas drawn from Lidov-Kozai theory. Finally, we use these results to construct an approximate analytic model that includes a phase-space description of a population resembling that of potential escapers, in addition to the usual bound population.

  18. Social Media Use and Depression and Anxiety Symptoms: A Cluster Analysis.

    Science.gov (United States)

    Shensa, Ariel; Sidani, Jaime E; Dew, Mary Amanda; Escobar-Viera, César G; Primack, Brian A

    2018-03-01

    Individuals use social media with varying quantity, emotional, and behavioral attachment that may have differential associations with mental health outcomes. In this study, we sought to identify distinct patterns of social media use (SMU) and to assess associations between those patterns and depression and anxiety symptoms. In October 2014, a nationally-representative sample of 1730 US adults ages 19 to 32 completed an online survey. Cluster analysis was used to identify patterns of SMU. Depression and anxiety were measured using respective 4-item Patient-Reported Outcome Measurement Information System (PROMIS) scales. Multivariable logistic regression models were used to assess associations between cluster membership and depression and anxiety. Cluster analysis yielded a 5-cluster solution. Participants were characterized as "Wired," "Connected," "Diffuse Dabblers," "Concentrated Dabblers," and "Unplugged." Membership in 2 clusters - "Wired" and "Connected" - increased the odds of elevated depression and anxiety symptoms (AOR = 2.7, 95% CI = 1.5-4.7; AOR = 3.7, 95% CI = 2.1-6.5, respectively, and AOR = 2.0, 95% CI = 1.3-3.2; AOR = 2.0, 95% CI = 1.3-3.1, respectively). SMU pattern characterization of a large population suggests 2 patterns are associated with risk for depression and anxiety. Developing educational interventions that address use patterns rather than single aspects of SMU (eg, quantity) would likely be useful.
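
    The adjusted odds ratios above are exponentiated logistic regression coefficients, so a reported AOR and 95% CI can be mapped back to the log-odds scale. The sketch below assumes a symmetric Wald interval on the log scale, which the abstract does not state; the helper names are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def log_odds_from_aor(aor, ci_low, ci_high):
    """Recover the logistic coefficient and its standard error from an AOR and 95% CI."""
    beta = math.log(aor)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def aor_from_log_odds(beta, se):
    """Forward direction: coefficient and standard error to AOR with 95% CI bounds."""
    return math.exp(beta), math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se)


# First AOR reported in the abstract: AOR = 2.7, 95% CI = 1.5-4.7.
beta, se = log_odds_from_aor(2.7, 1.5, 4.7)
aor, low, high = aor_from_log_odds(beta, se)
```

    The round trip reproduces the interval only approximately, because the published bounds are rounded to one decimal place.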

  19. Modeling jet and outflow feedback during star cluster formation

    Energy Technology Data Exchange (ETDEWEB)

    Federrath, Christoph [Monash Centre for Astrophysics, School of Mathematical Sciences, Monash University, VIC 3800 (Australia); Schrön, Martin [Department of Computational Hydrosystems, Helmholtz Centre for Environmental Research-UFZ, Permoserstr. 15, D-04318 Leipzig (Germany); Banerjee, Robi [Hamburger Sternwarte, Gojenbergsweg 112, D-21029 Hamburg (Germany); Klessen, Ralf S., E-mail: christoph.federrath@monash.edu [Universität Heidelberg, Zentrum für Astronomie, Institut für Theoretische Astrophysik, Albert-Ueberle-Strasse 2, D-69120 Heidelberg (Germany)

    2014-08-01

    Powerful jets and outflows are launched from the protostellar disks around newborn stars. These outflows carry enough mass and momentum to transform the structure of their parent molecular cloud and to potentially control star formation itself. Despite their importance, we have not been able to fully quantify the impact of jets and outflows during the formation of a star cluster. The main problem lies in limited computing power. We would have to resolve the magnetic jet-launching mechanism close to the protostar and at the same time follow the evolution of a parsec-size cloud for a million years. Current computer power and codes fall orders of magnitude short of achieving this. In order to overcome this problem, we implement a subgrid-scale (SGS) model for launching jets and outflows, which demonstrably converges and reproduces the mass, linear and angular momentum transfer, and the speed of real jets, with ∼1000 times lower resolution than would be required without the SGS model. We apply the new SGS model to turbulent, magnetized star cluster formation and show that jets and outflows (1) eject about one-fourth of their parent molecular clump in high-speed jets, quickly reaching distances of more than a parsec, (2) reduce the star formation rate by about a factor of two, and (3) lead to the formation of ∼1.5 times as many stars compared to the no-outflow case. Most importantly, we find that jets and outflows reduce the average star mass by a factor of ∼ three and may thus be essential for understanding the characteristic mass of the stellar initial mass function.

  20. Prediction of strontium bromide laser efficiency using cluster and decision tree analysis

    Science.gov (United States)

    Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar

    2018-01-01

    The subject of this investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to atomic strontium lasers and free electron lasers, especially at the 6.45 μm line, which is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper, experimental data from measurements of the operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract from the available data the more important relationships and dependences that influence the increase of overall laser efficiency. A set of cluster models is constructed and analyzed. Different cluster methods show that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. Regression tree models built with the Classification and Regression Trees (CART) technique yield dependences that predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.

  1. Prediction of strontium bromide laser efficiency using cluster and decision tree analysis

    Directory of Open Access Journals (Sweden)

    Iliev Iliycho

    2018-01-01

    The subject of this investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to atomic strontium lasers and free electron lasers, especially at the 6.45 μm line, which is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper, experimental data from measurements of the operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract from the available data the more important relationships and dependences that influence the increase of overall laser efficiency. A set of cluster models is constructed and analyzed. Different cluster methods show that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. Regression tree models built with the Classification and Regression Trees (CART) technique yield dependences that predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.

  2. Fuzzy C-Means Clustering Model Data Mining For Recognizing Stock Data Sampling Pattern

    Directory of Open Access Journals (Sweden)

    Sylvia Jane Annatje Sumarauw

    2007-06-01

    Abstract: The capital market is beneficial to both companies and investors. For investors, the capital market provides two economic advantages, namely dividends and capital gains, and a non-economic one, a voting share in the Shareholders General Meeting. But it can also penalize share owners. To protect themselves from this risk, investors should predict the prospects of their companies. Because shares are an abstract commodity, share quality is determined by the validity of the company profile information. Any information on stock value fluctuation from the Jakarta Stock Exchange can be a useful consideration and a good measurement for data analysis. In the context of protecting shareholders from risk, this research focuses on stock data sample categories, or stock data sample patterns, using the Fuzzy c-Means Clustering Model, which provides useful information for investors. The research analyses stock data such as Individual Index, Volume and Amount for the Property and Real Estate Emitter Group at the Jakarta Stock Exchange from January 1 to December 31, 2004. The mining process follows the Cross Industry Standard Process for Data Mining (CRISP-DM) model, in the form of a cycle with these steps: Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation and Deployment. In the modelling step, the Fuzzy c-Means Clustering Model is applied. The Fuzzy c-Means Clustering Model can analyze stock data in a big database with many complex variables, especially for finding the data sample pattern, and then for building a Fuzzy Inference System that maps inputs to outputs based on fuzzy logic by recognising the pattern. Keywords: Data Mining, Fuzzy c-Means Clustering Model, Pattern Recognition
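
    The fuzzy c-means updates behind the model named in this abstract can be sketched as follows. This is the standard textbook iteration with fuzzifier m = 2, not the study's implementation; the `stocks` values and the function names are hypothetical.

```python
import math


def memberships_for(points, centers, m=2.0, eps=1e-12):
    """Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))."""
    rows = []
    for p in points:
        # Clamp distances away from zero so a point sitting on a center is safe.
        dists = [max(math.dist(p, c), eps) for c in centers]
        rows.append([1.0 / sum((d_i / d_j) ** (2.0 / (m - 1.0)) for d_j in dists)
                     for d_i in dists])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = [tuple(c) for c in centers]
    dims = len(points[0])
    for _ in range(iters):
        u = memberships_for(points, centers, m)
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ))
        centers = new_centers
    # Report memberships that match the final centers.
    return centers, memberships_for(points, centers, m)


# Hypothetical (index, volume) pairs on an arbitrary scale; not study data.
stocks = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (7.8, 8.3)]
centers, u = fuzzy_c_means(stocks, centers=[(0.0, 0.0), (10.0, 10.0)])
```

    Unlike k-means, every sample keeps a graded membership in each cluster, which is what a downstream fuzzy inference system would consume.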

  3. Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.

    Science.gov (United States)

    Liu, Yuanchao; Liu, Ming; Wang, Xin

    2015-01-01

    The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach.

  4. Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.

    Directory of Open Access Journals (Sweden)

    Yuanchao Liu

    The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach.

  5. CHOOSING A HEALTH INSTITUTION WITH MULTIPLE CORRESPONDENCE ANALYSIS AND CLUSTER ANALYSIS IN A POPULATION BASED STUDY

    Directory of Open Access Journals (Sweden)

    ASLI SUNER

    2013-06-01

    Multiple correspondence analysis is a method that makes categorical variables given in contingency tables easier to interpret, showing the similarities, associations and divergences among these variables graphically in a lower-dimensional space. Clustering methods help to classify grouped data according to their similarities and to obtain useful summaries from them. In this study, the interpretations of multiple correspondence analysis are supported by cluster analysis; factors affecting the choice of health institution, such as age, disease group and health insurance, are examined, and the aim is to compare the results of the two methods.

  6. Diagrammatic analysis of correlations in polymer fluids: Cluster diagrams via Edwards' field theory

    International Nuclear Information System (INIS)

    Morse, David C.

    2006-01-01

    Edwards' functional integral approach to the statistical mechanics of polymer liquids is amenable to a diagrammatic analysis in which free energies and correlation functions are expanded as infinite sums of Feynman diagrams. This analysis is shown to lead naturally to a perturbative cluster expansion that is closely related to the Mayer cluster expansion developed for molecular liquids by Chandler and co-workers. Expansion of the functional integral representation of the grand-canonical partition function yields a perturbation theory in which all quantities of interest are expressed as functionals of a monomer-monomer pair potential, as functionals of intramolecular correlation functions of non-interacting molecules, and as functions of molecular activities. In different variants of the theory, the pair potential may be either a bare or a screened potential. A series of topological reductions yields a renormalized diagrammatic expansion in which collective correlation functions are instead expressed diagrammatically as functionals of the true single-molecule correlation functions in the interacting fluid, and as functions of molecular number density. Similar renormalized expansions are also obtained for a collective Ornstein-Zernike direct correlation function, and for intramolecular correlation functions. A concise discussion is given of the corresponding Mayer cluster expansion, and of the relationship between the Mayer and perturbative cluster expansions for liquids of flexible molecules. The application of the perturbative cluster expansion to coarse-grained models of dense multi-component polymer liquids is discussed, and a justification is given for the use of a loop expansion. As an example, the formalism is used to derive a new expression for the wave-number dependent direct correlation function and recover known expressions for the intramolecular two-point correlation function to first-order in a renormalized loop expansion for coarse-grained models of

  7. Modeling Neurovascular Coupling from Clustered Parameter Sets for Multimodal EEG-NIRS

    Directory of Open Access Journals (Sweden)

    M. Tanveer Talukdar

    2015-01-01

    Despite significant improvements in neuroimaging technologies and analysis methods, the fundamental relationship between local changes in cerebral hemodynamics and the underlying neural activity remains largely unknown. In this study, a data driven approach is proposed for modeling this neurovascular coupling relationship from simultaneously acquired electroencephalographic (EEG) and near-infrared spectroscopic (NIRS) data. The approach uses gamma transfer functions to map EEG spectral envelopes that reflect time-varying power variations in neural rhythms to hemodynamics measured with NIRS during median nerve stimulation. The approach is evaluated first with simulated EEG-NIRS data and then by applying the method to experimental EEG-NIRS data measured from 3 human subjects. Results from the experimental data indicate that the neurovascular coupling relationship can be modeled using multiple sets of gamma transfer functions. By applying cluster analysis, statistically significant parameter sets were found to predict NIRS hemodynamics from EEG spectral envelopes. All subjects were found to have significant clustered parameters (P<0.05) for EEG-NIRS data fitted using gamma transfer functions. These results suggest that the use of gamma transfer functions followed by cluster analysis of the resulting parameter sets may provide insights into neurovascular coupling in human neuroimaging data.

  8. SERA Scenarios of Early Market Fuel Cell Electric Vehicle Introductions: Modeling Framework, Regional Markets, and Station Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Bush, B. [National Renewable Energy Lab. (NREL), Golden, CO (United States); Melaina, M. [National Renewable Energy Lab. (NREL), Golden, CO (United States); Penev, M. [National Renewable Energy Lab. (NREL), Golden, CO (United States); Daniel, W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2013-09-01

    This report describes the development and analysis of detailed temporal and spatial scenarios for early market hydrogen fueling infrastructure clustering and fuel cell electric vehicle rollout using the Scenario Evaluation, Regionalization and Analysis (SERA) model. The report provides an overview of the SERA scenario development framework and discusses the approach used to develop the nationwide scenario.

  9. MMPI profiles of males accused of severe crimes: a cluster analysis

    NARCIS (Netherlands)

    Spaans, M.; Barendregt, M.; Muller, E.; Beurs, E. de; Nijman, H.L.I.; Rinne, T.

    2009-01-01

    In studies attempting to classify criminal offenders by cluster analysis of Minnesota Multiphasic Personality Inventory-2 (MMPI-2) data, the number of clusters found varied between 10 (the Megargee System) and two (one cluster indicating no psychopathology and one exhibiting serious

  10. Clustering-neural network models for freeway work zone capacity estimation.

    Science.gov (United States)

    Jiang, Xiaomo; Adeli, Hojjat

    2004-06-01

    Two neural network models, called clustering-RBFNN and clustering-BPNN models, are created for estimating the work zone capacity in a freeway work zone as a function of seventeen different factors through judicious integration of the subtractive clustering approach with the radial basis function (RBF) and the backpropagation (BP) neural network models. The clustering-RBFNN model has the attractive characteristics of training stability, accuracy, and quick convergence. The results of validation indicate that the work zone capacity can be estimated by clustering-neural network models in general with an error of less than 10%, even with limited data available to train the models. The clustering-RBFNN model is used to study several main factors affecting work zone capacity. The results of such parametric studies can assist work zone engineers and highway agencies to create effective traffic management plans (TMP) for work zones quantitatively and objectively.

  11. Influence of birth cohort on age of onset cluster analysis in bipolar I disorder

    DEFF Research Database (Denmark)

    Bauer, M.; Glenn, T.; Alda, M.

    2015-01-01

    Purpose: Two common approaches to identify subgroups of patients with bipolar disorder are clustering methodology (mixture analysis) based on the age of onset, and a birth cohort analysis. This study investigates if a birth cohort effect will influence the results of clustering on the age of onset......, using a large, international database. Methods: The database includes 4037 patients with a diagnosis of bipolar I disorder, previously collected at 36 collection sites in 23 countries. Generalized estimating equations (GEE) were used to adjust the data for country median age, and in some models, birth...... for birth cohort (three subgroups), family history and polarity of the first episode could not be distinguished between the middle and oldest subgroups. Conclusion: These results using international data confirm prior findings using single country data, that there are subgroups of bipolar I disorder based...

  12. Comparative analysis on the selection of number of clusters in community detection

    Science.gov (United States)

    Kawamoto, Tatsuro; Kabashima, Yoshiyuki

    2018-02-01

    We conduct a comparative analysis on various estimates of the number of clusters in community detection. An exhaustive comparison requires testing of all possible combinations of frameworks, algorithms, and assessment criteria. In this paper we focus on the framework based on a stochastic block model, and investigate the performance of greedy algorithms, statistical inference, and spectral methods. For the assessment criteria, we consider modularity, map equation, Bethe free energy, prediction errors, and isolated eigenvalues. From the analysis, the tendency of overfit and underfit that the assessment criteria and algorithms have becomes apparent. In addition, we propose that the alluvial diagram is a suitable tool to visualize statistical inference results and can be useful to determine the number of clusters.
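    Modularity is one of the assessment criteria named in the abstract above. As a rough illustration only, the sketch below computes the standard Newman-Girvan modularity of a given partition of an undirected, unweighted graph. The function name `modularity` and the toy graph are assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges: list of (u, v) pairs; each undirected edge appears once, no self-loops.
    community: dict mapping every node to its community label.
    Uses Q = sum_c [ e_c / m - (d_c / (2m))^2 ], where e_c is the number of
    edges inside community c, d_c the total degree of its nodes, and m the
    total number of edges.
    """
    m = 0
    intra = defaultdict(int)       # e_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        m += 1
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    if m == 0:
        raise ValueError("graph has no edges")
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
```

    Comparing `modularity(toy_edges, two_groups)` with `modularity(toy_edges, one_group)` shows how the score rewards the split into two triangles over a single all-encompassing cluster. Choosing the number of clusters by maximizing such a score is exactly the kind of procedure whose overfitting or underfitting tendency the abstract discusses.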

  13. Mathematical model for research and analyze relations and functions between enterprises, members of cluster

    Science.gov (United States)

    Angelov, Kiril; Kaynakchieva, Vesela

    2017-12-01

    The aim of the current study is to research and analyze a mathematical model for studying and analyzing the relations and functions between enterprises that are members of a cluster, and to test the model in a given cluster. The subject of the study is the theoretical mechanisms for defining mathematical models of such relations and functions. The object of the study is production enterprises that are members of a cluster. The results show that the described theoretical mathematical model is applicable to researching and analyzing the functions and relations between cluster member enterprises from different industrial sectors. This creates alternatives for selecting the cluster in which the model is tested in order to improve interaction between the member enterprises.

  14. Maximum-likelihood model averaging to profile clustering of site types across discrete linear sequences.

    Directory of Open Access Journals (Sweden)

    Zhang Zhang

    2009-06-01

    Full Text Available A major analytical challenge in computational biology is the detection and description of clusters of specified site types, such as polymorphic or substituted sites within DNA or protein sequences. Progress has been stymied by a lack of suitable methods to detect clusters and to estimate the extent of clustering in discrete linear sequences, particularly when there is no a priori specification of cluster size or cluster count. Here we derive and demonstrate a maximum likelihood method of hierarchical clustering. Our method incorporates a tripartite divide-and-conquer strategy that models sequence heterogeneity, delineates clusters, and yields a profile of the level of clustering associated with each site. The clustering model may be evaluated via model selection using the Akaike Information Criterion, the corrected Akaike Information Criterion, and the Bayesian Information Criterion. Furthermore, model averaging using weighted model likelihoods may be applied to incorporate model uncertainty into the profile of heterogeneity across sites. We evaluated our method by examining its performance on a number of simulated datasets as well as on empirical polymorphism data from diverse natural alleles of the Drosophila alcohol dehydrogenase gene. Our method yielded greater power for the detection of clustered sites across a breadth of parameter ranges, and achieved better accuracy and precision of estimation of clusters, than did the existing empirical cumulative distribution function statistics.
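    The three information criteria named in the abstract above have standard closed forms, sketched below for a fitted model with maximized log-likelihood, `k` free parameters and `n` observations. The helper names are hypothetical, and the use of Akaike weights for model averaging is an assumption made for illustration; the abstract does not specify how its "weighted model likelihoods" are defined.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Return AIC, AICc and BIC for one fitted model.

    log_likelihood: maximized log-likelihood of the model.
    k: number of free parameters; n: number of observations.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = k * math.log(n) - 2 * log_likelihood
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def akaike_weights(scores):
    """Turn criterion scores (lower is better) into normalized model weights.

    Assumption: weights proportional to exp(-delta / 2), where delta is the
    score difference from the best model.
    """
    best = min(scores)
    relative = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]
```

    A per-site profile could then be averaged across candidate clustering models by weighting each model's estimate with its entry in `akaike_weights`, which is one common way to carry model uncertainty into the final profile.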

  15. Cluster analysis of midlatitude oceanic cloud regimes: mean properties and temperature sensitivity

    Directory of Open Access Journals (Sweden)

    N. D. Gordon

    2010-07-01

    Full Text Available Clouds play an important role in the climate system by reducing the amount of shortwave radiation reaching the surface and the amount of longwave radiation escaping to space. Accurate simulation of clouds in computer models remains elusive, however, pointing to a lack of understanding of the connection between large-scale dynamics and cloud properties. This study uses a k-means clustering algorithm to group 21 years of satellite cloud data over midlatitude oceans into seven clusters, and demonstrates that the cloud clusters are associated with distinct large-scale dynamical conditions. Three clusters correspond to low-level cloud regimes with different cloud fraction and cumuliform or stratiform characteristics, but all occur under large-scale descent and a relatively dry free troposphere. Three clusters correspond to vertically extensive cloud regimes with tops in the middle or upper troposphere, and they differ according to the strength of large-scale ascent and enhancement of tropospheric temperature and humidity. The final cluster is associated with a lower troposphere that is dry and an upper troposphere that is moist and experiencing weak ascent and horizontal moist advection.

    Since the present balance of reflection of shortwave and absorption of longwave radiation by clouds could change as the atmosphere warms from increasing anthropogenic greenhouse gases, we must also better understand how increasing temperature modifies cloud and radiative properties. We therefore undertake an observational analysis of how midlatitude oceanic clouds change with temperature when dynamical processes are held constant (i.e., partial derivative with respect to temperature). For each of the seven cloud regimes, we examine the difference in cloud and radiative properties between warm and cold subsets. To avoid misinterpreting a cloud response to large-scale dynamical forcing as a cloud response to temperature, we require horizontal and vertical

  16. Independent component analysis to detect clustered microcalcification breast cancers.

    Science.gov (United States)

    Gallardo-Caballero, R; García-Orellana, C J; García-Manso, A; González-Velasco, H M; Macías-Macías, M

    2012-01-01

    The presence of clustered microcalcifications is one of the earliest signs in breast cancer detection. Although there exist many studies broaching this problem, most of them are nonreproducible due to the use of proprietary image datasets. We use a known subset of the currently largest publicly available mammography database, the Digital Database for Screening Mammography (DDSM), to develop a computer-aided detection system that outperforms the current reproducible studies on the same mammogram set. This proposal is mainly based on the use of extracted image features obtained by independent component analysis, but we also study the inclusion of the patient's age as a nonimage feature which requires no human expertise. Our system achieves an average of 2.55 false positives per image at a sensitivity of 81.8% and 4.45 at a sensitivity of 91.8% in diagnosing the BCRP_CALC_1 subset of DDSM.

  17. Model catalysis by size-selected cluster deposition

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Scott [Univ. of Utah, Salt Lake City, UT (United States)

    2015-11-20

    This report summarizes the accomplishments during the last four years of the subject grant. Results are presented for experiments in which size-selected model catalysts were studied under surface science and aqueous electrochemical conditions. Strong effects of cluster size were found, and by correlating the size effects with size-dependent physical properties of the samples measured by surface science methods, it was possible to deduce mechanistic insights, such as the factors that control the rate-limiting step in the reactions. Results are presented for CO oxidation, CO binding energetics and geometries, and electronic effects under surface science conditions, and for the electrochemical oxygen reduction reaction, ethanol oxidation reaction, and for oxidation of carbon by water.

  18. Cluster analysis of the factors influencing innovative development of economy in regions of Russian Federation

    Directory of Open Access Journals (Sweden)

    V. N. Yur’ev

    2017-01-01

    Full Text Available This article provides a statistical description aimed at identifying the factors that influence innovative development in the regions of the Russian Federation. It builds on the results of previous research [1, p. 212–218]. In the first stage, terminology for the concepts of innovation and innovative development was established and their role in the modern economy was stated. In the next stage, the factors that may influence the volume of innovative products, activities and services were chosen. This article presents a cluster analysis of the regions conducted according to three chosen methods. In the course of the research, data were collected from the official web page of the Federal State Statistics Service in accordance with the previously chosen factors, the data were analyzed and conclusions were drawn; in the current step, the cluster analysis was additionally conducted. To analyze the sample rates and to divide the regions into clusters, we used Statistica [2], a fully integrated line of analytic solutions for analyzing, visualizing and forecasting. As a result of the statistical analysis in Statistica, the regions were divided into clusters according to three methods: hierarchical classification, the K-average (k-means) method and two-input distribution. For a more detailed analysis, linear, power and exponential equations were built for each region. As a result, two tables were drawn up: (1) with the Euclidean distances; (2) with the regression models and the meaningful factors. The regions were thereby grouped, and conclusions and recommendations were given for each group. The results of the current research will be applicable to analysis and planning by various commercial and governmental market participants.

  19. Entropy-rate clustering: cluster analysis via maximizing a submodular function subject to a matroid constraint.

    Science.gov (United States)

    Liu, Ming-Yu; Tuzel, Oncel; Ramalingam, Srikumar; Chellappa, Rama

    2014-01-01

    We propose a new objective function for clustering. This objective function consists of two components: the entropy rate of a random walk on a graph and a balancing term. The entropy rate favors formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes and penalizes larger clusters that aggressively group samples. We present a novel graph construction for the graph associated with the data and show that this construction induces a matroid--a combinatorial structure that generalizes the concept of linear independence in vector spaces. The clustering result is given by the graph topology that maximizes the objective function under the matroid constraint. By exploiting the submodular and monotonic properties of the objective function, we develop an efficient greedy algorithm. Furthermore, we prove an approximation bound of (1/2) for the optimality of the greedy solution. We validate the proposed algorithm on various benchmarks and show its competitive performances with respect to popular clustering algorithms. We further apply it for the task of superpixel segmentation. Experiments on the Berkeley segmentation data set reveal its superior performances over the state-of-the-art superpixel segmentation algorithms in all the standard evaluation metrics.

  20. The relationship between supplier networks and industrial clusters: an analysis based on the cluster mapping method

    Directory of Open Access Journals (Sweden)

    Ichiro IWASAKI

    2010-06-01

    Full Text Available Michael Porter’s concept of competitive advantages emphasizes the importance of regional cooperation of various actors in order to gain competitiveness on globalized markets. Foreign investors may play an important role in forming such cooperation networks. Their local suppliers tend to concentrate regionally. They can form, together with local institutions of education, research, financial and other services, development agencies, the nucleus of cooperative clusters. This paper deals with the relationship between supplier networks and clusters. Two main issues are discussed in more detail: the interest of multinational companies in entering regional clusters and the spillover effects that may stem from their participation. After the discussion on the theoretical background, the paper introduces a relatively new analytical method: “cluster mapping” - a method that can spot regional hot spots of specific economic activities with cluster building potential. Experience with the method was gathered in the US and in the European Union. After the discussion on the existing empirical evidence, the authors introduce their own cluster mapping results, which they obtained by using a refined version of the original methodology.

  1. Multilook SAR Image Segmentation with an Unknown Number of Clusters Using a Gamma Mixture Model and Hierarchical Clustering.

    Science.gov (United States)

    Zhao, Quanhua; Li, Xiaoli; Li, Yu

    2017-05-12

    This paper presents a novel multilook SAR image segmentation algorithm with an unknown number of clusters. Firstly, the marginal probability distribution for a given SAR image is defined by a Gamma mixture model (GaMM), in which the number of components corresponds to the number of homogeneous regions needed to segment and the spatial relationship among neighboring pixels is characterized by a Markov Random Field (MRF) defined by the weighting coefficients of components in GaMM. During the algorithm iteration procedure, the number of clusters is gradually reduced by merging two components until it equals one. For each fixed number of clusters, the parameters of GaMM are estimated and the optimal segmentation result corresponding to the number is obtained by maximizing the marginal probability. Finally, the number of clusters with minimum global energy, defined as the negative logarithm of marginal probability, is taken as the expected number of homogeneous regions to be segmented, and the corresponding segmentation result is considered the final optimal one. The experimental results from the proposed and comparison algorithms for simulated and real multilook SAR images show that the proposed algorithm can find the real number of clusters and obtain more accurate segmentation results simultaneously.

  2. Testing lowered isothermal models with direct N-body simulations of globular clusters - II. Multimass models

    Science.gov (United States)

    Peuten, M.; Zocchi, A.; Gieles, M.; Hénault-Brunet, V.

    2017-09-01

    Lowered isothermal models, such as the multimass Michie-King models, have been successful in describing observational data of globular clusters. In this study, we assess whether such models are able to describe the phase space properties of evolutionary N-body models. We compare the multimass models as implemented in limepy (Gieles & Zocchi) to N-body models of star clusters with different retention fractions for the black holes and neutron stars evolving in a tidal field. We find that multimass models successfully reproduce the density and velocity dispersion profiles of the different mass components in all evolutionary phases and for different remnants retention. We further use these results to study the evolution of global model parameters. We find that over the lifetime of clusters, radial anisotropy gradually evolves from the low- to the high-mass components and we identify features in the properties of observable stars that are indicative of the presence of stellar-mass black holes. We find that the model velocity scale depends on mass as m^{-δ}, with δ ≃ 0.5 for almost all models, but the dependence of central velocity dispersion on m can be shallower, depending on the dark remnant content, and agrees well with that of the N-body models. The reported model parameters, and correlations amongst them, can be used as theoretical priors when fitting these types of mass models to observational data.

  3. Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

    Science.gov (United States)

    Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

    2017-10-25

    Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.

  4. Performance analysis of clustering techniques over microarray data: A case study

    Science.gov (United States)

    Dash, Rasmita; Misra, Bijan Bihari

    2018-03-01

    Handling big data is one of the major issues in the field of statistical data analysis. In such investigations, cluster analysis plays a vital role in dealing with large-scale data. There are many clustering techniques, each with a different approach to cluster analysis, but it is difficult to predict which approach suits a particular dataset. To deal with this problem, a grading approach is introduced over many clustering techniques to identify a stable technique. Because the grading approach depends on the characteristics of the dataset as well as on the validity indices, a two-stage grading approach is implemented. In this study, the grading approach is applied to five clustering techniques: hybrid swarm based clustering (HSC), k-means, partitioning around medoids (PAM), vector quantization (VQ) and agglomerative nesting (AGNES). The experimentation is conducted over five microarray datasets with seven validity indices. The finding of the grading approach that a clustering technique is significant is also confirmed by the Nemenyi post-hoc hypothesis test.

  5. MODELING THE VERY SMALL SCALE CLUSTERING OF LUMINOUS RED GALAXIES

    International Nuclear Information System (INIS)

    Watson, Douglas F.; Berlind, Andreas A.; McBride, Cameron K.; Masjedi, Morad

    2010-01-01

    We model the small-scale clustering of luminous red galaxies (LRGs) in the Sloan Digital Sky Survey. Specifically, we use the halo occupation distribution formalism to model the projected two-point correlation function of LRGs on scales well within the sizes of their host halos (0.016 h^{-1} Mpc ≤ r ≤ 0.42 h^{-1} Mpc). We start by varying P(N|M), the probability distribution that a dark matter halo of mass M contains N LRGs, and assuming that the radial distribution of satellite LRGs within halos traces the Navarro-Frenk-White (NFW) dark matter density profile. We find that varying P(N|M) alone is not sufficient to match the small-scale data. We next allow the concentration of satellite LRG galaxies to differ from that of dark matter and find that this is also not sufficient. Finally, we relax the assumption of an NFW profile and allow the inner slope of the density profile to vary. We find that this model provides a good fit to the data and the resulting value of the slope is -2.17 ± 0.12. The radial density profile of satellite LRGs within halos is thus not compatible with that of the underlying dark matter, but rather is closer to an isothermal distribution.
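    For readers unfamiliar with the profiles being compared, the standard forms are written out below. The NFW and singular isothermal expressions are textbook results. The generalized profile with a free inner slope γ is one common parametrization and is an assumption here; the abstract does not state the exact form the authors used.

```latex
% NFW profile with scale radius r_s and characteristic density \rho_s
\rho_{\mathrm{NFW}}(r) = \frac{\rho_s}{(r/r_s)\,(1 + r/r_s)^{2}},
\qquad
\frac{d\ln\rho_{\mathrm{NFW}}}{d\ln r} \to
\begin{cases}
  -1, & r \ll r_s \\
  -3, & r \gg r_s
\end{cases}

% Singular isothermal sphere
\rho_{\mathrm{iso}}(r) \propto r^{-2}

% Assumed generalization with free inner slope \gamma (\gamma = -1 recovers NFW)
\rho(r) \propto \left(\frac{r}{r_s}\right)^{\gamma}
         \left(1 + \frac{r}{r_s}\right)^{-(3+\gamma)}
```

    Under this reading, the reported slope of -2.17 ± 0.12 lies much nearer the isothermal value of -2 than the NFW inner value of -1, which is the comparison the abstract's final sentence draws.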

  6. Interpolation of daily rainfall using spatiotemporal models and clustering

    KAUST Repository

    Militino, A. F.

    2014-06-11

    Accumulated daily rainfall in non-observed locations on a particular day is frequently required as input to decision-making tools in precision agriculture or for hydrological or meteorological studies. Various solutions and estimation procedures have been proposed in the literature depending on the auxiliary information and the availability of data, but most such solutions are oriented to interpolating spatial data without incorporating temporal dependence. When data are available in space and time, spatiotemporal models usually provide better solutions. Here, we analyse the performance of three spatiotemporal models fitted to the whole sampled set and to clusters within the sampled set. The data consists of daily observations collected from 87 manual rainfall gauges from 1990 to 2010 in Navarre, Spain. The accuracy and precision of the interpolated data are compared with real data from 33 automated rainfall gauges in the same region, but placed in different locations than the manual rainfall gauges. Root mean squared error by months and by year are also provided. To illustrate these models, we also map interpolated daily precipitations and standard errors on a 1 km² grid in the whole region. © 2014 Royal Meteorological Society.

  7. Does objective cluster analysis serve as a useful precursor to seasonal precipitation prediction at local scale? Application to western Ethiopia

    Science.gov (United States)

    Zhang, Ying; Moges, Semu; Block, Paul

    2018-01-01

    Prediction of seasonal precipitation can provide actionable information to guide management of various sectoral activities. For instance, it is often translated into hydrological forecasts for better water resources management. However, many studies assume homogeneity in precipitation across an entire study region, which may prove ineffective for operational and local-level decisions, particularly for locations with high spatial variability. This study proposes advancing local-level seasonal precipitation predictions by first conditioning on regional-level predictions, as defined through objective cluster analysis, for western Ethiopia. To our knowledge, this is the first study predicting seasonal precipitation at high resolution in this region, where lives and livelihoods are vulnerable to precipitation variability given the high reliance on rain-fed agriculture and limited water resources infrastructure. The combination of objective cluster analysis, spatially high-resolution prediction of seasonal precipitation, and a modeling structure spanning statistical and dynamical approaches makes clear advances in prediction skill and resolution, as compared with previous studies. The statistical model improves versus the non-clustered case or dynamical models for a number of specific clusters in northwestern Ethiopia, with clusters having regional average correlation and ranked probability skill score (RPSS) values of up to 0.5 and 33 %, respectively. The general skill (after bias correction) of the two best-performing dynamical models over the entire study region is superior to that of the statistical models, although the dynamical models issue predictions at a lower resolution and the raw predictions require bias correction to guarantee comparable skills.
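    The ranked probability skill score (RPSS) quoted in the abstract above compares a categorical probability forecast with a reference forecast, typically climatology. The sketch below uses the standard cumulative-probability definition; the function names and the tercile example are assumptions for illustration and do not reproduce the study's data or code.

```python
def rps(forecast_probs, observed_category):
    """Ranked probability score for one forecast over ordered categories.

    forecast_probs: probabilities for each ordered category (sum to 1).
    observed_category: index of the category that actually occurred.
    Sums squared differences between cumulative forecast and cumulative
    observation; 0 is a perfect score.
    """
    cumulative_forecast = 0.0
    cumulative_observed = 0.0
    score = 0.0
    for index, probability in enumerate(forecast_probs):
        cumulative_forecast += probability
        if index == observed_category:
            cumulative_observed = 1.0
        score += (cumulative_forecast - cumulative_observed) ** 2
    return score


def rpss(forecasts, observations, reference_probs):
    """Skill relative to a fixed reference forecast: 1 - RPS / RPS_reference."""
    rps_forecast = sum(rps(f, o) for f, o in zip(forecasts, observations))
    rps_reference = sum(rps(reference_probs, o) for o in observations)
    return 1.0 - rps_forecast / rps_reference


# Hypothetical tercile example: below-normal, near-normal, above-normal.
climatology = [1 / 3, 1 / 3, 1 / 3]
observed = [0, 2, 1]
perfect = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

    A score of 1 means a perfect forecast, 0 means no improvement over the reference, and negative values mean the forecast is worse than the reference. Multiplying by 100 gives the percentage form used in the abstract.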

  8. Clustering Dycom

    KAUST Repository

    Minku, Leandro L.

    2017-10-06

    Background: Software Effort Estimation (SEE) can be formulated as an online learning problem, where new projects are completed over time and may become available for training. In this scenario, a Cross-Company (CC) SEE approach called Dycom can drastically reduce the number of Within-Company (WC) projects needed for training, saving the high cost of collecting such training projects. However, Dycom relies on splitting CC projects into different subsets in order to create its CC models. Such splitting can have a significant impact on Dycom's predictive performance. Aims: This paper investigates whether clustering methods can be used to help finding good CC splits for Dycom. Method: Dycom is extended to use clustering methods for creating the CC subsets. Three different clustering methods are investigated, namely Hierarchical Clustering, K-Means, and Expectation-Maximisation. Clustering Dycom is compared against the original Dycom with CC subsets of different sizes, based on four SEE databases. A baseline WC model is also included in the analysis. Results: Clustering Dycom with K-Means can potentially help to split the CC projects, managing to achieve similar or better predictive performance than Dycom. However, K-Means still requires the number of CC subsets to be pre-defined, and a poor choice can negatively affect predictive performance. EM enables Dycom to automatically set the number of CC subsets while still maintaining or improving predictive performance with respect to the baseline WC model. Clustering Dycom with Hierarchical Clustering did not offer significant advantage in terms of predictive performance. Conclusion: Clustering methods can be an effective way to automatically generate Dycom's CC subsets.
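    The abstract notes that K-Means needs the number of subsets fixed in advance. The minimal sketch below of Lloyd's k-means algorithm makes that dependence visible: `k` is a required argument. This is a generic illustration, not Dycom's implementation; the function name, the deterministic initialization from the first `k` points, and the toy feature vectors are all assumptions.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    k must be chosen by the caller. Initial centroids are simply the first
    k points (an assumption made here for reproducibility; practical
    implementations usually randomize or use k-means++).
    Returns (labels, centroids).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical project feature vectors forming two obvious groups.
projects = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(projects, k=2)
```

    Calling the same function with a poorly chosen `k` still returns a partition, just a less useful one. That is the sensitivity the abstract reports, and the reason a method that selects the number of subsets itself can be attractive.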

  9. GraphCrunch 2: Software tool for network modeling, alignment and clustering

    Directory of Open Access Journals (Sweden)

    Hayes Wayne

    2011-01-01

    Full Text Available Abstract Background Recent advancements in experimental biotechnology have produced large amounts of protein-protein interaction (PPI) data. The topology of PPI networks is believed to have a strong link to their function. Hence, the abundance of PPI data for many organisms stimulates the development of computational techniques for the modeling, comparison, alignment, and clustering of networks. In addition, finding representative models for PPI networks will improve our understanding of the cell just as a model of gravity has helped us understand planetary motion. To decide if a model is representative, we need quantitative comparisons of model networks to real ones. However, exact network comparison is computationally intractable and therefore several heuristics have been used instead. Some of these heuristics are easily computable "network properties," such as the degree distribution, or the clustering coefficient. An important special case of network comparison is the network alignment problem. Analogous to sequence alignment, this problem asks to find the "best" mapping between regions in two networks. It is expected that network alignment might have as strong an impact on our understanding of biology as sequence alignment has had. Topology-based clustering of nodes in PPI networks is another example of an important network analysis problem that can uncover relationships between interaction patterns and phenotype. Results We introduce the GraphCrunch 2 software tool, which addresses these problems. It is a significant extension of GraphCrunch which implements the most popular random network models and compares them with the data networks with respect to many network properties. Also, GraphCrunch 2 implements the GRAph ALigner algorithm ("GRAAL") for purely topological network alignment. GRAAL can align any pair of networks and exposes large, dense, contiguous regions of topological and functional similarities far larger than any other

  10. Using cluster analysis to identify patterns in students’ responses to contextually different conceptual problems

    Directory of Open Access Journals (Sweden)

    John Stewart

    2012-10-01

    Full Text Available This study examined the evolution of student responses to seven contextually different versions of two Force Concept Inventory questions in an introductory physics course at the University of Arkansas. The consistency in answering the closely related questions evolved little over the seven-question exam. A model for the state of student knowledge involving the probability of selecting one of the multiple-choice answers was developed. Criteria for using clustering algorithms to extract model parameters were explored and it was found that the overlap between the probability distributions of the model vectors was an important parameter in characterizing the cluster models. The course data were then clustered and the extracted model showed that students largely fit into two groups both pre- and postinstruction: one that answered all questions correctly with high probability and one that selected the distracter representing the same misconception with high probability. For the course studied, 14% of the students were left with persistent misconceptions post instruction on a static force problem and 30% on a dynamic Newton’s third law problem. These students selected the answer representing the predominant misconception slightly more consistently postinstruction, indicating that the course studied had been ineffective at moving this subgroup of students nearer a Newtonian force concept and had instead moved them slightly farther away from a correct conceptual understanding of these two problems. The consistency in answering pairs of problems with varied physical contexts is shown to be an important supplementary statistic to the score on the problems and suggests that the inclusion of such problem pairs in future conceptual inventories would be efficacious. Multiple, contextually varied questions further probe the structure of students’ knowledge. To allow working instructors to make use of the additional insight gained from cluster analysis, it

  11. Image Retrieval Based on Multiview Constrained Nonnegative Matrix Factorization and Gaussian Mixture Model Spectral Clustering Method

    Directory of Open Access Journals (Sweden)

    Qunyi Xie

    2016-01-01

    Full Text Available Content-based image retrieval has recently become an important research topic and has been widely used for managing images from repertories. In this article, we address an efficient technique, called MNGS, which integrates multiview constrained nonnegative matrix factorization (NMF) and Gaussian mixture model (GMM)-based spectral clustering for image retrieval. In the proposed methodology, the multiview NMF scheme provides competitive sparse representations of underlying images through decomposition of a similarity-preserving matrix that is formed by fusing multiple features from different visual aspects. In particular, the proposed method merges manifold constraints into the standard NMF objective function to impose an orthogonality constraint on the basis matrix and satisfy the structure preservation requirement of the coefficient matrix. To manipulate the clustering method on sparse representations, this paper has developed a GMM-based spectral clustering method in which the Gaussian components are regrouped in spectral space, which significantly improves the retrieval effectiveness. In this way, image retrieval of the whole database translates to a nearest-neighbour search in the cluster containing the query image. Simultaneously, this study investigates the proof of convergence of the objective function and the analysis of the computational complexity. Experimental results on three standard image datasets reveal the advantages that can be achieved with the proposed retrieval scheme.

  12. Clustering reveals limits of parameter identifiability in multi-parameter models of biochemical dynamics.

    Science.gov (United States)

    Nienałtowski, Karol; Włodarczyk, Michał; Lipniacki, Tomasz; Komorowski, Michał

    2015-09-29

    Compared to engineering or physics problems, dynamical models in quantitative biology typically depend on a relatively large number of parameters. Progress in developing mathematics to manipulate such multi-parameter models and so enable their efficient interplay with experiments has been slow. Existing solutions are significantly limited by model size. In order to simplify analysis of multi-parameter models, a method for clustering of model parameters is proposed. It is based on a derived statistically meaningful measure of similarity between groups of parameters. The measure quantifies to what extent changes in values of some parameters can be compensated by changes in values of other parameters. The proposed methodology provides a natural mathematical language to precisely communicate and visualise effects resulting from compensatory changes in values of parameters. As a result, relevant insight into identifiability analysis and experimental planning can be obtained. Analysis of NF-κB and MAPK pathway models shows that highly compensative parameters constitute clusters consistent with the network topology. The method applied to examine an exceptionally rich set of published experiments on the NF-κB dynamics reveals that the experiments jointly ensure identifiability of only 60% of model parameters. The method indicates which further experiments should be performed in order to increase the number of identifiable parameters. We currently lack methods that simplify broadly understood analysis of multi-parameter models. The introduced tools depict mutually compensative effects between parameters to provide insight regarding the role of individual parameters, identifiability and experimental design. The method can also find applications in related methodological areas of model simplification and parameter estimation.

  13. A Method for Traffic Congestion Clustering Judgment Based on Grey Relational Analysis

    Directory of Open Access Journals (Sweden)

    Yingya Zhang

    2016-05-01

    Full Text Available Traffic congestion clustering judgment is a fundamental problem in the study of traffic jam warning. However, it is not satisfactory to judge traffic congestion degrees using only vehicle speed. In this paper, we collect traffic flow information with three properties (traffic flow velocity, traffic flow density and traffic volume) of urban trunk roads, which is used to judge the traffic congestion degree. We first define a grey relational clustering model by leveraging grey relational analysis and rough set theory to mine relationships of multidimensional-attribute information. Then, we propose a grey relational membership degree rank clustering algorithm (GMRC) to discriminate clustering priority and further analyze the urban traffic congestion degree. Our experimental results show that the average accuracy of the GMRC algorithm is 24.9% greater than that of the K-means algorithm and 30.8% greater than that of the Fuzzy C-Means (FCM) algorithm. Furthermore, we find that our method can be more conducive to dynamic traffic warnings.
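
    The abstract above does not spell out the GMRC algorithm, so the following is only a minimal sketch of the grey relational grade (Deng's standard formulation) that grey relational analysis builds on. The function name, the distinguishing coefficient value of 0.5, and the normalized sample readings are assumptions made for illustration; they are not taken from the paper.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Return one grey relational grade per candidate sequence.

    reference  -- normalized attribute values of the ideal sequence
    candidates -- sequences with the same length as reference
    rho        -- distinguishing coefficient (0.5 is a common convention)
    """
    # Absolute differences between each candidate and the reference.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate equals the reference, so every grade is maximal.
        return [1.0 for _ in candidates]
    grades = []
    for row in deltas:
        # Grey relational coefficient for each attribute, then its mean.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Hypothetical normalized readings (velocity, density, volume) per road segment.
free_flow = [1.0, 1.0, 1.0]
segments = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
grades = grey_relational_grades(free_flow, segments)
```

    A higher grade means the segment behaves more like the reference sequence; ranking segments by grade is one plausible way to order them before clustering.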

  14. A Collaboration Service Model for a Global Port Cluster

    OpenAIRE

    Toh, Keith K.T.; Welsh, Karyn; Hassall, Kim

    2010-01-01

    The importance of port clusters to a global city may be viewed from a number of perspectives. The development of port clusters and economies of agglomeration and their contribution to a regional economy is underpinned by information and physical infrastructure that facilitates collaboration between business entities within the cluster. The maturity of technologies providing portals, web and middleware services provides an opportunity to push the boundaries of contemporary service reference mo...

  15. Characterizing Heterogeneity within Head and Neck Lesions Using Cluster Analysis of Multi-Parametric MRI Data.

    Directory of Open Access Journals (Sweden)

    Marco Borri

    Full Text Available To describe a methodology, based on cluster analysis, to partition multi-parametric functional imaging data into groups (or clusters) of similar functional characteristics, with the aim of characterizing functional heterogeneity within head and neck tumour volumes. To evaluate the performance of the proposed approach on a set of longitudinal MRI data, analysing the evolution of the obtained sub-sets with treatment. The cluster analysis workflow was applied to a combination of dynamic contrast-enhanced and diffusion-weighted imaging MRI data from a cohort of squamous cell carcinoma of the head and neck patients. Cumulative distributions of voxels, containing pre and post-treatment data and including both primary tumours and lymph nodes, were partitioned into k clusters (k = 2, 3 or 4). Principal component analysis and cluster validation were employed to investigate data composition and to independently determine the optimal number of clusters. The evolution of the resulting sub-regions with induction chemotherapy treatment was assessed relative to the number of clusters. The clustering algorithm was able to separate clusters which significantly reduced in voxel number following induction chemotherapy from clusters with a non-significant reduction. Partitioning with the optimal number of clusters (k = 4, determined with cluster validation) produced the best separation between reducing and non-reducing clusters. The proposed methodology was able to identify tumour sub-regions with distinct functional properties, independently separating clusters which were affected differently by treatment. This work demonstrates that unsupervised cluster analysis, with no prior knowledge of the data, can be employed to provide a multi-parametric characterization of functional heterogeneity within tumour volumes.

  16. Superiority of Classification Tree versus Cluster, Fuzzy and Discriminant Models in a Heartbeat Classification System.

    Directory of Open Access Journals (Sweden)

    Vessela Krasteva

    Full Text Available This study presents a 2-stage heartbeat classifier of supraventricular (SVB) and ventricular (VB) beats. Stage 1 makes computationally-efficient classification of SVB-beats, using simple correlation threshold criterion for finding close match with a predominant normal (reference) beat template. The non-matched beats are next subjected to measurement of 20 basic features, tracking the beat and reference template morphology and RR-variability for subsequent refined classification in SVB or VB-class by Stage 2. Four linear classifiers are compared: cluster, fuzzy, linear discriminant analysis (LDA) and classification tree (CT), all subjected to iterative training for selection of the optimal feature space among extended 210-sized set, embodying interactive second-order effects between 20 independent features. The optimization process minimizes at equal weight the false positives in SVB-class and false negatives in VB-class. The training with European ST-T, AHA, MIT-BIH Supraventricular Arrhythmia databases found the best performance settings of all classification models: Cluster (30 features), Fuzzy (72 features), LDA (142 coefficients), CT (221 decision nodes) with top-3 best scored features: normalized current RR-interval, higher/lower frequency content ratio, beat-to-template correlation. Unbiased test-validation with MIT-BIH Arrhythmia database rates the classifiers in descending order of their specificity for SVB-class: CT (99.9%), LDA (99.6%), Cluster (99.5%), Fuzzy (99.4%); sensitivity for ventricular ectopic beats as part from VB-class (commonly reported in published beat-classification studies): CT (96.7%), Fuzzy (94.4%), LDA (94.2%), Cluster (92.4%); positive predictivity: CT (99.2%), Cluster (93.6%), LDA (93.0%), Fuzzy (92.4%). CT has superior accuracy by 0.3-6.8% points, with the advantage for easy model complexity configuration by pruning the tree consisted of easy interpretable 'if-then' rules.

  17. Superiority of Classification Tree versus Cluster, Fuzzy and Discriminant Models in a Heartbeat Classification System.

    Science.gov (United States)

    Krasteva, Vessela; Jekova, Irena; Leber, Remo; Schmid, Ramun; Abächerli, Roger

    2015-01-01

    This study presents a 2-stage heartbeat classifier of supraventricular (SVB) and ventricular (VB) beats. Stage 1 makes computationally-efficient classification of SVB-beats, using simple correlation threshold criterion for finding close match with a predominant normal (reference) beat template. The non-matched beats are next subjected to measurement of 20 basic features, tracking the beat and reference template morphology and RR-variability for subsequent refined classification in SVB or VB-class by Stage 2. Four linear classifiers are compared: cluster, fuzzy, linear discriminant analysis (LDA) and classification tree (CT), all subjected to iterative training for selection of the optimal feature space among extended 210-sized set, embodying interactive second-order effects between 20 independent features. The optimization process minimizes at equal weight the false positives in SVB-class and false negatives in VB-class. The training with European ST-T, AHA, MIT-BIH Supraventricular Arrhythmia databases found the best performance settings of all classification models: Cluster (30 features), Fuzzy (72 features), LDA (142 coefficients), CT (221 decision nodes) with top-3 best scored features: normalized current RR-interval, higher/lower frequency content ratio, beat-to-template correlation. Unbiased test-validation with MIT-BIH Arrhythmia database rates the classifiers in descending order of their specificity for SVB-class: CT (99.9%), LDA (99.6%), Cluster (99.5%), Fuzzy (99.4%); sensitivity for ventricular ectopic beats as part from VB-class (commonly reported in published beat-classification studies): CT (96.7%), Fuzzy (94.4%), LDA (94.2%), Cluster (92.4%); positive predictivity: CT (99.2%), Cluster (93.6%), LDA (93.0%), Fuzzy (92.4%). CT has superior accuracy by 0.3-6.8% points, with the advantage for easy model complexity configuration by pruning the tree consisted of easy interpretable 'if-then' rules.

  18. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel'dovich Survey

    Energy Technology Data Exchange (ETDEWEB)

    Schrabback, T.; et al.

    2016-11-11

    We present an HST/ACS weak gravitational lensing analysis of 13 massive high-redshift (z_median=0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev-Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass-observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V-I colour. Our estimate of the source redshift distribution is based on CANDELS data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the mass-concentration relation using simulations. In combination with temperature estimates from Chandra we constrain the normalisation of the mass-temperature scaling relation ln(E(z) M_500c/10^14 M_sun)=A+1.5 ln(kT/7.2keV) to A=1.81^{+0.24}_{-0.14}(stat.) +/- 0.09(sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c=5.6^{+3.7}_{-1.8}.
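
    For readers who want the scaling relation quoted above in typeset form, a short worked evaluation follows. It only restates the numbers given in the abstract; the choice to evaluate at the pivot temperature kT = 7.2 keV, and the use of the central value of A without its uncertainties, are made here for illustration.

```latex
\ln\!\left(\frac{E(z)\,M_{500c}}{10^{14}\,M_\odot}\right)
  = A + 1.5\,\ln\!\left(\frac{kT}{7.2\,\mathrm{keV}}\right),
\qquad A = 1.81^{+0.24}_{-0.14}\,(\mathrm{stat.}) \pm 0.09\,(\mathrm{sys.})

% At the pivot temperature kT = 7.2 keV the logarithm on the right vanishes, so
E(z)\,M_{500c} = e^{1.81} \times 10^{14}\,M_\odot \approx 6.1 \times 10^{14}\,M_\odot
```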

  19. Cluster Mass Calibration at High Redshift: HST Weak Lensing Analysis of 13 Distant Galaxy Clusters from the South Pole Telescope Sunyaev-Zel’dovich Survey

    Energy Technology Data Exchange (ETDEWEB)

    Schrabback, T.; Applegate, D.; Dietrich, J. P.; Hoekstra, H.; Bocquet, S.; Gonzalez, A. H.; der Linden, A. von; McDonald, M.; Morrison, C. B.; Raihan, S. F.; Allen, S. W.; Bayliss, M.; Benson, B. A.; Bleem, L. E.; Chiu, I.; Desai, S.; Foley, R. J.; de Haan, T.; High, F. W.; Hilbert, S.; Mantz, A. B.; Massey, R.; Mohr, J.; Reichardt, C. L.; Saro, A.; Simon, P.; Stern, C.; Stubbs, C. W.; Zenteno, A.

    2017-10-14

    We present an HST/Advanced Camera for Surveys (ACS) weak gravitational lensing analysis of 13 massive high-redshift (z_median = 0.88) galaxy clusters discovered in the South Pole Telescope (SPT) Sunyaev-Zel'dovich Survey. This study is part of a larger campaign that aims to robustly calibrate mass-observable scaling relations over a wide range in redshift to enable improved cosmological constraints from the SPT cluster sample. We introduce new strategies to ensure that systematics in the lensing analysis do not degrade constraints on cluster scaling relations significantly. First, we efficiently remove cluster members from the source sample by selecting very blue galaxies in V - I colour. Our estimate of the source redshift distribution is based on Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey (CANDELS) data, where we carefully mimic the source selection criteria of the cluster fields. We apply a statistical correction for systematic photometric redshift errors as derived from Hubble Ultra Deep Field data and verified through spatial cross-correlations. We account for the impact of lensing magnification on the source redshift distribution, finding that this is particularly relevant for shallower surveys. Finally, we account for biases in the mass modelling caused by miscentring and uncertainties in the concentration-mass relation using simulations. In combination with temperature estimates from Chandra we constrain the normalization of the mass-temperature scaling relation ln(E(z) M_500c/10^14 M_sun) = A + 1.5 ln(kT/7.2 keV) to A = 1.81^{+0.24}_{-0.14} (stat.) +/- 0.09 (sys.), consistent with self-similar redshift evolution when compared to lower redshift samples. Additionally, the lensing data constrain the average concentration of the clusters to c_200c = 5.6^{+3.7}_{-1.8}.

  20. Spatial and Temporal Clustering in a Simple Earthquake Asperity Model

    Science.gov (United States)

    Tiampo, K. F.; Kazemian, J.; Dominguez, R.; Klein, W.

    2016-12-01

    Natural earthquake fault systems are highly heterogeneous in space, the result of inhomogeneities that are a function of the variety of materials of different strengths. However, despite their inhomogeneous nature, real faults are often modeled as spatially homogeneous systems. Here we present a simple earthquake fault model based on the Olami-Feder-Christensen (OFC) and Rundle-Jackson-Brown (RJB) cellular automata models with long-range interactions that incorporates asperities, or stronger sites, into the lattice (Rundle and Jackson, 1977; Olami et al., 1992). These asperity cells are significantly stronger than the surrounding lattice sites but eventually rupture when the applied stress reaches their higher threshold stress. The introduction of these spatial heterogeneities results in spatial and temporal clustering in the model similar to that seen in natural fault systems. We observe sequences of activity that begin with a gradually accelerating number of larger events, or foreshocks, prior to a large event, followed by a tail of decreasing activity, or aftershocks. These recurrent large events occur at regular intervals and the characteristic time between events and their magnitude are a function of the stress dissipation parameter. The relative length of the foreshock to aftershock sequence depends on the amount of stress dissipation in the system. This work provides further evidence that the spatial and temporal patterns observed in natural seismicity are strongly influenced by the underlying physical properties and are not solely the result of a simple cascade mechanism. We find that the scaling depends not only on the amount of damage, but also on the spatial distribution of that damage (Dominguez et al., 2011; Kazemian et al., 2014). Here we compare the modeled sequences to those of natural earthquake sequences from California and around the world in order to investigate the interplay between cascade dynamics and spatial structure.

  1. Comprehensive Transportation Logistics Network Level Layout Based on Principal Component Factor and Cluster Analysis

    Directory of Open Access Journals (Sweden)

    Zhang Jingrong

    2017-01-01

    Full Text Available The Comprehensive Transportation Logistics Network (CTLN) acts as a crucial prop and fundamental carrier for regional economic and social development. Firstly, an index system for evaluating the development of regional Comprehensive Transportation Logistics (CTL) nodes is established; then regional CTLN nodes are ranked according to their importance by Principal Component Analysis (PCA), the main factors affecting the development of regional CTL nodes are analyzed by factor analysis, and regional CTL nodes are classified according to their feature similarities by cluster analysis; the level structure for constructing the regional CTLN is then proposed. Finally, combined with the geographic locations of the different nodes, a level layout model of the CTLN for the whole region is obtained. Taking Henan Province as an example, a level layout model of a hub-and-spoke CTLN with Zhengzhou at its core is proposed after analysis, providing a reference basis for constructing the CTLN across the whole province scientifically and reasonably.

  2. Spectral clustering for water body spectral types analysis

    Science.gov (United States)

    Huang, Leping; Li, Shijin; Wang, Lingli; Chen, Deqing

    2017-11-01

    In order to study the spectral types of water bodies across the whole country, the key issue of reservoir research is to obtain and analyze the information on water bodies in reservoirs quantitatively and accurately. A new type of weight matrix is constructed by comprehensively utilizing the spectral features and spatial features of the spectra from GF-1 remote sensing images. Then an improved spectral clustering algorithm is proposed based on this weight matrix to cluster representative reservoirs in China. According to an internal clustering validity index, the Davies-Bouldin (DB) index, the best clustering number, 7, is obtained. Compared with two other clustering algorithms, the spectral clustering algorithm based only on spectral features and the K-means algorithm based on spectral features and spatial features, simulation results demonstrate that the proposed spectral clustering algorithm based on spectral features and spatial features has a higher clustering accuracy, and better reflects the spatial clustering characteristics of representative reservoirs in various provinces in China: similar spectral properties and adjacent geographical locations.
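
    As a rough illustration of the Davies-Bouldin index mentioned above, the sketch below computes it for a given partition using the usual definition (mean distance to the centroid as scatter, Euclidean distance between centroids as separation). The toy coordinates and function names are assumptions, not data or code from the study; lower values indicate more compact, better separated clusters.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of at least two non-empty clusters."""
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance of each cluster's points to its centroid.
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical 2-D feature vectors grouped two different ways.
tight = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 0.0), (12.0, 0.0)]]
mixed = [[(0.0, 0.0), (10.0, 0.0)], [(2.0, 0.0), (12.0, 0.0)]]
db_tight = davies_bouldin(tight)
db_mixed = davies_bouldin(mixed)
```

    Choosing the number of clusters then amounts to running the clustering for several candidate values and keeping the one with the smallest index.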

  3. Fuzzy and hard clustering analysis for thyroid disease.

    Science.gov (United States)

    Azar, Ahmad Taher; El-Said, Shaimaa Ahmed; Hassanien, Aboul Ella

    2013-07-01

    Thyroid hormones produced by the thyroid gland help regulate the body's metabolism. A variety of methods have been proposed in the literature for thyroid disease classification. As far as we know, clustering techniques have not been applied to thyroid disease data sets so far. This paper proposes a comparison between hard and fuzzy clustering algorithms on a thyroid disease data set in order to find the optimal number of clusters. Different scalar validity measures are used in comparing the performances of the proposed clustering systems. To demonstrate the performance of each algorithm, the feature values that represent thyroid disease are used as input for the system. Several runs are carried out and recorded with a different number of clusters being specified for each run (between 2 and 11), so as to establish the optimum number of clusters. To find the optimal number of clusters, the so-called elbow criterion is applied. The experimental results revealed that for all algorithms, the elbow was located at c=3. The clustering results for all algorithms are then visualized by the Sammon mapping method to find a low-dimensional (normally 2D or 3D) representation of a set of points distributed in a high dimensional pattern space. At the end of this study, some recommendations are formulated to improve determining the actual number of clusters present in the data set. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
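
    The elbow criterion described above can be sketched as follows. This is not the authors' procedure: it uses plain k-means with a deterministic farthest-first seeding, a "largest second difference" rule to locate the bend, and a made-up toy data set of three well-separated groups, all of which are assumptions chosen to keep the example small.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def kmeans_wcss(points, k, iters=50):
    """Run k-means and return the within-cluster sum of squares (WCSS)."""
    # Deterministic farthest-first seeding, then Lloyd iterations.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: sqdist(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sqdist(p, c) for c in centers) for p in points)


def elbow(wcss):
    """wcss[i] belongs to k = i + 1; return the k with the sharpest bend."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss) - 1):
        bend = (wcss[i - 1] - wcss[i]) - (wcss[i] - wcss[i + 1])
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Hypothetical data: three compact groups of four points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0),
    (5.0, 10.0), (5.0, 11.0), (6.0, 10.0), (6.0, 11.0),
]
wcss = [kmeans_wcss(points, k) for k in range(1, 6)]
best_k = elbow(wcss)
```

    On this toy data the WCSS drops sharply up to three clusters and flattens afterwards, which is the pattern the elbow criterion looks for.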

  4. Using Multilevel Factor Analysis with Clustered Data: Investigating the Factor Structure of the Positive Values Scale

    Science.gov (United States)

    Huang, Francis L.; Cornell, Dewey G.

    2016-01-01

    Advances in multilevel modeling techniques now make it possible to investigate the psychometric properties of instruments using clustered data. Factor models that overlook the clustering effect can lead to underestimated standard errors, incorrect parameter estimates, and model fit indices. In addition, factor structures may differ depending on…

  5. Sensitization trajectories in childhood revealed by using a cluster analysis.

    Science.gov (United States)

    Schoos, Ann-Marie M; Chawes, Bo L; Melén, Erik; Bergström, Anna; Kull, Inger; Wickman, Magnus; Bønnelykke, Klaus; Bisgaard, Hans; Rasmussen, Morten A

    2017-12-01

    Assessment of sensitization at a single time point during childhood provides limited clinical information. We hypothesized that sensitization develops as specific patterns with respect to age at debut, development over time, and involved allergens and that such patterns might be more biologically and clinically relevant. We sought to explore latent patterns of sensitization during the first 6 years of life and investigate whether such patterns associate with the development of asthma, rhinitis, and eczema. We investigated 398 children from the at-risk Copenhagen Prospective Studies on Asthma in Childhood 2000 (COPSAC2000) birth cohort with specific IgE against 13 common food and inhalant allergens at the ages of ½, 1½, 4, and 6 years. An unsupervised cluster analysis for 3-dimensional data (nonnegative sparse parallel factor analysis) was used to extract latent patterns explicitly characterizing temporal development of sensitization while clustering allergens and children. Subsequently, these patterns were investigated in relation to asthma, rhinitis, and eczema. Verification was sought in an independent unselected birth cohort (BAMSE) constituting 3051 children with specific IgE against the same allergens at 4 and 8 years of age. The nonnegative sparse parallel factor analysis indicated a complex latent structure involving 7 age- and allergen-specific patterns in the COPSAC2000 birth cohort data: (1) dog/cat/horse, (2) timothy grass/birch, (3) molds, (4) house dust mites, (5) peanut/wheat flour/mugwort, (6) peanut/soybean, and (7) egg/milk/wheat flour. Asthma was solely associated with pattern 1 (odds ratio [OR], 3.3; 95% CI, 1.5-7.2), rhinitis with patterns 1 to 4 and 6 (OR, 2.2-4.3), and eczema with patterns 1 to 3 and 5 to 7 (OR, 1.6-2.5). All 7 patterns were verified in the independent BAMSE cohort (R^2 > 0.89). This study suggests the presence of specific sensitization patterns in early childhood differentially associated with development of

  6. Classification of Nitrate Polluting Activities through Clustering of Isotope Mixing Model Outputs.

    Science.gov (United States)

    Xue, Dongmei; De Baets, Bernard; Van Cleemput, Oswald; Hennessy, Carmel; Berglund, Michael; Boeckx, Pascal

    2013-09-01

    Apportionment of nitrate (NO3-) sources in surface water and classification of monitoring locations according to NO3- polluting activities may help implementation of water quality control measures. In this study, we (i) evaluated a Bayesian isotopic mixing model (stable isotope analysis in R [SIAR]) for NO3- source apportionment using 2 yr of δ15N-NO3- and δ18O-NO3- data from 29 locations within river basins in Flanders (Belgium) and five expert-defined NO3- polluting activities, (ii) used the NO3- source contributions as input to an unsupervised learning algorithm (k-means clustering) to reclassify sampling locations into NO3- polluting activities, and (iii) assessed if a decision tree model of physicochemical data could retrieve the isotope-based and expert-defined classifications. Based on the SIAR and δ11B results, manure/sewage was identified as a major NO3- source, whereas soil N, fertilizer NO3-, and NH4+ in fertilizer and rain were intermediate sources and NO3- in precipitation was a minor source. The k-means clustering algorithm allowed classification of NO3- polluting activities that corresponded well to the expert-defined classifications. A decision tree model of physicochemical parameters allowed us to correctly classify 50 to 100% of the sampling locations as compared with the k-means clustering approach. We suggest that NO3- polluting activities can be identified via clustering of NO3- source contributions from samples representing an entire river basin. Classification of future monitoring locations into these classes could use decision tree models based on physicochemical data. The latter approach holds a substantial degree of uncertainty but provides more inherent information for dedicated abatement strategies than monitoring of NO3- concentrations alone. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.

  7. Adapting Spectral Co-clustering to Documents and Terms Using Latent Semantic Analysis

    Science.gov (United States)

    Park, Laurence A. F.; Leckie, Christopher A.; Ramamohanarao, Kotagiri; Bezdek, James C.

    Spectral co-clustering is a generic method of computing co-clusters of relational data, such as sets of documents and their terms. Latent semantic analysis is a method of document and term smoothing that can assist in the information retrieval process. In this article we examine the process behind spectral clustering for documents and terms, and compare it to Latent Semantic Analysis. We show that both spectral co-clustering and LSA follow the same process, using different normalisation schemes and metrics. By combining the properties of the two co-clustering methods, we obtain an improved co-clustering method for document-term relational data that provides an increase in the cluster quality of 33.0%.

  8. Identification and validation of asthma phenotypes in Chinese population using cluster analysis.

    Science.gov (United States)

    Wang, Lei; Liang, Rui; Zhou, Ting; Zheng, Jing; Liang, Bing Miao; Zhang, Hong Ping; Luo, Feng Ming; Gibson, Peter G; Wang, Gang

    2017-10-01

    Asthma is a heterogeneous airway disease, so it is crucial to clearly identify clinical phenotypes to achieve better asthma management. To identify and prospectively validate asthma clusters in a Chinese population. Two hundred eighty-four patients were consecutively recruited and 18 sociodemographic and clinical variables were collected. Hierarchical cluster analysis was performed by the Ward method followed by k-means cluster analysis. Then, a prospective 12-month cohort study was used to validate the identified clusters. Five clusters were successfully identified. Clusters 1 (n = 71) and 3 (n = 81) were mild asthma phenotypes with slight airway obstruction and low exacerbation risk, but with a sex differential. Cluster 2 (n = 65) described an "allergic" phenotype, cluster 4 (n = 33) featured a "fixed airflow limitation" phenotype with smoking, and cluster 5 (n = 34) was a "low socioeconomic status" phenotype. Patients in clusters 2, 4, and 5 had distinctly lower socioeconomic status and more psychological symptoms. Cluster 2 had a significantly increased risk of exacerbations (risk ratio [RR] 1.13, 95% confidence interval [CI] 1.03-1.25), unplanned visits for asthma (RR 1.98, 95% CI 1.07-3.66), and emergency visits for asthma (RR 7.17, 95% CI 1.26-40.80). Cluster 4 had an increased risk of unplanned visits (RR 2.22, 95% CI 1.02-4.81), and cluster 5 had increased emergency visits (RR 12.72, 95% CI 1.95-69.78). Kaplan-Meier analysis confirmed that cluster grouping was predictive of time to the first asthma exacerbation, unplanned visit, emergency visit, and hospital admission (P clusters as "allergic asthma," "fixed airflow limitation," and "low socioeconomic status" phenotypes that are at high risk of severe asthma exacerbations and that have management implications for clinical practice in developing countries. Copyright © 2017 American College of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.
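
    The two-step procedure named above (hierarchical clustering by the Ward method followed by k-means) can be sketched as below. This is a simplified illustration under stated assumptions, not the study's analysis: the Ward step is a naive agglomeration whose final centroids seed the k-means step, the two-variable "patients" are invented, and real clinical variables would need standardizing first.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def ward_seeds(points, k):
    """Agglomerate with Ward's criterion until k clusters remain; return centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(a) * len(b) / (len(a) + len(b)) * sqdist(centroid(a), centroid(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def kmeans_refine(points, centers, iters=100):
    """Refine the given starting centers with Lloyd's k-means iterations."""
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda i: sqdist(p, centers[i])) for p in points]
        new_centers = []
        for i, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # A center that lost all its members stays where it was.
            new_centers.append(centroid(members) if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical, already standardized two-variable records.
patients = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (0.0, 2.0)]
seeds = ward_seeds(patients, 2)
labels, centers = kmeans_refine(patients, seeds)
```

    Seeding k-means from the hierarchical solution removes the dependence on random starting centers, which is one common reason for combining the two methods.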

  9. An effective fuzzy kernel clustering analysis approach for gene expression data.

    Science.gov (United States)

    Sun, Lin; Xu, Jiucheng; Yin, Jiaojiao

    2015-01-01

    Fuzzy clustering is an important tool for analyzing microarray data. A major problem in applying fuzzy clustering method to microarray gene expression data is the choice of parameters with cluster number and centers. This paper proposes a new approach to fuzzy kernel clustering analysis (FKCA) that identifies desired cluster number and obtains more steady results for gene expression data. First of all, to optimize characteristic differences and estimate optimal cluster number, Gaussian kernel function is introduced to improve spectrum analysis method (SAM). By combining subtractive clustering with max-min distance mean, maximum distance method (MDM) is proposed to determine cluster centers. Then, the corresponding steps of improved SAM (ISAM) and MDM are given respectively, whose superiority and stability are illustrated through performing experimental comparisons on gene expression data. Finally, by introducing ISAM and MDM into FKCA, an effective improved FKCA algorithm is proposed. Experimental results from public gene expression data and UCI database show that the proposed algorithms are feasible for cluster analysis, and the clustering accuracy is higher than the other related clustering algorithms.

  10. ANALYSIS OF DEVELOPING BATIK INDUSTRY CLUSTER IN BAKARAN VILLAGE CENTRAL JAVA PROVINCE

    Directory of Open Access Journals (Sweden)

    Hermanto Hermanto

    2017-06-01

    Full Text Available SMEs grow in clusters within particular geographical areas, and entrepreneurs grow and thrive through these business clusters. Central Java Province has many business clusters that support the regional economy, one of which is the batik industry cluster. Pati Regency is one of the regencies/cities in Central Java with the lowest turnover. The batik industry cluster in Pati is developing quite well, as seen from the increasing number of batik businesses incorporated in the cluster. This research examines the strategy for developing the batik industry cluster in Pati Regency, with the purpose of determining the proper strategy. The research method is quantitative, and the analysis tool is Strengths, Weaknesses, Opportunities, Threats (SWOT) analysis. The SWOT analysis shows that the proper strategy for developing the batik industry cluster in Pati is: optimizing the management of the batik business cluster in Bakaran Village; having the local government provide information on business capital loan facilities; employing labor from Bakaran Village while improving its quality through training; and marketing Bakaran batik to broader markets while maintaining its quality. Advice that can be given from this research is that the parties who have a role in batik industry cluster development in Bakaran Village, Pati Regency, such as the Local Government.

  11. A Study on Logistics Cluster Competitiveness among Asia Main Countries using the Porter's Diamond Model

    Directory of Open Access Journals (Sweden)

    Tae Won Chung

    2016-12-01

    Measurement and discussion of logistics cluster competitiveness at the national level are required to boost agglomeration effects and potentially create logistics efficiency and productivity. This study developed assessment criteria for logistics cluster competitiveness based on Porter's diamond model, calculated the weight of each criterion by the AHP method, and finally evaluated and discussed logistics cluster competitiveness among major Asian countries. The results indicate that there was a large difference in logistics cluster competitiveness among six countries. The logistics cluster competitiveness scores of Singapore (7.93), Japan (7.38), and Hong Kong (7.04) are observably different from those of China (5.40), Korea (5.08), and Malaysia (3.46). Singapore, with the highest competitiveness score, revealed its absolute advantage in logistics cluster indices. These research results are intended to provide logistics policy makers with some strategic recommendations, and may serve as a baseline for further logistics cluster studies using Porter's diamond model.

  12. Self-organization of orientation maps in a formal neuron model using a cluster learning rule.

    Science.gov (United States)

    Kuroiwa, J; Inawashiro, S; Miyake, S; Aso, H

    2000-01-01

    Self-organization of orientation maps due to external stimuli in the primary visual area of the cerebral cortex is studied in a two-layered neural network which consists of formal neuron models with a sigmoidal output function. A cluster learning rule is proposed as an extended Hebbian learning rule, where a modification of synaptic connections is influenced by the activation of neighboring output neurons. Using a self-consistent Monte Carlo method, we evaluate the output responses of neurons to explicit inputs after the learning. An orientation map calculated from the output responses reproduces characteristic features of biological ones. Moreover, quantitative analyses of our results are consistent with experimental results. It is shown that the cluster learning rule plays an important role in forming smooth changes of preferred orientations.

  13. Automated detection of microcalcification clusters in digital mammograms based on wavelet domain hidden Markov tree modeling

    International Nuclear Information System (INIS)

    Regentova, E.; Zhang, L.; Veni, G.; Zheng, J.

    2007-01-01

    A system is designed for detecting microcalcification clusters (MCC) in digital mammograms. The system is intended for computer-aided diagnostic prompting. Further discrimination of MCC as benign or malignant is assumed to be performed by radiologists. Processing of mammograms is based on statistical modeling by means of wavelet domain hidden Markov trees (WHMT). Segmentation is performed by weighted likelihood evaluation, followed by classification based on spatial filters for detection of a single microcalcification (MC) and of a cluster of MCs. The analysis is carried out on FROC curves for 40 mammograms from the mini-MIAS database and for 100 mammograms with 50 cancerous and 50 benign cases from the DDSM database. The designed system is capable of detecting 100% of true positive cases in these sets. The rate of false positives is 2.9 per case for the mini-MIAS dataset and 0.01 for the DDSM images.

  14. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    Science.gov (United States)

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiments. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Cluster Analysis of Physical and Cognitive Ageing Patterns in Older People from Shanghai

    Directory of Open Access Journals (Sweden)

    Stephan Bandelow

    2016-02-01

    This study investigated the relationship between education, cognitive and physical function in older age, and their respective impacts on activities of daily living (ADL). Data on 148 older participants from a community-based sample recruited in Shanghai, China, included the following measures: age, education, ADL, grip strength, balance, gait speed, global cognition and verbal memory. The majority of participants in the present cohort were cognitively and physically healthy and reported no problems with ADL. Twenty-eight percent of participants needed help with ADL, with the majority of this group being over 80 years of age. Significant predictors of reductions in functional independence included age, balance, global cognitive function (MMSE) and the gait measures. Cluster analysis revealed a protective effect of education on cognitive function that did not appear to extend to physical function. Consistency of such phenotypes of ageing clusters in other cohort studies may provide helpful models for dementia and frailty prevention measures.

  16. A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    Science.gov (United States)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only the location and time at which people stay are collected, but also what people "say" at a location at a given time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  17. A THREE-STEP SPATIAL-TEMPORAL-SEMANTIC CLUSTERING METHOD FOR HUMAN ACTIVITY PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    W. Huang

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only the location and time at which people stay are collected, but also what people “say” at a location at a given time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The

  18. An evaluation of centrality measures used in cluster analysis

    Science.gov (United States)

    Engström, Christopher; Silvestrov, Sergei

    2014-12-01

    Clustering of data into groups of similar objects plays an important part when analysing many types of data, especially when the datasets are large, as they often are in, for example, bioinformatics, social networks and computational linguistics. Many clustering algorithms such as K-means and some types of hierarchical clustering need a number of centroids representing the 'center' of the clusters. The choice of centroids for the initial clusters often plays an important role in the quality of the clusters. Since a data point with a high centrality supposedly lies close to the 'center' of some cluster, this can be used to assign centroids rather than through some other method such as picking them at random. Some work has been done to evaluate the use of centrality measures such as degree, betweenness and eigenvector centrality in clustering algorithms. The aim of this article is to compare and evaluate the usefulness of a number of common centrality measures such as those mentioned above and others such as PageRank and related measures.
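
    The seeding idea in this abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' procedure: the function name `degree_centrality_seeds`, the fixed-radius neighbourhood graph, the `radius` parameter and the rule of skipping points adjacent to an earlier seed are all hypothetical choices made here for illustration, and degree centrality stands in for the other measures the article compares.

```python
from math import dist

def degree_centrality_seeds(points, k, radius):
    """Pick k initial centroids by degree centrality in a radius graph (illustrative)."""
    n = len(points)
    # Neighbours of i: every other point within `radius` of it.
    neighbours = [
        {j for j in range(n) if j != i and dist(points[i], points[j]) <= radius}
        for i in range(n)
    ]
    # Visit points from highest to lowest degree (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    seeds = []
    for i in order:
        # Hypothetical spreading rule: skip points adjacent to a chosen seed.
        if all(i not in neighbours[s] for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return [points[i] for i in seeds]

# Invented toy data: two tight groups of points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
seeds = degree_centrality_seeds(points, k=2, radius=0.12)
print(seeds)
```

    The returned points could then be passed as initial centroids to any K-means implementation in place of randomly chosen ones.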

  19. Cluster analysis of HZE particle tracks as applied to space radiobiology problems

    International Nuclear Information System (INIS)

    Batmunkh, M.; Bayarchimeg, L.; Lkhagva, O.; Belov, O.

    2013-01-01

    A cluster analysis is performed of ionizations in tracks produced by the most abundant nuclei in the charge and energy spectra of the galactic cosmic rays. The frequency distribution of clusters is estimated for cluster sizes comparable to the DNA molecule at different packaging levels. For this purpose, an improved K-mean-based algorithm is suggested. This technique allows processing particle tracks containing a large number of ionization events without setting the number of clusters as an input parameter. Using this method, the ionization distribution pattern is analyzed depending on the cluster size and particle's linear energy transfer

  20. Application of cluster analysis and unsupervised learning to multivariate tissue characterization

    International Nuclear Information System (INIS)

    Momenan, R.; Insana, M.F.; Wagner, R.F.; Garra, B.S.; Loew, M.H.

    1987-01-01

    This paper describes a procedure for classifying tissue types from unlabeled acoustic measurements (data type unknown) using unsupervised cluster analysis. These techniques are being applied to unsupervised ultrasonic image segmentation and tissue characterization. The performance of a new clustering technique is measured and compared with supervised methods, such as a linear Bayes classifier. In these comparisons two objectives are sought: a) How well does the clustering method group the data?; b) Do the clusters correspond to known tissue classes? The first question is investigated by a measure of cluster similarity and dispersion. The second question involves a comparison with a supervised technique using labeled data

  1. *K-means and cluster models for cancer signatures.

    Science.gov (United States)

    Kakushadze, Zura; Yu, Willie

    2017-09-01

    We present *K-means clustering algorithm and source code by expanding statistical clustering methods applied in https://ssrn.com/abstract=2802753 to quantitative finance. *K-means is statistically deterministic without specifying initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). *K-means' computational cost is a fraction of NMF's. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in quantitative finance.

  2. A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.

    Science.gov (United States)

    Ferrari, Alberto; Comelli, Mario

    2016-12-01

    In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. These clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods are available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; plus, we describe results from the application of these methods on data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. The ATLAS Analysis Model

    CERN Multimedia

    Amir Farbin

    The ATLAS Analysis Model is a continually developing vision of how to reconcile physics analysis requirements with the ATLAS offline software and computing model constraints. In the past year this vision has influenced the evolution of the ATLAS Event Data Model, the Athena software framework, and physics analysis tools. These developments, along with the October Analysis Model Workshop and the planning for CSC analyses have led to a rapid refinement of the ATLAS Analysis Model in the past few months. This article introduces some of the relevant issues and presents the current vision of the future ATLAS Analysis Model. Event Data Model The ATLAS Event Data Model (EDM) consists of several levels of details, each targeted for a specific set of tasks. For example the Event Summary Data (ESD) stores calorimeter cells and tracking system hits thereby permitting many calibration and alignment tasks, but will be only accessible at particular computing sites with potentially large latency. In contrast, the Analysis...

  4. Pharmacokinetic analysis and k-means clustering of DCEMR images for radiotherapy outcome prediction of advanced cervical cancers.

    Science.gov (United States)

    Andersen, Erlend K F; Kristensen, Gunnar B; Lyng, Heidi; Malinen, Eirik

    2011-08-01

    Pharmacokinetic analysis of dynamic contrast enhanced magnetic resonance images (DCEMRI) allows for quantitative characterization of vascular properties of tumors. The aim of this study is twofold: first, to determine if tumor regions with similar vascularization could be labeled by clustering methods; second, to determine if the identified regions can be associated with local cancer relapse. Eighty-one patients with locally advanced cervical cancer treated with chemoradiotherapy underwent DCEMRI with Gd-DTPA prior to external beam radiotherapy. The median follow-up time after treatment was four years, in which nine patients had primary tumor relapse. By fitting a pharmacokinetic two-compartment model function to the temporal contrast enhancement in the tumor, two pharmacokinetic parameters, K^trans and v_e, were estimated voxel by voxel from the DCEMR-images. Intratumoral regions with similar vascularization were identified by k-means clustering of the two pharmacokinetic parameter estimates over all patients. The volume fraction of each cluster was used to evaluate the prognostic value of the clusters. Three clusters provided a sufficient reduction of the cluster variance to label different vascular properties within the tumors. The corresponding median volume fraction of each cluster was 38%, 46% and 10%. The second cluster was significantly associated with primary tumor control in a log-rank survival test (p-value: 0.042), showing a decreased risk of treatment failure for patients with high volume fraction of voxels. Intratumoral regions showing similar vascular properties could successfully be labeled in three distinct clusters and the volume fraction of one cluster region was associated with primary tumor control.
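
    The two computational steps described here, k-means over pooled voxel-wise parameter pairs followed by a per-patient volume fraction for each cluster, can be sketched as follows. This is a minimal illustration and not the study's code: the function names `kmeans` and `volume_fractions`, the starting centers, the patient labels and every number in the toy data are invented, and two clusters are used instead of the three reported in the abstract to keep the example short.

```python
from math import dist

def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm started from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each voxel goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

def volume_fractions(labels, patient_ids, k):
    """Fraction of each patient's voxels that fall in each cluster."""
    counts = {}
    for label, pid in zip(labels, patient_ids):
        counts.setdefault(pid, [0] * k)[label] += 1
    return {pid: [c / sum(row) for c in row] for pid, row in counts.items()}

# Invented (K^trans, v_e) pairs for six voxels from two hypothetical patients.
voxels = [(0.10, 0.20), (0.12, 0.22), (0.50, 0.60),
          (0.52, 0.58), (0.11, 0.21), (0.51, 0.61)]
patients = ["A", "A", "A", "B", "B", "B"]

labels, centers = kmeans(voxels, centers=[voxels[0], voxels[2]])
fractions = volume_fractions(labels, patients, k=2)
print(labels, fractions)
```

    In a setting like the one described, the per-patient fractions would be the quantity carried forward into a survival analysis; that step is outside this sketch.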

  5. Pharmacokinetic analysis and k-means clustering of DCEMR images for radiotherapy outcome prediction of advanced cervical cancers

    International Nuclear Information System (INIS)

    Andersen, Erlend K. F.; Kristensen, Gunnar B.; Lyng, Heidi; Malinen, Eirik

    2011-01-01

    Introduction. Pharmacokinetic analysis of dynamic contrast enhanced magnetic resonance images (DCEMRI) allows for quantitative characterization of vascular properties of tumors. The aim of this study is twofold: first, to determine if tumor regions with similar vascularization could be labeled by clustering methods; second, to determine if the identified regions can be associated with local cancer relapse. Materials and methods. Eighty-one patients with locally advanced cervical cancer treated with chemoradiotherapy underwent DCEMRI with Gd-DTPA prior to external beam radiotherapy. The median follow-up time after treatment was four years, in which nine patients had primary tumor relapse. By fitting a pharmacokinetic two-compartment model function to the temporal contrast enhancement in the tumor, two pharmacokinetic parameters, K^trans and v_e, were estimated voxel by voxel from the DCEMR-images. Intratumoral regions with similar vascularization were identified by k-means clustering of the two pharmacokinetic parameter estimates over all patients. The volume fraction of each cluster was used to evaluate the prognostic value of the clusters. Results. Three clusters provided a sufficient reduction of the cluster variance to label different vascular properties within the tumors. The corresponding median volume fraction of each cluster was 38%, 46% and 10%. The second cluster was significantly associated with primary tumor control in a log-rank survival test (p-value: 0.042), showing a decreased risk of treatment failure for patients with high volume fraction of voxels. Conclusions. Intratumoral regions showing similar vascular properties could successfully be labeled in three distinct clusters and the volume fraction of one cluster region was associated with primary tumor control

  6. Spatio-temporal clustering analysis and its determinants of hand, foot and mouth disease in Hunan, China, 2009-2015.

    Science.gov (United States)

    Wu, Xinrui; Hu, Shixiong; Kwaku, Abuaku Benjamin; Li, Qi; Luo, Kaiwei; Zhou, Ying; Tan, Hongzhuan

    2017-09-25

    Hand, foot and mouth disease (HFMD) is one of the most frequently reported infectious diseases, with several outbreaks across the world. This study aimed at describing epidemiological characteristics, investigating spatio-temporal clustering changes, and identifying determinant factors in different clustering areas of HFMD. Descriptive statistics were used to evaluate the epidemic characteristics of HFMD from 2009 to 2015. Spatial autocorrelation and spatio-temporal cluster analysis were used to explore the spatial-temporal patterns. An autologistic regression model was employed to explore determinants of HFMD clustering. The incidence rates of HFMD ranged from 54.31/10 million to 318.06/10 million between 2009 and 2015 in Hunan. Cases were mainly prevalent in children aged 5 years and younger, with an average male-to-female sex ratio of 1.66, and two epidemic periods in each year. Clustering areas gathered in the northern regions in 2009 and in the central regions from 2010 to 2012. They moved to central-southern regions in 2013 and 2014 and central-western regions in 2015. The significant risk factors of HFMD clusters were rainfall (OR = 2.187), temperature (OR = 4.329) and humidity (OR = 2.070). The protective factor was wind speed (OR = 0.258). The HFMD incidence from 2009 to 2015 in Hunan showed a new spatiotemporal clustering tendency, with the clustering areas shifting toward the south and west. Meteorological factors showed a strong association with HFMD clustering, which may assist in predicting future spatial-temporal clusters.

  7. An alternative methodological approach to value analysis of regions, municipal corporations and clusters

    Directory of Open Access Journals (Sweden)

    Mojmír Sabolovič

    2011-01-01

    The paper deals with the theoretical conception of value analysis of regions, municipal corporations and clusters. The subject of this paper is a heterodox approach to sensitivity analysis of a finite set of variables based on a non-additive measure. For dynamic analysis of the trajectory of general value, robust models based on the maximum entropy principle are sufficient. The findings concern the explanation of a proper fuzzy integral, the Choquet integral. The fuzzy measure is represented by the theory of capacities (Choquet, 1953) on the power set. Finally, the conception of the New integral for capacities (Lehler, 2005) is discussed. Value analysis and transmission constitute a notable aspect of performance evaluation of regions, municipal corporations and clusters. In light of the high ratio of soft variables, social behavior, intangible assets and human capital within those types of subjects, the fuzzy integral provides a useful tool for modeling. The New integral, in turn, captures an important characteristic of human behavior: risk aversion, expressed by a concave function and a non-additive operator. The results comprise tools that enable observation of synergy, redundancy and inhibition among value variables as a consequence of the non-additive measure. Finally, the results raise issues for future research.

  8. Multidimensional cluster stability analysis from a Brazilian Bradyrhizobium sp. RFLP/PCR data set

    Science.gov (United States)

    Milagre, S. T.; Maciel, C. D.; Shinoda, A. A.; Hungria, M.; Almeida, J. R. B.

    2009-05-01

    The taxonomy of the N2-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of the phenotypic and genotypic properties. This paper presents an application of a method aiming at the identification of possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses clustering algorithms with distances calculated by average-linkage clustering. The stability analysis is performed by introducing perturbations through sub-sampling techniques. The method showed efficacy in the grouping of the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, and sub-clusters within each detected cluster.
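
    A sub-sampling stability check of the kind mentioned in this abstract might look roughly like the sketch below. It is illustrative only: the abstract does not give the authors' stability score, so the pairwise co-membership agreement used here, the function names `average_linkage` and `stability`, the 75% sub-sample size and the toy coordinates are all assumptions, and Euclidean points stand in for RFLP-PCR band profiles.

```python
import random
from itertools import combinations
from math import dist

def average_linkage(points, k):
    """Agglomerative clustering with average linkage, cut at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average distance.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: avg_dist(clusters[ab[0]], clusters[ab[1]]))
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def stability(points, k, fraction=0.75, trials=20, seed=0):
    """Mean pairwise co-membership agreement between full and sub-sampled runs."""
    rng = random.Random(seed)
    full = average_linkage(points, k)
    scores = []
    for _ in range(trials):
        idx = rng.sample(range(len(points)), int(fraction * len(points)))
        sub = average_linkage([points[i] for i in idx], k)
        pairs = list(combinations(range(len(idx)), 2))
        agree = sum((sub[a] == sub[b]) == (full[idx[a]] == full[idx[b]])
                    for a, b in pairs)
        scores.append(agree / len(pairs))
    return sum(scores) / len(scores)

# Invented toy data: two well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
full_labels = average_linkage(points, 2)
score = stability(points, 2)
print(full_labels, score)
```

    A score near 1 indicates that the partition survives the removal of a quarter of the samples; lower values would suggest that the chosen number of clusters is not well supported by the data.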

  9. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2004-01-01

    An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experiment data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering such as reduced projection in similarities, noise, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchical clustering method, the k-means clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).
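
    The general idea of clustering around local maxima of a magnitude property can be shown with a small hill-climbing sketch. This is one plausible reading of the abstract and should not be taken as the published LMC algorithm: the function name `local_maximum_clustering`, the fixed `radius` neighbourhood, the tie-breaking rule and the toy magnitudes are all assumptions introduced here.

```python
from math import dist

def local_maximum_clustering(points, magnitude, radius):
    """Label each point by the local maximum reached by climbing uphill (illustrative)."""
    n = len(points)

    def uphill(i):
        # Among i and its neighbours within `radius`, pick the largest magnitude
        # (ties broken in favour of the lower index).
        near = [j for j in range(n) if dist(points[i], points[j]) <= radius]
        return max(near, key=lambda j: (magnitude[j], -j))

    labels = []
    for i in range(n):
        current = i
        while True:
            nxt = uphill(current)
            if nxt == current:  # reached a local maximum
                break
            current = nxt
        labels.append(current)
    return labels

# Invented one-dimensional toy data with two peaks in the magnitude property.
points = [(0.0,), (1.0,), (2.0,), (5.0,), (6.0,), (7.0,)]
magnitude = [1, 3, 2, 2, 5, 1]
labels = local_maximum_clustering(points, magnitude, radius=1.0)
print(labels)
```

    Each label is the index of the local maximum that a point climbs to, so the number of clusters follows from the magnitude property and the neighbourhood size rather than being fixed in advance.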

  10. Performance Analysis of Cluster Formation in Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Edgar Romo Montiel

    2017-12-01

    Clustered-based wireless sensor networks have been extensively used in the literature in order to achieve considerable energy consumption reductions. However, two aspects of such systems have been largely overlooked. Namely, the transmission probability used during the cluster formation phase and the way in which cluster heads are selected. Both of these issues have an important impact on the performance of the system. For the former, it is common to consider that sensor nodes in a clustered-based Wireless Sensor Network (WSN) use a fixed transmission probability to send control data in order to build the clusters. However, due to the highly variable conditions experienced by these networks, a fixed transmission probability may lead to extra energy consumption. In view of this, three different transmission probability strategies are studied: optimal, fixed and adaptive. In this context, we also investigate cluster head selection schemes, specifically, we consider two intelligent schemes based on the fuzzy C-means and k-medoids algorithms and a random selection with no intelligence. We show that the use of intelligent schemes greatly improves the performance of the system, but their use entails higher complexity and selection delay. The main performance metrics considered in this work are energy consumption, successful transmission probability and cluster formation latency. As an additional feature of this work, we study the effect of errors in the wireless channel and the impact on the performance of the system under the different transmission probability schemes.
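
    One of the two intelligent schemes named in this abstract, k-medoids, lends itself to a short sketch of cluster head selection. The code below is a generic alternating k-medoids heuristic and is not taken from the paper: the function name `select_cluster_heads`, the node coordinates and the initial heads are invented, node positions are assumed to be distinct, and energy, transmission probability and channel errors are ignored entirely.

```python
from math import dist

def select_cluster_heads(nodes, heads, iterations=10):
    """Alternating k-medoids: returns indices of the nodes chosen as cluster heads."""
    heads = list(heads)
    for _ in range(iterations):
        # Each node joins the cluster of its nearest current head.
        clusters = {h: [] for h in heads}
        for i, position in enumerate(nodes):
            nearest = min(heads, key=lambda h: dist(position, nodes[h]))
            clusters[nearest].append(i)
        # The new head of a cluster is the member closest, in total, to the others.
        new_heads = [
            min(members, key=lambda c: sum(dist(nodes[c], nodes[m]) for m in members))
            for members in clusters.values()
        ]
        if new_heads == heads:  # converged
            break
        heads = new_heads
    return heads

# Invented sensor positions along a line, with a poor initial choice of heads.
nodes = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
heads = select_cluster_heads(nodes, heads=[0, 3])
print(heads)
```

    Unlike a centroid, a medoid is always an actual node, which is why this family of algorithms fits a setting where the cluster head must be one of the deployed sensors.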

  11. Performance Analysis of Cluster Formation in Wireless Sensor Networks.

    Science.gov (United States)

    Montiel, Edgar Romo; Rivero-Angeles, Mario E; Rubino, Gerardo; Molina-Lozano, Heron; Menchaca-Mendez, Rolando; Menchaca-Mendez, Ricardo

    2017-12-13

    Clustered-based wireless sensor networks have been extensively used in the literature in order to achieve considerable energy consumption reductions. However, two aspects of such systems have been largely overlooked. Namely, the transmission probability used during the cluster formation phase and the way in which cluster heads are selected. Both of these issues have an important impact on the performance of the system. For the former, it is common to consider that sensor nodes in a clustered-based Wireless Sensor Network (WSN) use a fixed transmission probability to send control data in order to build the clusters. However, due to the highly variable conditions experienced by these networks, a fixed transmission probability may lead to extra energy consumption. In view of this, three different transmission probability strategies are studied: optimal, fixed and adaptive. In this context, we also investigate cluster head selection schemes, specifically, we consider two intelligent schemes based on the fuzzy C-means and k-medoids algorithms and a random selection with no intelligence. We show that the use of intelligent schemes greatly improves the performance of the system, but their use entails higher complexity and selection delay. The main performance metrics considered in this work are energy consumption, successful transmission probability and cluster formation latency. As an additional feature of this work, we study the effect of errors in the wireless channel and the impact on the performance of the system under the different transmission probability schemes.

  12. Tracking Undergraduate Student Achievement in a First-Year Physiology Course Using a Cluster Analysis Approach

    Science.gov (United States)

    Brown, S. J.; White, S.; Power, N.

    2015-01-01

    A cluster analysis data classification technique was used on assessment scores from 157 undergraduate nursing students who passed 2 successive compulsory courses in human anatomy and physiology. Student scores in five summative assessment tasks, taken in each of the courses, were used as inputs for a cluster analysis procedure. We aimed to group…

  13. Analysis of the defect clusters in congruent lithium tantalate

    Science.gov (United States)

    Vyalikh, Anastasia; Zschornak, Matthias; Köhler, Thomas; Nentwich, Melanie; Weigel, Tina; Hanzig, Juliane; Zaripov, Ruslan; Vavilova, Evgenia; Gemming, Sibylle; Brendler, Erica; Meyer, Dirk C.

    2018-01-01

    A wide range of technological applications of lithium tantalate (LT) is closely related to the defect chemistry. In literature, several intrinsic defect models have been proposed. Here, using a combinational approach based on DFT and solid-state NMR, we demonstrate that distribution of electric field gradients (EFGs) can be employed as a fingerprint of a specific defect configuration. Analyzing the distribution of 7Li EFGs, the FT-IR and electron spin resonance (ESR) spectra, and the 7Li spin-lattice relaxation behavior, we have found that the congruent LT samples provided by two manufacturers show rather different defect concentrations and distributions although both were grown by the Czochralski method. After thermal treatment hydrogen out-diffusion and homogeneous distribution of other defects have been observed by ESR, NMR, and FT-IR. The defect structure in one of two congruent LT crystals after annealing has been identified and proved by defect formation energy considerations, whereas the more complex defect configuration, including the presence of extrinsic defects, has been suggested for the other LT sample. The approach of searching the EFG fingerprints from DFT calculations in NMR spectra can be applied for identifying the defect clusters in other complex oxides.

  14. Cluster Computing For Real Time Seismic Array Analysis.

    Science.gov (United States)

    Martini, M.; Giudicepietro, F.

    A seismic array is an instrument composed of a dense distribution of seismic sensors that makes it possible to measure the directional properties of the wavefield (slowness or wavenumber vector) radiated by a seismic source. Over recent years arrays have been widely used in different fields of seismological research. In particular, they are applied in the investigation of seismic sources on volcanoes, where they can be successfully used for studying volcanic microtremor and long-period events, which are critical for obtaining information on the evolution of volcanic systems. For this reason arrays could be usefully employed for volcano monitoring; however, the huge amount of data produced by this type of instrument and the quite time-consuming processing techniques have limited their potential for this application. In order to favor a direct application of array techniques to continuous volcano monitoring, we designed and built a small PC cluster able to compute in near real time the kinematic properties of the wavefield (slowness or wavenumber vector) produced by local seismic sources. The cluster is composed of 8 dual-processor Intel Pentium-III PCs working at 550 MHz, and has 4 gigabytes of RAM. It runs under the Linux operating system. The developed analysis software package is based on the Multiple SIgnal Classification (MUSIC) algorithm and is written in Fortran. The message-passing part is based upon the LAM programming environment package, an open-source implementation of the Message Passing Interface (MPI). The developed software system includes modules devoted to receiving data via the internet and graphical applications for the continuous display of the processing results. The system has been tested with a data set collected during a seismic experiment conducted on Etna in 1999, when two dense seismic arrays were deployed on the northeast and the southeast flanks of the volcano. A real-time continuous acquisition system has been simulated by

  15. Efficacy of GPS cluster analysis for predicting carnivory sites of a wide-ranging omnivore: the American black bear

    Science.gov (United States)

    Kindschuh, Sarah R.; Cain, James W.; Daniel, David; Peyton, Mark A.

    2016-01-01

    The capacity to describe and quantify predation by large carnivores expanded considerably with the advent of GPS technology. Analyzing clusters of GPS locations formed by carnivores facilitates the detection of predation events by identifying characteristics which distinguish predation sites. We present a performance assessment of GPS cluster analysis as applied to the predation and scavenging of an omnivore, the American black bear (Ursus americanus), on ungulate prey and carrion. Through field investigations of 6854 GPS locations from 24 individual bears, we identified 54 sites where black bears formed a cluster of locations while predating or scavenging elk (Cervus elaphus), mule deer (Odocoileus hemionus), or cattle (Bos spp.). We developed models for three data sets to predict whether a GPS cluster was formed at a carnivory site vs. a non-carnivory site (e.g., bed sites or non-ungulate foraging sites). Two full-season data sets contained GPS locations logged at either 3-h or 30-min intervals from April to November, and a third data set contained 30-min interval data from April through July corresponding to the calving period for elk. Longer fix intervals resulted in the detection of fewer carnivory sites. Clusters were more likely to be carnivory sites if they occurred in open or edge habitats, if they occurred in the early season, if the mean distance between all pairs of GPS locations within the cluster was less, and if the cluster endured for a longer period of time. Clusters were less likely to be carnivory sites if they were initiated in the morning or night compared to the day. The top models for each data set performed well and successfully predicted 71–96% of field-verified carnivory events, 55–75% of non–carnivory events, and 58–76% of clusters overall. Refinement of this method will benefit from further application across species and ecological systems.
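
    Two of the cluster-level predictors named in this abstract, the mean distance between all pairs of GPS locations and the length of time a cluster endured, can be computed as in the following minimal sketch. It assumes that fixes are already projected to planar coordinates in metres and time-stamped in hours; the function name, the field names and the sample fixes are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations


def cluster_features(fixes):
    """Compute simple features for one GPS cluster.

    `fixes` is a list of (x_m, y_m, t_hours) tuples with at least
    two entries (assumed projected coordinates, hypothetical data).
    """
    pairs = list(combinations(fixes, 2))
    # Mean Euclidean distance over all pairs of locations.
    mean_dist = sum(math.dist(a[:2], b[:2]) for a, b in pairs) / len(pairs)
    times = [t for _, _, t in fixes]
    return {
        "mean_pairwise_distance_m": mean_dist,
        "duration_h": max(times) - min(times),
    }


# Hypothetical cluster of three fixes logged at 30-min intervals.
fixes = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.5), (6.0, 8.0, 1.0)]
features = cluster_features(fixes)
print(features)
```

    Features of this kind would then serve as covariates in a model that separates carnivory from non-carnivory clusters; the model fitting itself is outside the scope of this sketch.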

  16. Clustering analysis of malware behavior using Self Organizing Map

    DEFF Research Database (Denmark)

    Pirscoveanu, Radu-Stefan; Stevanovic, Matija; Pedersen, Jens Myrup

    2016-01-01

    For the time being, malware behavioral classification is performed by means of Anti-Virus (AV) generated labels. The paper investigates the inconsistencies associated with current practices by evaluating the identified differences between current vendors. In this paper we rely on Self Organizing...... Map, an unsupervised machine learning algorithm, for generating clusters that capture the similarities between malware behavior. A data set of approximately 270,000 samples was used to generate the behavioral profile of malicious types in order to compare the outcome of the proposed clustering...... accurate results based on the clusters created by competitive and cooperative algorithms like Self Organizing Map that better describe the behavioral profile of malware....

  17. Internal validation of risk models in clustered data: a comparison of bootstrap schemes

    NARCIS (Netherlands)

    Bouwmeester, W.; Moons, K.G.M.; Kappen, T.H.; van Klei, W.A.; Twisk, J.W.R.; Eijkemans, M.J.C.; Vergouwe, Y.

    2013-01-01

    Internal validity of a risk model can be studied efficiently with bootstrapping to assess possible optimism in model performance. Assumptions of the regular bootstrap are violated when the development data are clustered. We compared alternative resampling schemes in clustered data for the estimation

  18. Analysis of brood sex ratios: implications of offspring clustering

    Czech Academy of Sciences Publication Activity Database

    Krackow, S.; Tkadlec, Emil

    Vol. 50, No. 4 (2001), pp. 293-301 ISSN 0340-5443 R&D Projects: GA ČR GA524/01/1316 Institutional research plan: CEZ:AV0Z6093917 Keywords : generalized linear mixed models * random coefficients * multilevel analysis Subject RIV: EG - Zoology Impact factor: 2.353, year: 2001

  19. Field of Study Choice: Using Conjoint Analysis and Clustering

    Science.gov (United States)

    Shtudiner, Ze'ev; Zwilling, Moti; Kantor, Jeffrey

    2017-01-01

    Purpose: The purpose of this paper is to measure student's preferences regarding various attributes that affect their decision process while choosing a higher education area of study. Design/ Methodology/Approach: The paper exhibits two different models which shed light on the perceived value of each examined area of study: conjoint analysis and…

  20. Testing dark energy and dark matter cosmological models with clusters of galaxies

    Energy Technology Data Exchange (ETDEWEB)

    Boehringer, Hans [Max-Planck-Institut fuer Extraterrestrische Physik, Garching (Germany)

    2008-07-01

    Galaxy clusters are, as the largest building blocks of our Universe, ideal probes to study the large-scale structure and to test cosmological models. The principal approach and the status of this research are reviewed. Clusters lend themselves to tests in several ways: the cluster mass function, the spatial clustering, the evolution of both functions with redshift, and the internal composition can be used to constrain cosmological parameters. X-ray observations are currently the best means of obtaining the relevant data on the galaxy cluster population. We illustrate in particular all the above-mentioned methods with our ROSAT-based cluster surveys. The mass calibration of clusters is an important issue that is currently being addressed with XMM-Newton and Chandra studies. Based on the current experience we provide an outlook for future research, especially with eROSITA.

  1. A Coupled Hidden Markov Random Field Model for Simultaneous Face Clustering and Tracking in Videos

    KAUST Repository

    Wu, Baoyuan

    2016-10-25

    Face clustering and face tracking are two areas of active research in automatic facial video processing. They, however, have long been studied separately, despite the inherent link between them. In this paper, we propose to perform simultaneous face clustering and face tracking from real world videos. The motivation for the proposed research is that face clustering and face tracking can provide useful information and constraints to each other, thus can bootstrap and improve the performances of each other. To this end, we introduce a Coupled Hidden Markov Random Field (CHMRF) to simultaneously model face clustering, face tracking, and their interactions. We provide an effective algorithm based on constrained clustering and optimal tracking for the joint optimization of cluster labels and face tracking. We demonstrate significant improvements over state-of-the-art results in face clustering and tracking on several videos.

  2. Application and research of fuzzy clustering analysis algorithm under “micro-lecture” English teaching mode

    Directory of Open Access Journals (Sweden)

    Shi Ying

    2016-01-01

    Full Text Available The fuzzy clustering algorithm is to classify the data or indicators with a greater degree of similarity based on the principle of the same type of individuals possessing a greater similarity, and different types of individuals possessing differences, establish clear category boundaries, form any shape of relationship clusters in the solving process, and input the research indicators at random, in order to accurately analyze the significance of the indicators in the algorithm. The evaluation value of the clustering analysis can be obtained by the establishment of the fuzzy factor set based on the membership analysis, and the evaluation result can be analyzed through reference to the evaluation indicators of the fuzzy clustering analysis. The “micro-lecture” English teaching mode can be estimated and the analysis indicators can be rationally established based on the fuzzy clustering analysis algorithm, with better algorithm applicability.

  3. Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering

    KAUST Repository

    Xu, Zhiqiang

    2017-02-16

    Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed; that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.

  4. Visual cluster analysis and pattern recognition template and methods

    Science.gov (United States)

    Osbourn, Gordon Cecil; Martinez, Rubel Francisco

    1999-01-01

    A method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  5. Improving hierarchical clustering of genotypic data via principal component analysis

    NARCIS (Netherlands)

    Odong, T.L.; Heerwaarden, van J.; Hintum, van T.J.L.; Eeuwijk, van F.A.; Jansen, J.

    2013-01-01

    Understanding the genetic structure of germplasm collections is a prerequisite for effective and efficient use of crop genetic resources in genebanks. Currently, hierarchical clustering techniques are most popular for describing genetic structure in germplasm collections. Traditionally performed

  6. Visual cluster analysis and pattern recognition template and methods

    Energy Technology Data Exchange (ETDEWEB)

    Osbourn, G.C.; Martinez, R.F.

    1993-12-31

    This invention is comprised of a method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable and improve pattern recognition techniques.

  7. Cluster decay analysis and related structure effects of fissionable ...

    Indian Academy of Sciences (India)

    2015-08-01

    Aug 1, 2015 ... Keywords. Collective clusterization; deformations and orientations; fission; heavy and superheavy nuclei. ... Author Affiliations. Manoj K Sharma1 Gurvinder Kaur1. School of Physics and Materials Science, Thapar University, Patiala 147 004, India ...

  8. *K-means and Cluster Models for Cancer Signatures

    OpenAIRE

    Kakushadze, Zura; Yu, Willie

    2017-01-01

    We present *K-means clustering algorithm and source code by expanding statistical clustering methods applied in https://ssrn.com/abstract=2802753 to quantitative finance. *K-means is statistically deterministic without specifying initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). *K-means’ computational cost is a fraction of NMF’s. Using 1389 published samples for 14 cancer types, we find that 3 cancer...

  9. Analysis of Radiation Damage in Light Water Reactors: Comparison of Cluster Analysis Methods for the Analysis of Atom Probe Data.

    Science.gov (United States)

    Hyde, Jonathan M; DaCosta, Gérald; Hatzoglou, Constantinos; Weekes, Hannah; Radiguet, Bertrand; Styman, Paul D; Vurpillot, Francois; Pareige, Cristelle; Etienne, Auriane; Bonny, Giovanni; Castin, Nicolas; Malerba, Lorenzo; Pareige, Philippe

    2017-04-01

    Irradiation of reactor pressure vessel (RPV) steels causes the formation of nanoscale microstructural features (termed radiation damage), which affect the mechanical properties of the vessel. A key tool for characterizing these nanoscale features is atom probe tomography (APT), due to its high spatial resolution and the ability to identify different chemical species in three dimensions. Microstructural observations using APT can underpin development of a mechanistic understanding of defect formation. However, with atom probe analyses there are currently multiple methods for analyzing the data. This can result in inconsistencies between results obtained from different researchers and unnecessary scatter when combining data from multiple sources. This makes interpretation of results more complex and calibration of radiation damage models challenging. In this work simulations of a range of different microstructures are used to directly compare different cluster analysis algorithms and identify their strengths and weaknesses.

  10. Analysis of Health Behavior Theories for Clustering of Health Behaviors.

    Science.gov (United States)

    Choi, Seung Hee; Duffy, Sonia A

    The objective of this article was to review the utility of established behavior theories, including the Health Belief Model, Theory of Reasoned Action, Theory of Planned Behavior, Transtheoretical Model, and Health Promotion Model, for addressing multiple health behaviors among people who smoke. It is critical to design future interventions for multiple health behavior changes tailored to individuals who currently smoke, yet it has not been addressed. Five health behavior theories/models were analyzed and critically evaluated. A review of the literature included a search of PubMed and Google Scholar from 2010 to 2016. Two hundred sixty-seven articles (252 studies from the initial search and 15 studies from the references of initially identified studies) were included in the analysis. Most of the health behavior theories/models emphasize psychological and cognitive constructs that can be applied only to one specific behavior at a time, thus making them not suitable to address multiple health behaviors. However, the Health Promotion Model incorporates "related behavior factors" that can explain multiple health behaviors among persons who smoke. Future multiple behavior interventions guided by the Health Promotion Model are necessary to show the utility and applicability of the model to address multiple health behaviors.

  11. The distant galaxy cluster CL0016+16: X-ray analysis up to R200

    Science.gov (United States)

    Solovyeva, L.; Anokhin, S.; Sauvageot, J. L.; Teyssier, R.; Neumann, D.

    2007-12-01

    Aims: CL0016+16 seems to be a good candidate for studying the mass distribution of galaxy clusters up to their virial radius, since it is a bright massive cluster, previously considered as dynamically relaxed. Methods: Using XMM-Newton observations of CL0016+16, we performed a careful X-ray background analysis and detected its X-ray emission convincingly up to R200. We then studied its dynamical state with a detailed 2D temperature and surface brightness analysis of the inner part of the cluster. We used the assumption of both spherical symmetry and hydrostatic equilibrium (HE), to determine the main cluster parameters: total mass, temperature profile, surface brightness profile, and β-parameter. We also built a temperature map that clearly exhibits departure from spherical symmetry in the centre. To estimate the influence of these perturbations on our total mass estimate, we also computed the total mass in the framework of the HE approach, but this time with various temperature profiles obtained in different directions. Results: These various total-mass estimates are consistent with each other. The temperature perturbations are clear signatures of ongoing merger activity. We also find significant residuals after subtracting the emissivity map by a 2D β-model fit. We conclude that, although CL0016+16 shows clear signs of merger activity and departure from spherical symmetry in the centre, its X-ray emissivity can be detected up to R200 and the corresponding mass M200 can be computed directly. It is therefore a good candidate for studying cosmological scaling laws as predicted by the theory.
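
    For orientation, the two ingredients named in this abstract, the β-model and the hydrostatic-equilibrium mass under spherical symmetry, are usually written in the standard textbook forms below. The abstract does not give the exact parametrisation used by the authors, so the symbols here are assumptions: S_0 is the central surface brightness, r_c the core radius, n_g the gas density, T the gas temperature, μ the mean molecular weight and m_p the proton mass.

```latex
% Isothermal beta-model for the X-ray surface brightness profile
\[
  S_X(r) = S_0 \left[ 1 + \left( \frac{r}{r_c} \right)^{2} \right]^{-3\beta + 1/2}
\]

% Total mass within radius r from hydrostatic equilibrium and spherical symmetry
\[
  M(<r) = - \frac{k_B\, T(r)\, r}{G\, \mu\, m_p}
          \left( \frac{d \ln n_g}{d \ln r} + \frac{d \ln T}{d \ln r} \right)
\]
```

    Evaluating the second expression at r = R200 gives the mass M200 referred to in the abstract; using temperature profiles taken in different directions changes only the T(r) term.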

  12. Biomedical time series clustering based on non-negative sparse coding and probabilistic topic model.

    Science.gov (United States)

    Wang, Jin; Liu, Ping; F H She, Mary; Nahavandi, Saeid; Kouzani, Abbas

    2013-09-01

    Biomedical time series clustering, which groups a set of unlabelled temporal signals according to their underlying similarity, is very useful for biomedical records management and analysis, such as biosignal archiving and diagnosis. In this paper, a new framework for clustering of long-term biomedical time series such as electrocardiography (ECG) and electroencephalography (EEG) signals is proposed. Specifically, local segments extracted from the time series are projected as a combination of a small number of basis elements in a trained dictionary by non-negative sparse coding. A Bag-of-Words (BoW) representation is then constructed by summing up all the sparse coefficients of local segments in a time series. Based on the BoW representation, a probabilistic topic model that was originally developed for text document analysis is extended to discover the underlying similarity of a collection of time series. The underlying similarity of biomedical time series is well captured owing to the statistical nature of the probabilistic topic model. Experiments on three datasets constructed from publicly available EEG and ECG signals demonstrate that the proposed approach achieves better accuracy than existing state-of-the-art methods, and is insensitive to model parameters such as length of local segments and dictionary size. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
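
    The Bag-of-Words step described in this abstract, summing the sparse coefficients of all local segments of one time series, can be sketched as follows. This is an illustration only: the sparse codes are assumed to be already computed and stored as dictionaries mapping a dictionary-atom index to a non-negative coefficient, and the function name and sample values are hypothetical.

```python
def bag_of_words(segment_codes, dictionary_size):
    """Sum the sparse codes of all segments into one BoW vector.

    `segment_codes` is a list with one dict per local segment,
    mapping atom index -> non-negative coefficient (assumed input).
    """
    bow = [0.0] * dictionary_size
    for code in segment_codes:
        for atom_index, coefficient in code.items():
            bow[atom_index] += coefficient
    return bow


# Hypothetical sparse codes for three segments and a 4-atom dictionary.
codes = [{0: 0.5, 3: 1.0}, {3: 0.25}, {1: 2.0, 0: 0.5}]
bow = bag_of_words(codes, 4)
print(bow)
```

    Each time series then plays the role of a "document" and each dictionary atom the role of a "word" when the vector is passed on to a topic model.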

  13. A Kondo cluster-glass model for spin glass Cerium alloys

    International Nuclear Information System (INIS)

    Zimmer, F M; Magalhaes, S G; Coqblin, B

    2011-01-01

    There are clear indications that the presence of disorder in Ce alloys, such as Ce(Ni,Cu) or Ce(Pd,Rh), is responsible for the existence of a cluster spin glass state which changes continuously into inhomogeneous ferromagnetism at low temperatures. We present a study of the competition between magnetism and the Kondo effect in a cluster-glass model composed of a random inter-cluster interaction term and an intra-cluster one, which contains an intra-site Kondo interaction J_k and an inter-site ferromagnetic one J_0. The random interaction is given by the van Hemmen type of randomness, which allows the problem to be solved without the use of the replica method. The inter-cluster term is solved within the cluster mean-field theory and the remaining intra-cluster interactions can be treated by exact diagonalization. Results show the behavior of the cluster glass order parameter and the Kondo correlation function for several cluster sizes and several values of J_k, J_0 and the ferromagnetic inter-cluster average interaction I_0. In particular, for small J_k, the magnetic solution is strongly dependent on I_0 and J_0 and a Kondo cluster-glass or a mixed phase can be obtained, while, for large J_k, the Kondo effect is still dominant, both in good agreement with experiment in Ce(Ni,Cu) or Ce(Pd,Rh) alloys.

  14. The quantitative analysis of silicon carbide surface smoothing by Ar and Xe cluster ions

    Science.gov (United States)

    Ieshkin, A. E.; Kireev, D. S.; Ermakov, Yu. A.; Trifonov, A. S.; Presnov, D. E.; Garshev, A. V.; Anufriev, Yu. V.; Prokhorova, I. G.; Krupenin, V. A.; Chernysh, V. S.

    2018-04-01

    The gas cluster ion beam technique was used for the silicon carbide crystal surface smoothing. The effect of processing by two inert cluster ions, argon and xenon, was quantitatively compared. While argon is a standard element for GCIB, results for xenon clusters were not reported yet. Scanning probe microscopy and high resolution transmission electron microscopy techniques were used for the analysis of the surface roughness and surface crystal layer quality. The gas cluster ion beam processing results in surface relief smoothing down to average roughness about 1 nm for both elements. It was shown that xenon as the working gas is more effective: sputtering rate for xenon clusters is 2.5 times higher than for argon at the same beam energy. High resolution transmission electron microscopy analysis of the surface defect layer gives values of 7 ± 2 nm and 8 ± 2 nm for treatment with argon and xenon clusters.

  15. A model for sputtering from solid surfaces bombarded by energetic clusters

    Science.gov (United States)

    Benguerba, Messaoud

    2018-04-01

    A model is developed to explain and predict the sputtering from solid surfaces bombarded by energetic clusters, on the basis of the shock wave generated at the impact of the cluster. Under the shock compression the temperature increases, causing the vaporization of material, which requires an internal energy behind the shock of at least about twice the cohesive energy of the target. The sputtering is treated as a gas of vaporized particles from a hemispherical volume behind the shock front. The sputter yield per cluster atom is given as a universal function depending on the ratio of target to cluster atomic density and the ratio of the cluster velocity to the velocity calculated on the basis of an internal energy equal to about twice the cohesive energy. The predictions of the model for the self-sputter yield of copper, gold, tungsten and of silver bombarded by C60 clusters agree well with the corresponding data simulated by molecular dynamics.

  16. Cluster analysis of fruit and vegetable-related perceptions: an alternative approach of consumer segmentation.

    Science.gov (United States)

    Simunaniemi, A-M; Nydahl, M; Andersson, A

    2013-02-01

    Audience segmentation optimises health communication aimed to promote healthy dietary habits, such as fruit and vegetable (F&V) consumption. The present study aimed to segment respondents into clusters based on F&V-related perceptions, and to describe these clusters with respect to F&V consumption and sex. The cross-sectional study was conducted using a semi-structured questionnaire. The respondents were randomly selected among Swedish adults (n = 1304; response rate 51%; 56% women). A two-step cluster analysis was conducted followed by a binary logistic regression with cluster membership as a dependent variable. The clusters were compared using t-tests and chi-squared tests. P vegetables (both sexes) and fruit (women only), whereas men in the Indifferent cluster (n = 715) consumed more juice. Indifferent cluster reported more F&V consumption preventing factors, such as storage and preparation difficulties and low satisfaction with F&V selection and price. Not liking or not having a habit of F&V consumption, laziness, forgetting and a lack of time were mentioned as main barriers to F&V consumption. The Indifferent cluster reports more practical and life-style related difficulties. The Positive cluster consumes more vegetables, perceives fewer F&V-related difficulties, and looks for more dietary information. The findings confirm that cluster analysis is an appropriate way of identifying consumer subgroups for targeted health and nutrition communication. © 2012 The Authors. Journal of Human Nutrition and Dietetics © 2012 The British Dietetic Association Ltd.

  17. Factors associated with food choices among Greek primary school students: a cluster analysis in the ELPYDES study.

    Science.gov (United States)

    Risvas, Grigoris; Panagiotakos, Demosthenes B; Chrysanthopoulou, Stavroula; Karasouli, Konstantina; Matalas, Antonia-Leda; Zampelas, Antonis

    2008-09-01

    Food choice in Greece follows a westernized model. We tried to identify the characteristics of clusters regarding food choice and behaviour in a large sample of Greek primary school students, in order to acknowledge some mediating parameters that need to be addressed when planning interventions to promote healthy nutrition. Cross-sectional study in 2439 fifth and sixth grade students from the Attica and Thessaloniki regions. Three self-administered questionnaires were distributed assessing food consumption, nutrition knowledge and factors associated with dietary change. Data were analysed using principal components analysis (PCA) and K-means cluster analysis. A total of 28.4% (n = 592) of the students were identified as demonstrating 'unbalanced nutrition' whereas 44.8% (n = 1018) and 22.8% (n = 319) demonstrated 'balanced' and 'low food intake', respectively. With regards to nutrition knowledge, the clusters were as follows: medium (n = 319, 14.5%), good (n = 1788, 80.9%) and bad knowledge (n = 101, 4.57%) cluster. After analysing the results of PCA, three clusters were formed: A 'negative effect' (n = 561, 28.8%), a 'health oriented' (n = 777, 39.9%) and a 'reinforced' to eat fruits and vegetables (n = 506, 31.3%) group. The present study managed to identify clusters that correspond to food intake, nutrition knowledge and other factors associated with dietary behaviour and to describe their characteristics.

  18. Global myeloma research clusters, output, and citations: a bibliometric mapping and clustering analysis.

    Directory of Open Access Journals (Sweden)

    Jens Peter Andersen

    Full Text Available International collaborative research is a mechanism for improving the development of disease-specific therapies and for improving health at the population level. However, limited data are available to assess the trends in research output related to orphan diseases. We used bibliometric mapping and clustering methods to illustrate the level of fragmentation in myeloma research and the development of collaborative efforts. Publication data from Thomson Reuters Web of Science were retrieved for 2005-2009 and followed until 2013. We created a database of multiple myeloma publications, and we analysed impact and co-authorship density to identify scientific collaborations, developments, and international key players over time. The global annual publication volume for studies on multiple myeloma increased from 1,144 in 2005 to 1,628 in 2009, which represents a 43% increase. This increase is high compared to the 24% and 14% increases observed for lymphoma and leukaemia. The major proportion (>90%) of publications was from the US and EU over the study period. The output and impact in terms of citations identified several successful groups with a large number of intra-cluster collaborations in the US and EU. The US-based myeloma clusters clearly stand out as the most productive and highly cited, and the European Myeloma Network members exhibited a doubling of collaborative publications from 2005 to 2009, still increasing up to 2013. Multiple myeloma research output has increased substantially in the past decade. The fragmented European myeloma research activities based on national or regional groups are progressing, but they require a broad range of targeted research investments to improve multiple myeloma health care.

  19. Clinical Characteristics of Exacerbation-Prone Adult Asthmatics Identified by Cluster Analysis.

    Science.gov (United States)

    Kim, Mi Ae; Shin, Seung Woo; Park, Jong Sook; Uh, Soo Taek; Chang, Hun Soo; Bae, Da Jeong; Cho, You Sook; Park, Hae Sim; Yoon, Ho Joo; Choi, Byoung Whui; Kim, Yong Hoon; Park, Choon Sik

    2017-11-01

    Asthma is a heterogeneous disease characterized by various types of airway inflammation and obstruction. Therefore, it is classified into several subphenotypes, such as early-onset atopic, obese non-eosinophilic, benign, and eosinophilic asthma, using cluster analysis. A number of asthmatics frequently experience exacerbation over a long-term follow-up period, but the exacerbation-prone subphenotype has rarely been evaluated by cluster analysis. This prompted us to identify clusters reflecting asthma exacerbation. A uniform cluster analysis method was applied to 259 adult asthmatics who were regularly followed-up for over 1 year using 12 variables, selected on the basis of their contribution to asthma phenotypes. After clustering, clinical profiles and exacerbation rates during follow-up were compared among the clusters. Four subphenotypes were identified: cluster 1 was comprised of patients with early-onset atopic asthma with preserved lung function, cluster 2 late-onset non-atopic asthma with impaired lung function, cluster 3 early-onset atopic asthma with severely impaired lung function, and cluster 4 late-onset non-atopic asthma with well-preserved lung function. The patients in clusters 2 and 3 were identified as exacerbation-prone asthmatics, showing a higher risk of asthma exacerbation. Two different phenotypes of exacerbation-prone asthma were identified among Korean asthmatics using cluster analysis; both were characterized by impaired lung function, but the age at asthma onset and atopic status were different between the two. Copyright © 2017 The Korean Academy of Asthma, Allergy and Clinical Immunology · The Korean Academy of Pediatric Allergy and Respiratory Disease

  20. Method for exploratory cluster analysis and visualisation of single-trial ERP ensembles.

    Science.gov (United States)

    Williams, N J; Nasuto, S J; Saddy, J D

    2015-07-30

    The validity of ensemble averaging on event-related potential (ERP) data has been questioned, due to its assumption that the ERP is identical across trials. Thus, there is a need for preliminary testing for cluster structure in the data. We propose a complete pipeline for the cluster analysis of ERP data. To increase the signal-to-noise ratio (SNR) of the raw single-trials, we used a denoising method based on Empirical Mode Decomposition (EMD). Next, we used a bootstrap-based method to determine the number of clusters, through a measure called the Stability Index (SI). We then used a clustering algorithm based on a Genetic Algorithm (GA) to define initial cluster centroids for subsequent k-means clustering. Finally, we visualised the clustering results through a scheme based on Principal Component Analysis (PCA). After validating the pipeline on simulated data, we tested it on data from two experiments - a P300 speller paradigm on a single subject and a language processing study on 25 subjects. Results revealed evidence for the existence of 6 clusters in one experimental condition from the language processing study. Further, a two-way chi-square test revealed an influence of subject on cluster membership. Our analysis operates on denoised single-trials, the number of clusters is determined in a principled manner, and the results are presented through an intuitive visualisation. Given the cluster structure in some experimental conditions, we suggest application of cluster analysis as a preliminary step before ensemble averaging. Copyright © 2015 Elsevier B.V. All rights reserved.
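
    The k-means stage described in this record, started from externally supplied centroids, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `kmeans`, the toy points, and the hand-picked `init_centroids` (standing in for the output of the genetic-algorithm step, which is not reproduced here) are all assumptions.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means starting from caller-supplied centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy two-dimensional "trials"; the initial centroids are hypothetical.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, init_centroids=[(0.0, 0.0), (10.0, 10.0)])
```

    Passing the starting centroids in as an argument keeps the seeding strategy separate from the k-means loop, so any initialiser could be swapped in without touching the loop.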

  1. Methodology comparative statistical analysis of Russian industry based on cluster analysis

    Directory of Open Access Journals (Sweden)

    Sergey S. Shishulin

    2017-01-01

    The article examines the possibilities of applying multidimensional statistical analysis to the study of industrial production by comparing its growth rates and structure with those of other developed and developing countries. Its purpose is to determine the optimal set of statistical methods, and the results of applying them to industrial production data, that gives the best basis for analysis. The data include such indicators as output, gross value added, the number of employed, and other indicators of the system of national accounts and operational business statistics. The objects of observation are the industries of the countries of the Customs Union, the United States, Japan and Europe in 2005-2015. The research tools range from the simplest methods of transformation and graphical and tabular visualization of data to methods of statistical analysis. In particular, using a specialized software package (SPSS), the principal components method, discriminant analysis, hierarchical methods of cluster analysis, Ward’s method and k-means were applied. Applying the principal components method to the initial data makes it possible to reduce the initial space of industrial production data substantially. For example, in analyzing the structure of industrial production, the reduction was from fifteen industries to three basic, well-interpreted factors: the relatively extractive industries (with a low degree of processing), high-tech industries, and consumer goods (medium-technology sectors). At the same time, comparing the results of cluster analysis on the initial data with those on the data obtained by the principal components method established that clustering industrial production data on the basis of the new factors significantly improves the clustering results. As a result of analyzing the parameters of
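
    The reduction from fifteen variables to three principal components mentioned in this record can be illustrated with a short sketch. It assumes NumPy rather than the SPSS package named in the abstract, and the data are synthetic: the function name `pca_reduce`, the 40 observations, and the three hidden factors are invented for illustration only.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each variable
    # SVD of the centred data: the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)          # share of variance per component
    scores = Xc @ Vt[:n_components].T        # coordinates in the reduced space
    return scores, explained[:n_components]

# Synthetic stand-in: 40 observations of 15 variables driven by
# 3 hidden factors plus a little noise (not real industry data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 3))
loadings = rng.normal(size=(3, 15))
X = latent @ loadings + 0.01 * rng.normal(size=(40, 15))

scores, explained = pca_reduce(X, n_components=3)
```

    Here `scores` is the 40 by 3 matrix that a clustering routine would receive in place of the original 15 columns, and `explained` shows how much of the total variance each retained component carries.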

  2. Trend analysis using non-stationary time series clustering based on the finite element method

    OpenAIRE

    Gorji Sefidmazgi, M.; Sayemuzzaman, M.; Homaifar, A.; Jha, M. K.; Liess, S.

    2014-01-01

    In order to analyze low-frequency variability of climate, it is useful to model the climatic time series with multiple linear trends and locate the times of significant changes. In this paper, we have used non-stationary time series clustering to find change points in the trends. Clustering in a multi-dimensional non-stationary time series is challenging, since the problem is mathematically ill-posed. Clustering based on the finite element method (FEM) is one of the methods ...
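
    The idea of describing a series with several linear trends and locating the time of change can be illustrated with a much simpler technique than the one in this record. The sketch below is a brute-force least-squares search for a single change point, not FEM-based clustering; the function names and the toy series are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def best_change_point(ys, min_len=3):
    """Index that splits ys into two linear segments with least total SSE."""
    xs = list(range(len(ys)))
    best = None
    for t in range(min_len, len(ys) - min_len + 1):
        left = fit_line(xs[:t], ys[:t])
        right = fit_line(xs[t:], ys[t:])
        total = left[2] + right[2]
        if best is None or total < best[0]:
            best = (total, t, left[0], right[0])
    return best[1], best[2], best[3]

# Toy series: rises by 1 per step, jumps, then falls by 1 per step.
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 9.0, 8.0, 7.0, 6.0, 5.0]
t, slope_before, slope_after = best_change_point(ys)
```

    The exhaustive search costs one pair of regressions per candidate split, which is fine for short series but would need a smarter strategy for several change points or many dimensions.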

  3. Hyperplane distance neighbor clustering based on local discriminant analysis for complex chemical processes monitoring

    International Nuclear Information System (INIS)

    Lu, Chunhong; Xiao, Shaoqing; Gu, Xiaofeng

    2014-01-01

    The collected training data often include both normal and faulty samples for complex chemical processes. However, some monitoring methods, such as partial least squares (PLS), principal component analysis (PCA), independent component analysis (ICA) and Fisher discriminant analysis (FDA), require fault-free data to build the normal operation model. These techniques are applicable after the preliminary step of data clustering is applied. We here propose a novel hyperplane distance neighbor clustering (HDNC) based on the local discriminant analysis (LDA) for chemical process monitoring. First, faulty samples are separated from normal ones using the HDNC method. Then, the optimal subspace for fault detection and classification can be obtained using the LDA approach. The proposed method takes the multimodality within the faulty data into account, and thus improves the capability of process monitoring significantly. The HDNC-LDA monitoring approach is applied to two simulation processes and then compared with the conventional FDA based on the K-nearest neighbor (KNN-FDA) method. The results obtained in two different scenarios demonstrate the superiority of the HDNC-LDA approach in terms of fault detection and classification accuracy

  4. Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.

    Science.gov (United States)

    Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B

    2017-11-01

    Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. However, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogenous melasma pigmentation, involving both the upper and lower eyelids that extended to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Comparing the two clusters, patients in cluster 2 were found to be significantly older and more commonly had accompanying melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
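
    A hierarchical (agglomerative) cluster analysis of the kind mentioned in this record can be sketched in a few lines. The abstract does not state the linkage rule or distance measure, so average linkage with Euclidean distance is an assumption here, and the measurement pairs are invented rather than taken from the study.

```python
import math

def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering, stopped at n_clusters."""
    clusters = [[i] for i in range(len(points))]

    def avg_dist(a, b):
        # Mean pairwise Euclidean distance between two clusters.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (avg_dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical (lateral distance, inferolateral distance) pairs per patient.
points = [(1.0, 1.2), (1.1, 1.0), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9)]
clusters = agglomerative(points, n_clusters=2)
```

    Each returned cluster is a list of patient indices; stopping the merging at two clusters mirrors the two-cluster solution reported in the abstract, although in practice the cut is usually chosen by inspecting the dendrogram.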

  5. Investigating the health profile of patients with end-stage renal failure receiving peritoneal dialysis: a cluster analysis.

    Science.gov (United States)

    Chan, M F; Wong, Frances K Y; Chow, Susan K Y

    2010-03-01

    To determine whether patients with end stage renal failure can be differentiated into several subtypes based on five main variables. There is a lack of interventional research linked to clinical outcomes among patients with end stage renal failure in Hong Kong, and no clear evidence of differences in their clinical/health outcomes and characteristics. A cross-sectional survey. Data were collected using a structured questionnaire. One hundred and fifty-three patients with end stage renal failure were recruited during 2007 at three renal centres in Hong Kong. Five main variables were employed: predisposing characteristic, enabling resources, quality of life, symptom control and self-care adherence. A cluster analysis yielded two clusters. Each cluster represented a different profile of patients with end stage renal failure. Cluster A consisted of 49.7% (n = 76) and Cluster B consisted of 50.3% (n = 77) of the patients. Cluster A patients, more of whom were women, were older, less educated, had higher quality of life scores, a better adherence rate and more had received nursing care supports than patients in Cluster B. We have identified two groupings of patients with end stage renal failure who were experiencing unique health profiles. Nursing support services may have an effect on patient health outcomes, but only for patients whose profile is similar to that of the patients in Cluster A and not for patients in Cluster B. A clear profile may help health care professionals devise appropriate strategies to target a specific group of patients and improve patient outcomes. The identification of risk for future health-care use could enable better targeting of interventional strategies in these groups. The results of this study might provide health care professionals with a model to design specified interventions to improve life quality for each profile group.

  6. K-Line Patterns’ Predictive Power Analysis Using the Methods of Similarity Match and Clustering

    Directory of Open Access Journals (Sweden)

    Lv Tao

    2017-01-01

    Stock price prediction based on K-line patterns is the essence of candlestick technical analysis. However, there is some dispute in academia over whether K-line patterns have predictive power. To help resolve the debate, this paper uses the data mining methods of pattern recognition, pattern clustering, and pattern knowledge mining to research the predictive power of K-line patterns. The similarity match model and nearest neighbor-clustering algorithm are proposed for solving the problems of similarity match and clustering of K-line series, respectively. The experiment tests the predictive power of the Three Inside Up and Three Inside Down patterns on a testing dataset of K-line series data of Shanghai 180 index component stocks over the latest 10 years. Experimental results show that (1) the predictive power of a pattern varies a great deal for different shapes and (2) each of the existing K-line patterns requires further classification based on the shape feature to improve prediction performance.
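
    The abstract does not describe the paper's own nearest neighbor-clustering algorithm, so the sketch below shows only the generic threshold-based nearest-neighbour clustering scheme as a point of reference. The feature vectors, the Euclidean distance, and the `threshold` value are all assumptions; in practice each vector would hold shape features of a K-line series.

```python
import math

def nearest_neighbor_clustering(items, threshold):
    """Assign each item to its nearest earlier item's cluster if close enough."""
    labels = []
    for idx, item in enumerate(items):
        if idx == 0:
            labels.append(0)  # the first item starts the first cluster
            continue
        # Nearest item among those already labelled.
        nearest = min(range(idx), key=lambda k: math.dist(item, items[k]))
        if math.dist(item, items[nearest]) <= threshold:
            labels.append(labels[nearest])     # join the neighbour's cluster
        else:
            labels.append(max(labels) + 1)     # too far away: open a new cluster
    return labels

# Hypothetical two-feature shape descriptors for five K-line series.
items = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.2, 0.0)]
labels = nearest_neighbor_clustering(items, threshold=1.0)
```

    Unlike k-means, this scheme does not need the number of clusters in advance; the threshold controls how finely a pattern is split into sub-shapes, and the result depends on the order in which items are processed.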

  7. Cluster Analysis of Acute Care Use Yields Insights for Tailored Pediatric Asthma Interventions.

    Science.gov (United States)

    Abir, Mahshid; Truchil, Aaron; Wiest, Dawn; Nelson, Daniel B; Goldstick, Jason E; Koegel, Paul; Lozon, Marie M; Choi, Hwajung; Brenner, Jeffrey

    2017-09-01

    We undertake this study to understand patterns of pediatric asthma-related acute care use to inform interventions aimed at reducing potentially avoidable hospitalizations. Hospital claims data from 3 Camden city facilities for 2010 to 2014 were used to perform cluster analysis classifying patients aged 0 to 17 years according to their asthma-related hospital use. Clusters were based on 2 variables: asthma-related ED visits and hospitalizations. Demographics and a number of sociobehavioral and use characteristics were compared across clusters. Children who met the criteria (3,170) were included in the analysis. An examination of a scree plot showing the decline in within-cluster heterogeneity as the number of clusters increased confirmed that clusters of pediatric asthma patients according to hospital use exist in the data. Five clusters of patients with distinct asthma-related acute care use patterns were observed. Cluster 1 (62% of patients) showed the lowest rates of acute care use. These patients were least likely to have a mental health-related diagnosis, were less likely to have visited multiple facilities, and had no hospitalizations for asthma. Cluster 2 (19% of patients) had a low number of asthma ED visits and onetime hospitalization. Cluster 3 (11% of patients) had a high number of ED visits and low hospitalization rates, and the highest rates of multiple facility use. Cluster 4 (7% of patients) had moderate ED use for both asthma and other illnesses, and high rates of asthma hospitalizations; nearly one quarter received care at all facilities, and 1 in 10 had a mental health diagnosis. Cluster 5 (1% of patients) had extreme rates of acute care use. Differences observed between groups across multiple sociobehavioral factors suggest these clusters may represent children who differ along multiple dimensions, in addition to patterns of service use, with implications for tailored interventions. Copyright © 2017 American College of Emergency Physicians
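
    The scree-plot check described in this record, in which within-cluster heterogeneity is tracked as the number of clusters grows, can be sketched as follows. The abstract does not name the clustering algorithm or the heterogeneity measure, so a basic k-means and the within-cluster sum of squares are assumptions, as are the function name `kmeans_wcss` and the made-up visit counts.

```python
import numpy as np

def kmeans_wcss(X, k, n_iter=50, seed=0):
    """Run a basic k-means and return the within-cluster sum of squares."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every observation to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):      # skip clusters that lost all members
                centroids[j] = X[labels == j].mean(axis=0)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

# Made-up (ED visits, hospitalizations) counts per patient; not study data.
X = [[1, 0], [2, 0], [1, 1], [2, 1], [3, 1], [8, 0],
     [9, 1], [10, 0], [6, 4], [7, 5], [20, 12]]
wcss = [kmeans_wcss(X, k) for k in range(1, 6)]
```

    Plotting `wcss` against k = 1 to 5 gives the scree curve; the number of clusters is usually read off where the curve flattens. Because the starting centroids are random, repeated runs with different seeds are advisable before trusting the shape of the curve.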

  8. Analysis of Factors and Development Potential of Economic Clusters by Economic Activities in Mari El Republic

    Directory of Open Access Journals (Sweden)

    Viktor Aleksandrovich Golovin

    2017-12-01

    This article analyzes the factors that drive the development of economic clusters in Mari El Republic (Russia). This analysis made it possible to reveal the potential for further development of those clusters. I consider the shift-share method one of the major methods for identifying the factors that determine the expansion of economic clusters. The author proposes a modification of the shift-share method that uses relative performance indicators to evaluate the intensity and quality of clustering processes in the region. The article presents the results of empirical research on the economy of Mari El Republic by the shift-share method (2005–2015) in the context of economic activities according to the Federal State Statistics Service. After the analysis of three basic indicators, the leading and lagging economic activities were revealed for the period of 10 years. I paid special attention to the analysis of the clustering potential of the Mari El Republic in the context of economic activities based on the Clustering Potential Index. This analysis shows promising economic activities and industries that may form a cluster. The author discusses the compliance and possible conflicts of the two methods used in the study. Further research of this field can focus on t