Everitt, Brian S; Leese, Morven; Stahl, Daniel
2011-01-01
Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics. This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data. Real-life examples are used throughout to demons
Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis
Fu, Pei-hua; Yin, Hong-bo
In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First, we present the evaluation index system, which covers basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We determine the index weights according to the grades and evaluate the integrated capability of the logistics enterprises using fuzzy cluster analysis. We describe the system evaluation module and the cluster analysis module in detail, including how each was implemented. Finally, we present the results produced by the system.
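The abstract does not name the fuzzy clustering algorithm it uses. For illustration only, the sketch below substitutes fuzzy c-means (FCM), which is one widely used fuzzy clustering method. It is applied to enterprises described by weighted index scores. The function name `fuzzy_c_means`, the fuzzifier `m = 2` and the toy scores are all hypothetical, and none of them should be read as the thesis's own implementation.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means (FCM), a stand-in for the unspecified fuzzy method.

    Returns (centers, u), where u[i][j] is the membership of point i in
    cluster j; every row of u sums to 1. m > 1 is the fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / tw
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i in range(n):
            dist = [max(1e-12, sum((points[i][d] - ctr[d]) ** 2
                                   for d in range(dim)) ** 0.5)
                    for ctr in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
    return centers, u


# Hypothetical weighted index scores for six enterprises (two indices each).
scores = [[0.90, 0.80], [0.85, 0.90], [0.95, 0.85],
          [0.20, 0.30], [0.25, 0.20], [0.15, 0.25]]
centers, u = fuzzy_c_means(scores, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(labels)
```

Each row of `u` holds graded memberships. Taking the largest membership in each row gives a crisp grouping, which is one plausible way to turn the result into evaluation grades. That last step is an assumption; the abstract does not say how grades are read off.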
Variable cluster analysis method for building neural network model
Institute of Scientific and Technical Information of China (English)
王海东; 刘元东
2004-01-01
When building a neural network model of a complicated system, the input variables should be as few as possible while still fully explaining the output variables. To address this problem, a variable selection method based on cluster analysis was investigated. A similarity coefficient describing the mutual relationships among variables was defined. Three methods were proposed and derived from information theory: highest contribution rate, part replacing whole, and variable replacement. Software for building neural networks based on cluster analysis was developed. It offers several ways to define variable similarity coefficients, cluster system variables and evaluate variable clusters, and it was applied to build a neural network model forecasting cement clinker quality. The results show that network scale, training time and prediction accuracy are all satisfactory. The practical application demonstrates that this method of selecting input variables for neural networks is feasible and effective.
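The abstract does not define its similarity coefficient. A minimal sketch of variable clustering for input selection follows. It rests on three assumptions, none of them the paper's definitions: the absolute Pearson correlation stands in as the similarity between variables, variables are grouped by a simple threshold (single linkage), and each group is represented by the member most similar to the rest. That last rule is only a hypothetical reading of "part replacing whole". All names (`abs_corr`, `group_variables`, `representatives`), the threshold 0.8 and the data are hypothetical.

```python
import math


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a variable similarity."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def group_variables(data, threshold=0.8):
    """Join two variables whenever their similarity >= threshold
    (single linkage via union-find); returns lists of variable names."""
    names = list(data)
    parent = {v: v for v in names}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs_corr(data[a], data[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for v in names:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def representatives(data, groups):
    """For each group keep the variable with the highest total similarity
    to the other members (a hypothetical 'part replacing whole' rule)."""
    reps = []
    for g in groups:
        if len(g) == 1:
            reps.append(g[0])
        else:
            reps.append(max(g, key=lambda v: sum(abs_corr(data[v], data[w])
                                                for w in g if w != v)))
    return reps


# Hypothetical process variables (six samples each).
data = {
    "temp": [1, 2, 3, 4, 5, 6],
    "temp_f": [33.8, 35.6, 37.4, 39.2, 41.0, 42.8],
    "temp_noisy": [1.1, 2.0, 2.9, 4.2, 5.0, 5.9],
    "humidity": [5, 2, 6, 1, 6, 2],
}
groups = group_variables(data)
print(groups)
print(representatives(data, groups))
```

The three temperature-like variables collapse into one group, so only one of them would be kept as a network input alongside `humidity`.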
Traffic Accident, System Model and Cluster Analysis in GIS
Directory of Open Access Journals (Sweden)
Veronika Vlčková
2015-07-01
Traffic accidents are a frequently discussed topic, both in general journalism and among professionals. This article directs attention to a less well-known context of accidents, with the help of constructive systems theory and its methods, cluster analysis and geoinformation engineering. A traffic accident reframes space-time and can therefore be studied with the tools of geographic information system (GIS) technology. Applying the systems approach enables the formulation of a system model, captured with the tools of geoinformation engineering and with multicriteria and cluster analysis.
Outlier Identification in Model-Based Cluster Analysis
Evans, Katie; Love, Tanzy; Thurston, Sally W.
2015-04-01
In model-based clustering based on normal-mixture models, a few outlying observations can influence the cluster structure and number. This paper develops a method to identify these; however, it does not attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations in a cluster with minimal membership proportion or for which the cluster-specific variance with and without the observation is very different. Results from a simulation study demonstrate the ability of our method to detect true outliers without falsely identifying many non-outliers, and show improved performance over other approaches in most scenarios. We use the contributed R package MCLUST for model-based clustering, but propose a modified prior for the cluster-specific variance which avoids degeneracies in estimation procedures. We also compare results from our outlier method to published results on National Hockey League data. PMID:26806993
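The second criterion above compares a cluster's variance with and without one observation. A minimal one-dimensional sketch of that idea follows. It assumes cluster labels are already known; in the paper they come from a normal-mixture fit with MCLUST. It also uses a hypothetical cut-off ratio of 3. The paper's actual decision rule and its modified variance prior are not reproduced here.

```python
from statistics import pvariance


def variance_ratio_outliers(values, labels, ratio=3.0):
    """Flag observation i when its cluster's variance is more than `ratio`
    times the variance of the same cluster with observation i left out."""
    flagged = []
    for i, (x, k) in enumerate(zip(values, labels)):
        members = [v for v, lab in zip(values, labels) if lab == k]
        if len(members) < 3:
            continue  # too few members to compare variances meaningfully
        rest = list(members)
        rest.remove(x)  # drop one copy of observation i
        if pvariance(members) > ratio * pvariance(rest):
            flagged.append(i)
    return flagged


# Hypothetical 1-D data already split into two clusters; 5.0 sits far
# from the rest of cluster 0.
values = [1.0, 1.2, 0.9, 1.1, 5.0, 10.0, 10.2, 9.9, 10.1]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
print(variance_ratio_outliers(values, labels))  # prints [4]
```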
Bayesian model-based cluster analysis for predicting macrofaunal communities
Braak, ter C.J.F.; Hoijtink, H.; Akkermans, W.; Verdonschot, P.F.M.
2003-01-01
To predict macrofaunal community composition from environmental data a two-step approach is often followed: (1) the water samples are clustered into groups on the basis of the macrofauna data and (2) the groups are related to the environmental data, e.g. by discriminant analysis. For the cluster ana
caBIG™ VISDA: Modeling, visualization, and discovery for cluster analysis of genomic data
Directory of Open Access Journals (Sweden)
Xuan Jianhua
2008-09-01
Background: The main limitations of most existing clustering methods used in genomic data analysis include heuristic or random algorithm initialization, the potential of finding poor local optima, the lack of cluster number detection, an inability to incorporate prior/expert knowledge, black-box and non-adaptive designs, in addition to the curse of dimensionality and the discernment of uninformative, uninteresting cluster structure associated with confounding variables. Results: In an effort to partially address these limitations, we develop the VIsual Statistical Data Analyzer (VISDA) for cluster modeling, visualization, and discovery in genomic data. VISDA performs progressive, coarse-to-fine (divisive) hierarchical clustering and visualization, supported by hierarchical mixture modeling, supervised/unsupervised informative gene selection, supervised/unsupervised data visualization, and user/prior knowledge guidance, to discover hidden clusters within complex, high-dimensional genomic data. The hierarchical visualization and clustering scheme of VISDA uses multiple local visualization subspaces (one at each node of the hierarchy) and consequent subspace data modeling to reveal both global and local cluster structures in a "divide and conquer" scenario. Multiple projection methods, each sensitive to a distinct type of clustering tendency, are used for data visualization, which increases the likelihood that cluster structures of interest are revealed. Initialization of the full dimensional model is based on first learning models with user/prior knowledge guidance on data projected into the low-dimensional visualization spaces. Model order selection for the high dimensional data is accomplished by Bayesian theoretic criteria and user justification applied via the hierarchy of low-dimensional visualization subspaces. Based on its complementary building blocks and flexible functionality, VISDA is generally applicable for gene clustering, sample
caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data.
Zhu, Yitan; Li, Huai; Miller, David J; Wang, Zuyi; Xuan, Jianhua; Clarke, Robert; Hoffman, Eric P; Wang, Yue
2008-09-18
The main limitations of most existing clustering methods used in genomic data analysis include heuristic or random algorithm initialization, the potential of finding poor local optima, the lack of cluster number detection, an inability to incorporate prior/expert knowledge, black-box and non-adaptive designs, in addition to the curse of dimensionality and the discernment of uninformative, uninteresting cluster structure associated with confounding variables. In an effort to partially address these limitations, we develop the VIsual Statistical Data Analyzer (VISDA) for cluster modeling, visualization, and discovery in genomic data. VISDA performs progressive, coarse-to-fine (divisive) hierarchical clustering and visualization, supported by hierarchical mixture modeling, supervised/unsupervised informative gene selection, supervised/unsupervised data visualization, and user/prior knowledge guidance, to discover hidden clusters within complex, high-dimensional genomic data. The hierarchical visualization and clustering scheme of VISDA uses multiple local visualization subspaces (one at each node of the hierarchy) and consequent subspace data modeling to reveal both global and local cluster structures in a "divide and conquer" scenario. Multiple projection methods, each sensitive to a distinct type of clustering tendency, are used for data visualization, which increases the likelihood that cluster structures of interest are revealed. Initialization of the full dimensional model is based on first learning models with user/prior knowledge guidance on data projected into the low-dimensional visualization spaces. Model order selection for the high dimensional data is accomplished by Bayesian theoretic criteria and user justification applied via the hierarchy of low-dimensional visualization subspaces. Based on its complementary building blocks and flexible functionality, VISDA is generally applicable for gene clustering, sample clustering, and phenotype clustering
Transport Simulation Model Calibration with Two-Step Cluster Analysis Procedure
Directory of Open Access Journals (Sweden)
Zenina Nadezda
2015-12-01
The calibration results of a transport simulation model depend on the selected parameters and their values. The aim of the present paper is to calibrate a transport simulation model with a two-step cluster analysis procedure in order to improve the reliability of the simulation results. Two global parameters have been considered: headway and simulation step. Normal, uniform and exponential headway generation models have been considered for headway. Applying the two-step cluster analysis procedure to calibration reduced the time needed to select the simulation step value and the headway generation model.
3D Building Models Segmentation Based on K-means++ Cluster Analysis
Directory of Open Access Journals (Sweden)
C. Zhang
2016-10-01
3D mesh model segmentation has drawn increasing attention in the digital geometry processing field in recent years. The original 3D mesh model needs to be divided into separate meaningful parts or surface patches according to certain criteria to support reconstruction, compression, texture mapping, model retrieval, etc. Segmentation is therefore a key problem in 3D mesh model processing. In this paper, we propose a method that uses cluster analysis to segment 3D building models in Collada format (a type of mesh model) into meaningful parts. Common clustering methods segment 3D mesh models with K-means, whose performance depends heavily on randomly chosen initial seed points (i.e., centroids); different random centroids can yield quite different results. We therefore improved the existing method by using the K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means and achieves good, meaningful results.
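K-means++ differs from plain K-means only in how the initial centroids are chosen. The first centroid is drawn uniformly at random. Each further centroid is drawn with probability proportional to D(x)², the squared distance from point x to the nearest centroid already chosen. A minimal sketch of that seeding, followed by ordinary Lloyd iterations, is given below. The 2-D "face features" are a hypothetical stand-in, because the abstract does not say which per-face features the paper clusters. The function names are also illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """K-means++ seeding: first centroid uniform, later ones sampled with
    probability proportional to D(x)^2, the squared distance to the
    nearest centroid chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:  # every point coincides with a chosen centroid
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means started from K-means++ seeds."""
    rng = random.Random(seed)
    centers = [tuple(c) for c in kmeans_pp_seeds(points, k, rng)]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centroid
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical 2-D features for six mesh faces forming two obvious groups.
faces = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(faces, k=2)
print(labels)
```

A point that is already a centroid has D(x)² = 0, so it cannot be drawn again. Seeds therefore tend to spread across the data. This spread is what reduces K-means' sensitivity to random initialization.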
Cluster analysis in kinetic modelling of the brain: A noninvasive alternative to arterial sampling
DEFF Research Database (Denmark)
Liptrot, Matthew George; Adams, K.H.; Martiny, L.
2004-01-01
[…]) extracted directly from dynamic positron emission tomography (PET) scans by cluster analysis. Five healthy subjects were injected with the 5-HT2A receptor ligand [18F]-altanserin, and blood samples were subsequently taken from the radial artery and cubital vein. Eight regions-of-interest (ROI) TACs were […] by the 'within-variance' measure and by 3D visual inspection of the homogeneity of the determined clusters. The cluster-determined input curve was then used in Logan plot analysis and compared with the arterial and venous blood samples, and additionally with one of the currently used alternatives to arterial […] acts as a proof-of-principle that the use of cluster analysis on a PET data set could obviate the requirement for arterial cannulation when determining the input function for kinetic modelling of ligand binding, and that this may be a superior approach as compared to the other noninvasive alternatives […]
Cluster analysis for applications
Anderberg, Michael R
1973-01-01
Cluster Analysis for Applications deals with methods and various applications of cluster analysis. Topics covered range from variables and scales to measures of association among variables and among data units. Conceptual problems in cluster analysis are discussed, along with hierarchical and non-hierarchical clustering methods. The necessary elements of data analysis, statistics, cluster analysis, and computer implementation are integrated vertically to cover the complete path from raw data to a finished analysis. Comprising 10 chapters, this book begins with an introduction to the subject o
Analysis of Massive Emigration from Poland: The Model-Based Clustering Approach
Witek, Ewa
The model-based approach assumes that the data are generated by a finite mixture of probability distributions, such as multivariate normal distributions. In finite mixture models, each component distribution corresponds to a cluster. Determining the number of clusters and choosing an appropriate clustering method thus become a problem of statistical model choice. Hence, the model-based approach has a key advantage over heuristic clustering algorithms: it provides a principled way to select both the model and the number of clusters.
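A minimal one-dimensional sketch of this model-choice view follows. It assumes univariate normal components. A k-component mixture is fitted by expectation-maximization (EM) for several values of k, and the fits are compared with the Bayesian information criterion. Here BIC = -2 log L + p log n, where p = 3k - 1 counts the free parameters (k means, k variances and k - 1 weights), and smaller values are better. The R package MCLUST reports BIC with the opposite sign, so larger is better there. The deterministic initialization, the variance floor and the data are illustrative choices, not the paper's procedure.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k, iters=200, var_floor=1e-3):
    """Fit a k-component univariate normal mixture by EM.

    Returns ((weights, means, variances), log_likelihood). The variance floor
    is a crude guard against components collapsing onto single points.
    """
    n = len(xs)
    s = sorted(xs)
    # Deterministic start: means at evenly spaced order statistics.
    mus = [s[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(xs) / n
    var0 = max(var_floor, sum((x - mean) ** 2 for x in xs) / n)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of component j for observation x.
        resp = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_floor,
                               sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj)
    loglik = 0.0
    for x in xs:
        mix = sum(weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k))
        loglik += math.log(mix or 1e-300)
    return (weights, mus, variances), loglik


def bic(xs, k):
    """BIC = -2 log L + p log n with p = 3k - 1 free parameters (smaller is better)."""
    _, loglik = fit_gmm_1d(xs, k)
    return -2.0 * loglik + (3 * k - 1) * math.log(len(xs))


# Hypothetical data with two well-separated groups.
xs = [0.0, 0.2, -0.1, 0.1, -0.2, 0.05, 10.0, 10.2, 9.9, 10.1, 9.8, 10.05]
for k in (1, 2, 3):
    print(k, round(bic(xs, k), 1))
```

The number of clusters is then simply the k with the best BIC. This is the sense in which clustering becomes a problem of statistical model choice.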
Cluster banding heat source model
Institute of Scientific and Technical Information of China (English)
Zhang Liguo; Ji Shude; Yang Jianguo; Fang Hongyuan; Li Yafan
2006-01-01
The concept of a cluster banding heat source model is proposed to address the excessive number of increment steps in the numerical simulation of large welded structures, and an expression for the model is derived from the law of conservation of energy. Because the derived expression applies to arbitrary weld widths, the welding stress field of large welded structures with regular welds can be analyzed quantitatively and quickly.
Marketing research cluster analysis
Directory of Open Access Journals (Sweden)
Marić Nebojša
2002-01-01
One application of cluster analysis in marketing is identifying groups of cities and towns with similar demographic profiles. This paper considers the main aspects of cluster analysis through an example of clustering 12 cities using Minitab software.
Directory of Open Access Journals (Sweden)
Lingli Jiang
2011-01-01
This paper proposes a new approach combining an autoregressive (AR) model and fuzzy cluster analysis for bearing fault diagnosis and degradation assessment. The AR model is an effective way to extract fault features and is generally applied to stationary signals. However, the fault vibration signals of a roller bearing are non-stationary and non-Gaussian. To address this, the AR model parameters are estimated from higher-order cumulants. The AR parameters are then taken as feature vectors, and fuzzy cluster analysis is applied to perform classification and pattern recognition. Experimental results show that the proposed method can identify various types and severities of bearing faults. This study is significant for non-stationary and non-Gaussian signal analysis, fault diagnosis and degradation assessment.
Lawson, Andrew B
2002-01-01
Research has generated a number of advances in methods for spatial cluster modelling in recent years, particularly in the area of Bayesian cluster modelling. Along with these advances has come an explosion of interest in the potential applications of this work, especially in epidemiology and genome research. In one integrated volume, this book reviews the state-of-the-art in spatial clustering and spatial cluster modelling, bringing together research and applications previously scattered throughout the literature. It begins with an overview of the field, then presents a series of chapters that illuminate the nature and purpose of cluster modelling within different application areas, including astrophysics, epidemiology, ecology, and imaging. The focus then shifts to methods, with discussions on point and object process modelling, perfect sampling of cluster processes, partitioning in space and space-time, spatial and spatio-temporal process modelling, nonparametric methods for clustering, and spatio-temporal ...
Cluster Correspondence Analysis
M. van de Velden (Michel); A. Iodice D'Enza; F. Palumbo
2014-01-01
A new method is proposed that combines dimension reduction and cluster analysis for categorical data. A least-squares objective function is formulated that approximates the cluster by variables cross-tabulation. Individual observations are assigned to clusters
Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.
2008-01-01
Model-based cluster analysis is a new clustering procedure for investigating population heterogeneity using finite mixtures of multivariate normal densities. It is an inferentially based, statistically principled procedure that uses the Bayesian information criterion to compare multiple models, including nonnested ones, and identify the…
Institute of Scientific and Technical Information of China (English)
YANG Xing-long; REN Ya-tong
2012-01-01
Using Michael Porter's "diamond model" and drawing on regional development characteristics, we analyze the competitiveness of the livestock product processing industry cluster in Inner Mongolia from six aspects (factor conditions; demand conditions; corporate strategy, structure and competition; related and supporting industries; government; and opportunities). We then make the following recommendations for improving the competitiveness of this cluster: (i) the government should increase capital input, focus on supporting the livestock product processing industry, and use financial funds to guide and aggregate investment; (ii) enterprises should vigorously develop leading enterprises and make full use of their cluster effect.
CLEAN: CLustering Enrichment ANalysis
Directory of Open Access Journals (Sweden)
Medvedovic Mario
2009-07-01
Background: Integration of biological knowledge encoded in various lists of functionally related genes has become one of the most important aspects of analyzing genome-wide functional genomics data. In the context of cluster analysis, the functional coherence of clusters established through such analyses has been used to identify biologically meaningful clusters, compare clustering algorithms and identify biological pathways associated with the biological process under investigation. Results: We developed a computational framework for analytically and visually integrating knowledge-based functional categories with the cluster analysis of genomics data. The framework is based on the simple, conceptually appealing, and biologically interpretable gene-specific functional coherence score (CLEAN score). The score is derived by correlating the clustering structure as a whole with functional categories of interest. We directly demonstrate that integrating biological knowledge in this way improves the reproducibility of conclusions derived from cluster analysis. The CLEAN score differentiates between the levels of functional coherence for genes within the same cluster based on their membership in enriched functional categories. We show that this aspect results in higher reproducibility across independent datasets and produces more informative genes for distinguishing different sample types than the scores based on the traditional cluster-wide analysis. We also demonstrate the utility of the CLEAN framework in comparing clusterings produced by different algorithms. CLEAN was implemented as an add-on R package and can be downloaded at http://Clusteranalysis.org. The package integrates routines for calculating gene-specific functional coherence scores and the open source interactive Java-based viewer Functional TreeView (FTreeView). Conclusion: Our results indicate that using the gene-specific functional coherence score improves the reproducibility of the
Cluster Correspondence Analysis.
van de Velden, M; D'Enza, A Iodice; Palumbo, F
2017-03-01
A method is proposed that combines dimension reduction and cluster analysis for categorical data by simultaneously assigning individuals to clusters and optimal scaling values to categories in such a way that a single between variance maximization objective is achieved. In a unified framework, a brief review of alternative methods is provided and we show that the proposed method is equivalent to GROUPALS applied to categorical data. Performance of the methods is appraised by means of a simulation study. The results of the joint dimension reduction and clustering methods are compared with the so-called tandem approach, a sequential analysis of dimension reduction followed by cluster analysis. The tandem approach is conjectured to perform worse when variables are added that are unrelated to the cluster structure. Our simulation study confirms this conjecture. Moreover, the results of the simulation study indicate that the proposed method also consistently outperforms alternative joint dimension reduction and clustering methods.
Supercomputer and cluster performance modeling and analysis efforts: 2004-2006.
Energy Technology Data Exchange (ETDEWEB)
Sturtevant, Judith E.; Ganti, Anand; Meyer, Harold (Hal) Edward; Stevenson, Joel O.; Benner, Robert E., Jr.; Goudy, Susan Phelps; Doerfler, Douglas W.; Domino, Stefan Paul; Taylor, Mark A.; Malins, Robert Joseph; Scott, Ryan T.; Barnette, Daniel Wayne; Rajan, Mahesh; Ang, James Alfred; Black, Amalia Rebecca; Laub, Thomas William; Vaughan, Courtenay Thomas; Franke, Brian Claude
2007-02-01
This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as modeling efforts, obtained during the time period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.
Directory of Open Access Journals (Sweden)
O. A. Tarasova
2007-08-01
Important aspects of the seasonal variations of surface ozone are discussed. The underlying analysis is based on the long-term (1990–2004) ozone records of the Co-operative Programme for Monitoring and Evaluation of the Long-range Transmission of Air Pollutants in Europe (EMEP) and the World Data Center of Greenhouse Gases, which have a strong Northern Hemisphere bias. Seasonal variations are pronounced at most of the 114 locations at any time of day. A seasonal-diurnal variability classification using hierarchical agglomerative clustering reveals 5 distinct clusters: clean/rural, semi-polluted non-elevated, semi-polluted semi-elevated, elevated, and polar/remote marine types. For the "clean/rural" cluster the seasonal maximum is observed in April, both at night and during the day. For sites with a double maximum or a broad spring-summer maximum, the spring maximum appears both day and night, while the summer maximum is more pronounced during daytime and can hence be attributed to photochemical processes. Photochemistry is a less plausible explanation for the spring maximum, as its timing shows no dependence on the time of day; the spring maximum is more probably caused by dynamical/transport processes. Using data from the 3-D atmospheric chemistry general circulation model ECHAM5/MESSy1 covering the period 1998–2005, a comparison has been performed for the identified clusters. For the model data, four distinct classes of variability are detected. Most cases fall into the regimes with a spring seasonal maximum or with a broad spring-summer maximum (with the summer maximum prevailing). The regime with a winter–early spring maximum is reproduced by the model for southern hemispheric locations. Background and semi-polluted sites fall into the same cluster in the model. The seasonality in this model cluster is characterized by a pronounced spring (May) maximum. For the model cluster that partly covers semi-elevated semi-polluted sites the role of the
Fernández-Arjona, María del Mar; Grondona, Jesús M.; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D.
2017-01-01
It is known that microglia morphology and function are closely related, but only a few studies have objectively described different morphological subtypes. To address this issue, morphological parameters of microglial cells were analyzed in a rat model of aseptic neuroinflammation. After the injection of a single dose of the enzyme neuraminidase (NA) into the lateral ventricle (LV), an acute inflammatory process occurs. Sections from NA-injected animals and sham controls were immunolabeled with the microglial marker IBA1, which highlights ramifications and features of the cell shape. Using images obtained by section scanning, individual microglial cells were sampled from various regions (septofimbrial nucleus, hippocampus and hypothalamus) at different times post-injection (2, 4 and 12 h). Each cell yielded a set of 15 morphological parameters by means of image analysis software. Five initial parameters (including fractal measures) were statistically different in cells from NA-injected rats (most of them IL-1β positive, i.e., M1-state) compared to those from control animals (none of them IL-1β positive, i.e., surveillant state). However, additional multimodal parameters proved more suitable for hierarchical cluster analysis (HCA), which classified the microglia population into four clusters. Furthermore, a linear discriminant analysis (LDA) suggested three specific parameters to objectively classify any microglia by a decision tree. In addition, a principal components analysis (PCA) revealed two extra valuable variables that allowed microglia to be further classified into a total of eight sub-clusters or types. The spatio-temporal distribution of these different morphotypes in our rat inflammation model allowed specific morphotypes to be related to microglial activation status and brain location. An objective method for microglia classification based on morphological parameters is proposed. Main points: Microglia undergo a quantifiable
Ma, Jinhui; Raina, Parminder; Beyene, Joseph; Thabane, Lehana
2013-01-23
The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE)) and cluster-specific (i.e. random-effects logistic regression (RELR)) models for analyzing data from cluster randomized trials (CRTs) with missing binary responses. In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, the intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate-dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI) and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE), and coverage probability. GEE performs well on all four measures (provided the downward bias of the standard error when the number of clusters per arm is small is adjusted appropriately) under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF) <3; and within-cluster MI for CRTs with VIF≥3 and cluster size>50. RELR performs well only when a small amount of data is missing and complete case analysis is applied. GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either the standard or the within-cluster MI strategy is applied prior to the analysis.
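The variance inflation factor (VIF) that separates the scenarios above depends on cluster size and the intra-cluster correlation. For equal cluster sizes m and intra-cluster correlation ρ, VIF = 1 + (m - 1)ρ. When responses are beta-binomial with parameters α and β, ρ = 1/(α + β + 1). The small sketch below computes both quantities and an effective sample size. The function names and the example trial arm are hypothetical, and the paper's simulation settings are not reproduced.

```python
def beta_binomial_icc(alpha, beta):
    """Intra-cluster correlation of beta-binomial responses: 1 / (alpha + beta + 1)."""
    return 1.0 / (alpha + beta + 1.0)


def variance_inflation_factor(cluster_size, icc):
    """Design effect for equal-sized clusters: VIF = 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    total = n_clusters * cluster_size
    return total / variance_inflation_factor(cluster_size, icc)


# Hypothetical trial arm: 10 clusters of 51 subjects, beta-binomial(2, 17).
rho = beta_binomial_icc(2, 17)                  # 1 / 20 = 0.05
vif = variance_inflation_factor(51, rho)        # 1 + 50 * 0.05 = 3.5
print(rho, vif, vif >= 3)
print(round(effective_sample_size(10, 51, rho), 1))  # 510 / 3.5, about 145.7
```

Even a modest ρ = 0.05 pushes a cluster of 51 subjects past VIF = 3. This illustrates why cluster size and correlation, and not only the fraction of missing data, matter for choosing an imputation strategy.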
Directory of Open Access Journals (Sweden)
O. A. Tarasova
2007-12-01
Important aspects of the seasonal variations of surface ozone are discussed. The underlying analysis is based on the long-term (1990–2004) ozone records of the Co-operative Programme for Monitoring and Evaluation of the Long-range Transmission of Air Pollutants in Europe (EMEP) and the World Data Centre of Greenhouse Gases, which provide data mostly for the Northern Hemisphere. Seasonal variations are pronounced at most of the 114 locations at all times of the day. A classification of seasonal-diurnal variations using hierarchical agglomerative clustering reveals 6 distinct clusters: clean background, rural, semi-polluted non-elevated, semi-polluted semi-elevated, elevated, and polar/remote marine. For the "clean background" cluster the seasonal maximum is observed in March–April, both at night and during the day. For sites with a double maximum or a broad spring-summer maximum, the spring maximum appears both day and night, while the summer maximum is more pronounced during daytime and can hence be attributed to photochemical processes. The spring maximum is more likely caused by dynamical/transport processes than by photochemistry, as it is observed in spring at all times of the day. We compare the identified clusters with corresponding data from the 3-D atmospheric chemistry general circulation model ECHAM5/MESSy1 covering the period 1998–2005. For the model output, as for the measurements, 6 clusters are considered. The simulation shows at most of the sites a spring seasonal maximum or a broad spring-summer maximum (with higher summer mixing ratios). For southern hemispheric and polar remote locations the seasonal maximum in the simulation is shifted to spring, while the absolute mixing ratios are in good agreement with the measurements. The seasonality in the model cluster covering background locations is characterized by a pronounced spring (April–May) maximum. For the model clusters which cover rural and semi-polluted sites the role of the
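The site classification above relies on hierarchical agglomerative clustering. A minimal sketch of the agglomerative loop follows. It assumes each site is summarized by a feature vector, here a hypothetical normalized four-season profile. It also assumes average linkage with Euclidean distance. The abstract does not state the study's actual features, distance or linkage. Merging stops when the requested number of clusters remains.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    return (sum(euclid(points[i], points[j]) for i in c1 for j in c2)
            / (len(c1) * len(c2)))


def agglomerate(points, n_clusters):
    """Start with singletons; repeatedly merge the closest pair of clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


# Hypothetical normalized seasonal profiles (winter, spring, summer, autumn).
profiles = [
    (0.8, 1.2, 1.0, 0.7),  # site 0: spring maximum
    (0.9, 1.3, 0.9, 0.8),  # site 1: spring maximum
    (0.5, 0.9, 1.5, 0.8),  # site 2: summer maximum
    (0.4, 1.0, 1.6, 0.7),  # site 3: summer maximum
    (1.0, 1.4, 0.8, 0.9),  # site 4: spring maximum
]
print(agglomerate(profiles, 2))
```

Stopping at a fixed count mirrors reading a fixed number of clusters off the dendrogram. In practice that cut level is a modelling decision, not something the algorithm determines.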
Analysis of Decision Trees in Context Clustering of Hidden Markov Model Based Thai Speech Synthesis
Directory of Open Access Journals (Sweden)
Suphattharachai Chomphan
2011-01-01
Full Text Available Problem statement: In Thai speech synthesis using a Hidden Markov Model (HMM)-based synthesis system, tonal speech quality is degraded by tone distortion. This major problem must be treated appropriately to preserve the tone characteristics of each syllable unit, since tone determines the intelligibility of the synthesized speech. Tone questions and other phonetic questions therefore need to be established accordingly in the tree-based context clustering process. Approach: This study describes the analysis of questions in the tree-based context clustering process of an HMM-based speech synthesis system for the Thai language. In the system, spectrum, pitch (F0) and state duration are modeled simultaneously in a unified HMM framework, and their parameter distributions are clustered independently using a decision-tree-based context clustering technique. The contextual factors that affect spectrum, pitch and duration, i.e., part of speech, position and number of phones in a syllable, position and number of syllables in a word, position and number of words in a sentence, phone type and tone type, are taken into account when constructing the questions of the decision tree. In all, thirteen sets of questions are analyzed and compared. Results: In the experiment, we analyzed the decision trees by counting the number of questions in each node coming from each of the thirteen sets and by calculating a dominance score for each question, defined as the reciprocal of the distance from the root node to the question node. The phonetic type set has the highest count and dominance score, followed by the part-of-speech set and then the tone type set. Conclusion: By counting the number of questions in each node and calculating the dominance score, we can set the priority of each question set. Overall, the analysis results support further development of Thai speech synthesis with an efficient context clustering process in
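The tree analysis described above (question counts per set, plus a dominance score equal to the reciprocal of the distance from the root) can be sketched as a tree walk. This is a minimal sketch under stated assumptions: the tree is a hypothetical nested tuple `(question, yes_subtree, no_subtree)` with `None` for leaves, the question names are hypothetical, and the root is counted as distance 1 (the abstract does not say how the root itself is scored, and 1/0 is undefined):

```python
from collections import defaultdict

def score_question_sets(tree, question_set_of):
    """Count questions and sum dominance scores per question set.

    tree: nested tuples (question, yes_subtree, no_subtree); leaves are None.
    question_set_of: maps each question name to the set it belongs to.
    Assumption: the root is at distance 1, so its dominance score is 1/1.
    """
    counts = defaultdict(int)
    dominance = defaultdict(float)
    stack = [(tree, 1)]
    while stack:
        node, distance = stack.pop()
        if node is None:
            continue  # leaf: holds a clustered distribution, not a question
        question, yes_branch, no_branch = node
        qset = question_set_of[question]
        counts[qset] += 1
        dominance[qset] += 1.0 / distance
        stack.append((yes_branch, distance + 1))
        stack.append((no_branch, distance + 1))
    return dict(counts), dict(dominance)
```

Sorting the sets with `sorted(dominance, key=dominance.get, reverse=True)` then gives the kind of priority order the conclusion refers to.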
Modeling, Stability Analysis and Active Stabilization of Multiple DC-Microgrids Clusters
DEFF Research Database (Denmark)
Shafiee, Qobad; Dragicevic, Tomislav; Vasquez, Juan Carlos
2014-01-01
DC microgrids (MGs), as an alternative option, have attracted increasing interest in recent years due to their many potential advantages compared with ac systems. Stability of these systems can be an important issue under high penetration of load converters, which behave as constant power loads (CPLs), and especially during interconnection with other MGs, creating dc MG clusters. This paper develops a small-signal model for dc MGs from the control point of view, in order to analyze stability and to investigate the effects of CPLs and of the line impedances between the MGs on the stability of these systems. This model can also be used to synthesize and study the dynamics of control loops in dc MGs and in dc MG clusters. An active stabilization method is proposed, implemented as a dc active power filter (APF) inside the MGs, in order not only to increase the damping of dc MGs in the presence of CPLs but also...
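Why CPLs threaten stability can be seen on a much simpler circuit than the paper's model: a CPL draws current P/v, so its incremental resistance dv/di = -V²/P is negative. A minimal sketch, assuming a single source behind a series R-L line feeding a bus capacitor C that supplies an ideal CPL of power P at operating voltage V (L di/dt = Vs - R i - v, C dv/dt = i - P/v). This generic single-bus example is not the multi-MG small-signal model of the paper:

```python
def cpl_state_matrix(R, L, C, P, V):
    """Linearized 2x2 state matrix (states: line current, bus voltage), SI units."""
    return [[-R / L, -1.0 / L],
            [1.0 / C, P / (C * V ** 2)]]  # +P/(C V^2) comes from the CPL's negative resistance

def is_small_signal_stable(R, L, C, P, V):
    """A real 2x2 linear system is asymptotically stable iff trace < 0 and det > 0."""
    (a, b), (c, d) = cpl_state_matrix(R, L, C, P, V)
    trace = a + d
    det = a * d - b * c
    return trace < 0 and det > 0
```

Written out, the two conditions are R/L > P/(C V²) and R < V²/P. Shrinking C or raising P pushes the trace positive; adding damping restores the trace condition, which is one generic way to see why damping helps. The proposed APF itself is not modeled here.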
Continuum modeling of myxobacteria clustering
Harvey, Cameron W.; Alber, Mark; Tsimring, Lev S.; Aranson, Igor S.
2013-03-01
In this paper we develop a continuum theory of clustering in ensembles of self-propelled inelastically colliding rods, with applications to the collective dynamics of the common gliding bacterium Myxococcus xanthus. A multi-phase hydrodynamic model that couples densities of oriented and isotropic phases is described. This model is used for the analysis of an instability that leads to spontaneous formation of directionally moving dense clusters within an initially dilute isotropic ‘gas’ of myxobacteria. Numerical simulations of this model confirm the existence of stationary dense moving clusters and also elucidate the properties of their collisions. The results are shown to be in qualitative agreement with experiments.
APPROACHES TOWARDS CLUSTER ANALYSIS
National Research Council Canada - National Science Library
Manuela Tvaronaviciene; Kristina Razminiene; Leonardo Piccinetti
2015-01-01
... The findings indicate that the case study method is used in many articles referring to cluster research. Other methods, such as analysis, interviews, surveys, research, equations and others, are used to support case studies...
Directory of Open Access Journals (Sweden)
Ma Jinhui
2013-01-01
Full Text Available Abstract Background: The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e., generalized estimating equations (GEE)) and cluster-specific (i.e., random-effects logistic regression (RELR)) models for analyzing data from cluster randomized trials (CRTs) with missing binary responses. Methods: In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, the intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate-dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI) and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE), and coverage probability. Results: GEE performs well on all four measures (provided the downward bias of the standard error when the number of clusters per arm is small is adjusted appropriately) under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF) … 50. RELR performs well only when a small amount of data was missing and complete case analysis was applied. Conclusion: GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either the standard or the within-cluster MI strategy is applied prior to the analysis.
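The data-generating step in the Methods (clustered binary responses from a beta-binomial distribution with a given intra-cluster correlation) and the variance inflation factor can be sketched as follows. This is a minimal sketch assuming equal cluster sizes and the standard parameterization in which Beta(a, b) has mean p and ICC = 1/(a + b + 1); the study's parameter grid, missingness mechanism, MI and GEE/RELR fitting are not reproduced:

```python
import random

def beta_params(p, icc):
    """Beta(a, b) with mean p and intra-cluster correlation icc = 1 / (a + b + 1)."""
    scale = (1.0 - icc) / icc  # this equals a + b
    return p * scale, (1.0 - p) * scale

def simulate_trial_arm(n_clusters, cluster_size, p, icc, rng):
    """One trial arm of binary responses drawn from a beta-binomial model."""
    a, b = beta_params(p, icc)
    arm = []
    for _ in range(n_clusters):
        pi = rng.betavariate(a, b)  # cluster-specific response probability
        arm.append([1 if rng.random() < pi else 0 for _ in range(cluster_size)])
    return arm

def variance_inflation_factor(cluster_size, icc):
    """Design effect for equal cluster sizes m: VIF = 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc
```

For example, clusters of 50 subjects with an ICC of 0.05 give a VIF of 1 + 49 × 0.05 = 3.45, so even a modest ICC can matter when clusters are large.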
Al-Jabery, Khalid; Obafemi-Ajayi, Tayo; Olbricht, Gayla R; Takahashi, T Nicole; Kanne, Stephen; Wunsch, Donald
2016-08-01
Heterogeneity in Autism Spectrum Disorder (ASD) is complex, including variability in behavioral phenotype as well as in clinical, physiologic, and pathologic parameters. The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) now diagnoses ASD using a 2-dimensional model based on social communication deficits and on fixated interests and repetitive behaviors. Sorting out heterogeneity is crucial for the study of etiology, diagnosis, treatment and prognosis. In this paper, we present an ensemble model for analyzing ASD phenotypes using several machine learning techniques and a k-dimensional subspace clustering algorithm. Our ensemble also incorporates statistical methods at several stages of analysis. We apply this model to a sample of 208 probands drawn from the Simons Simplex Collection Missouri Site patients. The results provide useful evidence that helps elucidate the phenotype complexity within ASD. Our model can be extended to other disorders that exhibit a diverse range of heterogeneity.
Cluster Based Text Classification Model
DEFF Research Database (Denmark)
2011-01-01
We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases th...... datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset....
Firdausiah Mansur, Andi Besse; Yusof, Norazah
2013-01-01
Clustering on social learning networks has still not been explored widely, especially when the network focuses on an e-learning system. Conventional methods are not well suited to e-learning data. SNA requires content analysis, which involves human intervention and needs to be carried out manually. Some of the previous clustering techniques need…
ANALYSIS OF TUITION GROWTH RATES BASED ON CLUSTERING AND REGRESSION MODELS
Directory of Open Access Journals (Sweden)
Long Cheng
2016-07-01
Full Text Available Tuition plays a significant role in determining whether a student can afford higher education, which is one of the major driving forces for national development and social prosperity. It is therefore necessary to fully understand which factors might affect tuition and how they affect it. However, many existing studies on the tuition growth rate either lack sufficient real data and proper quantitative models to support their conclusions, or focus on only a few of the factors that might affect the tuition growth rate, failing to make a comprehensive analysis. In this paper, we explore a wide variety of factors that might affect the tuition growth rate using large amounts of authentic data and different quantitative methods such as clustering and regression models.
Comprehensive cluster analysis with Transitivity Clustering.
Wittkop, Tobias; Emig, Dorothea; Truss, Anke; Albrecht, Mario; Böcker, Sebastian; Baumbach, Jan
2011-03-01
Transitivity Clustering is a method for partitioning biological data, such as genes, into groups of similar objects. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering is accessible online and offers three user-friendly interfaces: a powerful stand-alone version, a web interface, and a collection of Cytoscape plug-ins. In this paper, we describe three major workflows: (i) protein (super)family detection with Cytoscape, (ii) protein homology detection with incomplete gold standards and (iii) clustering of gene expression data. This protocol guides the user through the most important features of Transitivity Clustering and takes ∼1 h to complete.
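Transitivity Clustering is generally described as a weighted cluster-editing problem (transitive graph projection): given pairwise similarities s_ij and a threshold t, find the partition into groups that minimizes the total cost |s_ij - t| of the pairs that must be joined or split against their thresholded similarity. The brute-force sketch below only illustrates that objective under this assumption. Exhaustive search is exponential in the number of objects, so it is not how the published tool scales:

```python
def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            # Put `first` into an existing block...
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or into a block of its own.
        yield [[first]] + partition

def editing_cost(partition, sim, threshold):
    """Weighted cost of turning the thresholded similarity graph into `partition`."""
    label = {x: b for b, block in enumerate(partition) for x in block}
    cost = 0.0
    for i in range(len(sim)):
        for j in range(i + 1, len(sim)):
            w = sim[i][j] - threshold
            together = label[i] == label[j]
            if together and w < 0:
                cost -= w  # joining a dissimilar pair: pay for an inserted edge
            elif not together and w > 0:
                cost += w  # splitting a similar pair: pay for a deleted edge
    return cost

def best_partition(sim, threshold):
    """Exact minimum-cost partition by exhaustive search (tiny inputs only)."""
    return min(set_partitions(list(range(len(sim)))),
               key=lambda part: editing_cost(part, sim, threshold))
```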
A tri-stage cluster identification model for accurate analysis of seismic catalogs
Directory of Open Access Journals (Sweden)
S. J. Nanda
2013-02-01
Full Text Available In this paper we propose a tri-stage cluster identification model that is a combination of a simple single-iteration distance algorithm and an iterative K-means algorithm. In this study of earthquake seismicity, the model considers event location, time and magnitude information from earthquake catalog data to efficiently classify events as either background or mainshock and aftershock sequences. Tests on a synthetic seismicity catalog demonstrate the efficiency of the proposed model in terms of accuracy percentage (94.81% for background and 89.46% for aftershocks). The close agreement between lambda and cumulative plots for the ideal synthetic catalog and that generated by the proposed model also supports the accuracy of the proposed technique. There is flexibility in the model design to allow for proper selection of location and magnitude ranges, depending upon the nature of the mainshocks present in the catalog. The effectiveness of the proposed model is also evaluated by the classification of events in three historic catalogs: California, Japan and Indonesia. As expected, for both synthetic and historic catalog analysis it is observed that the density of events classified as background is almost uniform throughout the region, whereas the density of aftershock events is higher near the mainshocks.
FORMATION OF AN INNOVATION REGIONAL CLUSTER MODEL
Directory of Open Access Journals (Sweden)
G. S. Merzlikina
2015-01-01
Full Text Available Summary. An investigation of the scientific and methodological approaches to building and developing innovation clusters revealed problems in how functions are assigned to innovation clusters and to production clusters. The authors therefore distinguish between the concepts of the innovation cluster and the production cluster, and explain the notion of the innovation-production cluster. The main goal of this article is to identify existing organizational problems in building clusters and developing them successfully. An analysis of how regional clusters are built revealed a typical practical structure of interaction among cluster members. This structure has the following drawbacks: the cluster is not oriented toward the market environment; there is no system for building and developing long-term relations among members; information, financial and material flows within the cluster are managed ineffectively; competences and zones of responsibility are poorly differentiated among cluster members; the cluster's activities lack transparency; the cluster adapts poorly to environmental change; and it is difficult to use cluster members' intellectual property and to commercialize high-tech products. Taken together, these problems reduce the viability of existing models of innovation cluster building and the practical feasibility of implementing clusters. The authors therefore propose an improved innovation-production cluster building model with a more efficient business-process management system, comprising an enhanced innovation cluster structure, a competence matrix and subcluster responsibility zones. The proposed model differs from others in using a unified innovative product development control center, which also controls production and marketing.
DEFF Research Database (Denmark)
Thomsen, C E; Rosenfalck, A; Nørregaard Christensen, K
1991-01-01
The brain activity electroencephalogram (EEG) was recorded from 30 healthy women scheduled for hysterectomy. The patients were anaesthetized with isoflurane, halothane or etomidate/fentanyl. A multiparametric method was used for extraction of amplitude and frequency information from the EEG. The method applied autoregressive modelling of the signal, segmented in 2 s fixed intervals. The features from the EEG segments were used for learning and for classification. The learning process was unsupervised and hierarchical clustering analysis was used to construct a learning set of EEG amplitude-frequency patterns for each of the three anaesthetic drugs. These EEG patterns were assigned to a colour code corresponding to similar clinical states. A common learning set could be used for all patients anaesthetized with the same drug. The classification process could be performed on-line and the results were...
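The feature-extraction step (autoregressive modelling of the EEG in fixed 2 s segments) can be sketched with the Yule-Walker equations solved by the Levinson-Durbin recursion. This is a minimal sketch, not the authors' multiparametric method: the sampling rate `fs`, the AR `order` and the use of Yule-Walker estimation are assumptions, since the abstract gives none of them:

```python
def autocorrelation(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of the mean-removed signal."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [sum(xc[t] * xc[t + k] for t in range(n - k)) / n for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations: x[t] ~ sum(a[k] * x[t-1-k] for k < order)."""
    a, err = [], r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[j] * r[m - 1 - j] for j in range(m - 1))
        k_m = acc / err  # reflection coefficient
        a = [a[j] - k_m * a[m - 2 - j] for j in range(m - 1)] + [k_m]
        err *= 1.0 - k_m ** 2  # prediction-error variance at order m
    return a, err

def ar_features(signal, fs, order, seconds=2.0):
    """Fit an AR model to each non-overlapping fixed-length segment."""
    seg_len = int(fs * seconds)
    features = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        r = autocorrelation(signal[start:start + seg_len], order)
        coeffs, _ = levinson_durbin(r, order)
        features.append(coeffs)
    return features
```

Each segment's coefficient vector (from which amplitude and frequency information can be derived) would then serve as input to the clustering and classification steps; a segment that is entirely constant has r[0] = 0 and must be skipped.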
Cluster analysis in kinetic modelling of the brain: A noninvasive alternative to arterial sampling
DEFF Research Database (Denmark)
Liptrot, Matthew George; Adams, K.H.; Martiny, L.
2004-01-01
In emission tomography, quantification of brain tracer uptake, metabolism or binding requires knowledge of the cerebral input function. Traditionally, this is achieved with arterial blood sampling. We propose a noninvasive alternative via the use of a blood vessel time-activity curve (TAC) extracted directly from dynamic positron emission tomography (PET) scans by cluster analysis. Five healthy subjects were injected with the 5-HT2A receptor ligand [18F]-altanserin, and blood samples were subsequently taken from the radial artery and cubital vein. Eight regions-of-interest (ROI) TACs were extracted from the PET data set. Hierarchical K-means cluster analysis was performed on the PET time series to extract a cerebral vasculature ROI. The number of clusters was varied from K = 1 to 10 for the second stage of the two-stage method. Determination of the correct number of clusters was performed...
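The K-means stage described above can be sketched on voxel TACs treated as equal-length vectors. This is a minimal sketch of plain Lloyd's K-means with the first k curves as seeds (the study's hierarchical two-stage scheme and its rule for choosing K are not reproduced). The `vascular_cluster` rule, which picks the cluster whose mean curve has the highest peak, is a hypothetical heuristic added for illustration:

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(curves, k, max_iter=100):
    """Lloyd's K-means on equal-length TACs; the first k curves seed the centroids."""
    centroids = [list(c) for c in curves[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each curve joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(c, centroids[j])) for c in curves]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid becomes the mean curve of its members.
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids

def vascular_cluster(centroids):
    """Hypothetical heuristic: the cluster whose mean TAC has the highest peak."""
    return max(range(len(centroids)), key=lambda j: max(centroids[j]))
```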
Potts Model with Invisible Colors : Random-Cluster Representation and Pirogov–Sinai Analysis
Enter, Aernout C.D. van; Iacobelli, Giulio; Taati, Siamak
We study a recently introduced variant of the ferromagnetic Potts model consisting of a ferromagnetic interaction among q “visible” colors along with the presence of r non-interacting “invisible” colors. We introduce a random-cluster representation for the model, for which we prove the existence of
Baraldi, Andrea; Parmiggiani, Flavio
1996-06-01
According to the following definition, taken from the literature, a fuzzy clustering mechanism allows the same input pattern to belong to multiple categories to different degrees. Many clustering neural network (NN) models claim to feature fuzzy properties, but several of them (like the Fuzzy ART model) do not satisfy this definition. Conversely, we believe that Kohonen's Self-Organizing Map (SOM) satisfies this definition, even though this NN model is well known to (robustly) perform topologically ordered mapping rather than fuzzy clustering. This may sound like a paradox if we consider that several fuzzy NN models (such as Fuzzy Learning Vector Quantization, FLVQ, which was first called the Fuzzy Kohonen Clustering Network, FKCN) were originally developed to enhance Kohonen's models (such as SOM and the vector quantization model, VQ). The fuzziness of SOM indicates that a network of processing elements (PEs) can satisfy the fuzzy clustering definition when it exploits local rules that are biologically plausible (such as the Kohonen bubble strategy). This is equivalent to stating that exploiting fuzzy set theory in the development of complex systems (e.g., clustering NNs) may provide new mathematical tools (e.g., the definition of a membership function) to simulate the behavior of the cooperative/competitive mechanisms already identified by neurophysiological studies. When a biologically plausible cooperative/competitive strategy is pursued effectively, neighboring PEs become mutually coupled to gain sensitivity to contextual effects. PEs that are mutually coupled are affected by vertical (inter-layer) as well as horizontal (intra-layer) connections. To summarize, we suggest relating the study of fuzzy clustering mechanisms to the multi-disciplinary science of complex systems, with special regard to the investigation of the cooperative/competitive local rules employed by complex systems to gain sensitivity to contextual effects in
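The definition quoted above (one input pattern belonging to several categories to different degrees) is easiest to see in a membership function. As an illustration only, the sketch below uses the standard fuzzy c-means membership formula, which is a swapped-in example and not one of the NN models discussed in the abstract; the fuzzifier m = 2 is an assumed default:

```python
import math

def fcm_memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each cluster (memberships sum to 1).

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), with fuzzifier m > 1.
    """
    dists = [math.dist(x, c) for c in centers]
    for i, d in enumerate(dists):
        if d == 0.0:  # x sits exactly on a center: crisp membership
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** p for d_j in dists) for d_i in dists]
```

For x = (1, 0) with centers (0, 0) and (3, 0), the memberships are 0.8 and 0.2: the pattern belongs to both clusters, mostly to the nearer one.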
[Cluster analysis in biomedical researches].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for analyzing multi-parameter data. Cluster analysis reveals the internal structure of the data, grouping individual observations by their degree of similarity. The review defines the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.
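Of the algorithm families listed above, the Kohonen network is the least self-explanatory, so here is a minimal sketch of online training for a one-dimensional self-organizing map: find the best-matching unit, then pull it and its chain neighbours toward the input with a Gaussian neighbourhood. The linear decay schedules, the Gaussian neighbourhood and the even-spread initialization are assumptions chosen for simplicity, not details from the review:

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_som(data, n_units, n_epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Online training of a 1-D Kohonen map (a chain of n_units units)."""
    rng = random.Random(seed)
    # Initialize units from samples spread evenly through the data list.
    if n_units == 1:
        weights = [list(data[0])]
    else:
        weights = [list(data[round(u * (len(data) - 1) / (n_units - 1))]) for u in range(n_units)]
    total_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood width shrinks
            bmu = min(range(n_units), key=lambda u: sq_dist(x, weights[u]))
            for u in range(n_units):
                # Units near the best-matching unit on the chain move more.
                h = math.exp(-((u - bmu) ** 2) / (2.0 * sigma ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            step += 1
    return weights

def map_to_units(data, weights):
    """Assign each observation to its best-matching unit."""
    return [min(range(len(weights)), key=lambda u: sq_dist(x, weights[u])) for x in data]
```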
Clustering Pre-equilibrium Model Analysis for Nucleon-induced Alpha-particle Spectra up to 200 MeV
Directory of Open Access Journals (Sweden)
Watanabe Y.
2012-02-01
Full Text Available The clustering exciton model of Iwamoto and Harada is applied to the analysis of pre-equilibrium (N,xα) energy spectra for medium-to-heavy nuclei up to 200 MeV. In this work, we calculate alpha-particle formation factors without any of the approximations that appear in the original model. The clustering process is also considered in both the primary and the secondary pre-equilibrium emission. We optimize the exciton and the clustering model parameters simultaneously by comparison with the experimental (N,xN) and (N,xα) energy spectra. The experimental alpha-particle spectra are well reproduced with a unique set of clustering model parameters, which is independent of whether the incident particles are neutrons or protons. The present analysis also implies that the clustering model parameter differs little between medium and heavy nuclei. Our calculations generally reproduce the experimental data well up to an incident energy of ~150 MeV, but underestimations are seen above this energy.
Lin, Shih-Yen; Liu, Chih-Wei
2014-01-01
This study combines cluster analysis and the LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients' values. A two-stage approach using self-organizing maps and the K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F (excluding the monetary value, which is covered by the national health insurance program) are computed for each cluster. In addition, a customer value matrix is used to analyze the customer values of the twelve clusters in terms of frequency and monetary value. A customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified as loyal patients, with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients, without any variable above the average values of L, R, and F. Once different types of patients are identified, marketing strategies can be designed to meet different patients' needs. PMID:25045741
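The LRFM variables and the loyal/lost rule stated above can be sketched as follows. Assumptions, labelled loudly: each patient's history is a hypothetical list of `(visit_date, amount)` pairs, R is coded as raw days since the last visit (many LRFM studies rescale R so that larger means more recent, and the abstract does not say which coding was used, so adapt before applying the "greater than average" rule), and the "other" label is hypothetical. The SOM/K-means segmentation itself is not shown:

```python
from datetime import date

def lrfm(visits, today):
    """LRFM for one patient; visits is a list of (visit_date, amount) pairs."""
    dates = sorted(d for d, _ in visits)
    length = (dates[-1] - dates[0]).days            # L: days between first and last visit
    recency = (today - dates[-1]).days              # R: days since the last visit
    frequency = len(visits)                         # F: number of visits
    monetary = sum(amount for _, amount in visits)  # M: total amount
    return length, recency, frequency, monetary

def classify_clusters(cluster_means, overall_means):
    """Label clusters by comparing their mean (L, R, F) with the overall means."""
    labels = {}
    for cluster_id, values in cluster_means.items():
        above = [v > avg for v, avg in zip(values, overall_means)]
        if all(above):
            labels[cluster_id] = "loyal"
        elif not any(above):
            labels[cluster_id] = "lost"
        else:
            labels[cluster_id] = "other"
    return labels
```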
Directory of Open Access Journals (Sweden)
Hsin-Hung Wu
2014-01-01
Full Text Available This study combines cluster analysis and the LRFM (length, recency, frequency, and monetary) model in a pediatric dental clinic in Taiwan to analyze patients’ values. A two-stage approach using self-organizing maps and the K-means method is applied to segment 1,462 patients into twelve clusters. The average values of L, R, and F (excluding the monetary value, which is covered by the national health insurance program) are computed for each cluster. In addition, a customer value matrix is used to analyze the customer values of the twelve clusters in terms of frequency and monetary value. A customer relationship matrix considering length and recency is also applied to classify different types of customers from these twelve clusters. The results show that three clusters can be classified as loyal patients, with L, R, and F values greater than the respective average L, R, and F values, while three clusters can be viewed as lost patients, without any variable above the average values of L, R, and F. Once different types of patients are identified, marketing strategies can be designed to meet different patients’ needs.
Directory of Open Access Journals (Sweden)
Yihang Yin
2015-08-01
Full Text Available Wireless sensor networks (WSNs) have been widely used to monitor the environment, and sensors in WSNs are usually power constrained. Because inner-node communication consumes most of the power, efficient data compression schemes are needed to reduce data transmission and prolong the lifetime of WSNs. In this paper, we propose an efficient data compression model to aggregate data, which is based on spatial clustering and principal component analysis (PCA). First, sensors with a strong temporal-spatial correlation are grouped into one cluster for further processing, using a novel similarity measure metric. Next, sensor data in one cluster are aggregated in the cluster head sensor node, and an efficient adaptive strategy is proposed for selecting the cluster head to conserve energy. Finally, the proposed model applies principal component analysis with an error bound guarantee to compress the data while retaining a definite amount of the variance. Computer simulations show that the proposed model can greatly reduce communication and obtain a lower mean square error than other PCA-based algorithms.
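The final step above (PCA compression with an error bound) can be sketched at the cluster head: keep the fewest principal components whose reconstruction error stays within a bound, then transmit only the mean, the components and the scores. This is a generic SVD-based sketch, assuming the bound is a per-entry mean squared error; the paper's similarity metric, cluster-head selection and exact error criterion are not reproduced:

```python
import numpy as np

def pca_compress(X, max_mse):
    """Compress an (n_samples, n_sensors) block with the fewest principal
    components whose per-entry reconstruction MSE does not exceed max_mse."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = len(s)
    for cand in range(len(s) + 1):
        # Squared error of a rank-`cand` reconstruction = energy of the dropped singular values.
        if np.sum(s[cand:] ** 2) / X.size <= max_mse:
            k = cand
            break
    components = Vt[:k]          # (k, n_sensors) principal directions
    scores = Xc @ components.T   # (n_samples, k) values to transmit
    return mean, components, scores

def pca_decompress(mean, components, scores):
    """Reconstruct the block at the sink from the transmitted pieces."""
    return scores @ components + mean
```

For strongly correlated sensors, a single component often meets the bound, so each time step needs one score instead of one reading per sensor.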
Ferguson, R. E.
1985-01-01
The data base verification of the ECLS Systems Assessment Program (ESAP) was documented, and the changes made to enhance the flexibility of the water recovery subsystem simulations are given. All changes made to the data base values and all software enhancements performed are described. The refined model documented herein constitutes the submittal of the General Cluster Systems Model. A source listing of the current version of ESAP is provided in Appendix A.
Evaluating Mixture Modeling for Clustering: Recommendations and Cautions
Steinley, Douglas; Brusco, Michael J.
2011-01-01
This article provides a large-scale investigation into several of the properties of mixture-model clustering techniques (also referred to as latent class cluster analysis, latent profile analysis, model-based clustering, probabilistic clustering, Bayesian classification, unsupervised learning, and finite mixture models; see Vermunt & Magidson,…
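Mixture-model clustering fits a finite mixture and assigns each observation to the component with the highest posterior probability. A minimal one-dimensional sketch using EM for Gaussian components, assuming the number of components and the initial means are given (choosing them is one of the practical issues that evaluations like this one examine). The variance floor of 1e-6 is an assumed safeguard against collapsing components:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(xs, init_means, n_iter=100):
    """EM for a 1-D Gaussian mixture; returns weights, means, variances, hard labels."""
    k, n = len(init_means), len(xs)
    means = list(init_means)
    grand_mean = sum(xs) / n
    variances = [sum((x - grand_mean) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weights, means and variances from the soft counts.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    # Hard clustering: assign each point to its most probable component.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels
```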
Remote sensing clustering analysis based on object-based interval modeling
He, Hui; Liang, Tianheng; Hu, Dan; Yu, Xianchuan
2016-09-01
In object-based clustering, image data are segmented into objects (groups of pixels) and then clustered based on the objects' features. This method can be used to automatically classify high-resolution remote sensing images, but requires accurate descriptions of object features. In this paper, we establish that an interval-valued data model is appropriate for describing clustering prototype features. With this in mind, we developed an object-based interval modeling method for high-resolution, multiband remote sensing data. We also designed an adaptive interval-valued fuzzy clustering method. We ran experiments using images from the SPOT-5 satellite sensor for the Pearl River Delta region and Beijing. The results indicate that the proposed algorithm considers both the anisotropy of the remote sensing data and the ambiguity of objects. Additionally, we present a new dissimilarity measure for interval vectors, which better separates the interval vectors generated by features of the segmentation units (objects). This approach effectively limits classification errors caused by spectral mixing between classes. Compared with the object-based unsupervised classification method proposed earlier, the proposed algorithm improves the classification accuracy without increasing computational complexity.
Lucreţia Udrescu; Laura Sbârcea; Alexandru Topîrceanu; Alexandru Iovanovici; Ludovic Kurunczi; Paul Bogdan; Mihai Udrescu
2016-01-01
Analyzing drug-drug interactions may unravel previously unknown drug action patterns, leading to the development of new drug discovery tools. We present a new approach to analyzing drug-drug interaction networks, based on clustering and topological community detection techniques that are specific to complex network science. Our methodology uncovers functional drug categories along with the intricate relationships between them. Using modularity-based and energy-model layout community detection...
The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering
Schaefer, Andreas; Daniell, James; Wenzel, Friedemann
2016-04-01
Earthquake declustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity, with typical applications including probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and the subsequent declustering of earthquake catalogues play a crucial role in determining the magnitude-dependent earthquake return period and its spatial variation. Other researchers have developed various methods to address this issue, ranging in complexity from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. To this end, an adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane, and the search space is adjusted with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster, using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in
Nielsen, J D; Dean, C B
2008-09-01
A flexible semiparametric model for analyzing longitudinal panel count data arising from mixtures is presented. Panel count data refers here to count data on recurrent events collected as the number of events that have occurred within specific follow-up periods. The model assumes that the counts for each subject are generated by mixtures of nonhomogeneous Poisson processes with smooth intensity functions modeled with penalized splines. Time-dependent covariate effects are also incorporated into the process intensity using splines. Discrete mixtures of these nonhomogeneous Poisson process spline models extract functional information from underlying clusters representing hidden subpopulations. The motivating application is an experiment to test the effectiveness of pheromones in disrupting the mating pattern of the cherry bark tortrix moth. Mature moths arise from hidden, but distinct, subpopulations and monitoring the subpopulation responses was of interest. Within-cluster random effects are used to account for correlation structures and heterogeneity common to this type of data. An estimating equation approach to inference requiring only low moment assumptions is developed and the finite sample properties of the proposed estimating functions are investigated empirically by simulation.
Chattopadhyay, Souradeep; Maitra, Ranjan
2017-08-01
Clustering methods are an important tool to enumerate and describe the different coherent kinds of gamma-ray bursts (GRBs). However, their performance can be affected by a number of factors, such as the choice of clustering algorithm and its inherent assumptions, the inclusion of variables in clustering, the nature of the initialization methods used or of the iterative algorithm, or the criterion used to judge the optimal number of groups supported by the data. We analysed GRBs from the Burst and Transient Source Experiment (BATSE) 4Br Catalog using k-means and Gaussian-mixture-model-based clustering methods and found that, after accounting for all the above factors, all six variables (different subsets of which have been used in the literature) contain information on clustering, namely the flux duration variables (T50, T90), the peak flux (P256) measured in 256 ms bins, the total fluence (Ft) and the spectral hardness ratios (H32 and H321). Further, our analysis found evidence of five different kinds of GRBs, and these groups have different kinds of dispersions in terms of shape, size and orientation. In terms of duration, fluence and spectrum, the five types of GRBs were characterized as intermediate/faint/intermediate, long/intermediate/soft, intermediate/intermediate/intermediate, short/faint/hard and long/bright/intermediate.
Integrative cluster analysis in bioinformatics
Abu-Jamous, Basel; Nandi, Asoke K
2015-01-01
Clustering techniques are increasingly being put to use in the analysis of high-throughput biological datasets. Novel computational techniques to analyse high throughput data in the form of sequences, gene and protein expressions, pathways, and images are becoming vital for understanding diseases and future drug discovery. This book details the complete pathway of cluster analysis, from the basics of molecular biology to the generation of biological knowledge. The book also presents the latest clustering methods and clustering validation, thereby offering the reader a comprehensive review o
Garcia, Danilo; MacDonald, Shane; Archer, Trevor
2015-01-01
Background. The notion of the affective system as being composed of two dimensions led Archer and colleagues to develop the affective profiles model. The model consists of four different profiles based on combinations of individuals' experience of high/low positive and negative affect: self-fulfilling, low affective, high affective, and self-destructive. During the past 10 years, an increasing number of studies have used this person-centered model as the backdrop for investigating between- and within-individual differences in ill-being and well-being. The most common approach to this profiling is to divide individuals' self-reported affect scores using the population median as the reference for high/low splits. However, scores just above and just below the median may end up classified as high or low arbitrarily rather than because of real differences. Thus, the validity of this variable-oriented approach is open to criticism. Our aim was to compare the median-split approach with a person-oriented approach, namely cluster analysis. Method. The participants (N = 2,225) were recruited through Amazon's Mechanical Turk and asked to self-report affect using the Positive Affect Negative Affect Schedule. We compared the profiles' homogeneity and Silhouette coefficients to discern differences in homogeneity and heterogeneity between approaches. We also conducted exact cell-wise analyses matching the profiles from both approaches, and matching profiles and gender, to investigate profiling agreement with respect to affectivity levels and to affectivity and gender. All analyses were conducted using the ROPstat software. Results. The cluster approach (weighted average of cluster homogeneity coefficients = 0.62, Silhouette coefficients = 0.68) generated profiles that were more homogeneous and more distinct from each other than those of the median-split approach (weighted average of cluster homogeneity coefficients = 0.75, Silhouette coefficients = 0.59). Most of the
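To make the two ingredients of this comparison concrete, here is a hedged Python sketch. It is not the ROPstat implementation. It covers (a) median-split profiling into the four affective profiles and (b) the mean silhouette coefficient of a labelling. Scores exactly at the median are treated as "low" here. That is an arbitrary assumption of exactly the kind the abstract criticizes. The helper names are hypothetical.

```python
import math
import statistics


def median_split_profiles(pa, na):
    """Assign each person one of the four affective profiles by splitting
    positive affect (pa) and negative affect (na) at their medians.
    Scores equal to the median count as "low" (an arbitrary choice)."""
    pa_med, na_med = statistics.median(pa), statistics.median(na)
    names = {
        (True, False): "self-fulfilling",   # high PA, low NA
        (False, False): "low affective",    # low PA, low NA
        (True, True): "high affective",     # high PA, high NA
        (False, True): "self-destructive",  # low PA, high NA
    }
    return [names[(p > pa_med, n > na_med)] for p, n in zip(pa, na)]


def mean_silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) using Euclidean
    distance. Requires at least two clusters; singletons score 0."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)
            continue
        a = statistics.fmean(own)  # mean distance within the own cluster
        b = min(  # mean distance to the nearest other cluster
            statistics.fmean([math.dist(p, q) for j, q in enumerate(points)
                              if labels[j] == c])
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.fmean(scores)
```

Computing `mean_silhouette` on (PA, NA) pairs for both the median-split labels and a cluster solution gives a like-for-like comparison of the kind the abstract reports.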
Directory of Open Access Journals (Sweden)
Utro Filippo
2008-10-01
Background: Inferring cluster structure in microarray datasets is a fundamental task for the so-called -omic sciences. It is also a fundamental question in Statistics, Data Analysis and Classification, in particular with regard to predicting the number of clusters in a dataset, which is usually established via internal validation measures. Despite the wealth of internal measures available in the literature, new ones have recently been proposed, some of them specifically for microarray data. Results: We consider five such measures: Clest, Consensus (Consensus Clustering), FOM (Figure of Merit), Gap (Gap Statistics) and ME (Model Explorer), in addition to the classic WCSS (Within Cluster Sum-of-Squares) and KL (Krzanowski and Lai index). We perform extensive experiments on six benchmark microarray datasets, using both Hierarchical and K-means clustering algorithms. We provide an analysis assessing both the intrinsic ability of a measure to predict the correct number of clusters in a dataset and its merit relative to the other measures. We pay particular attention to both precision and speed. Moreover, we also provide various fast approximation algorithms for the computation of Gap, FOM and WCSS. The main result is a hierarchy of these measures in terms of precision and speed, highlighting some of their merits and limitations not previously reported in the literature. Conclusion: Based on our analysis, we draw several conclusions for the use of these internal measures on microarray data. We report the main ones. Consensus is by far the best performer in terms of predictive power and is remarkably algorithm-independent. Unfortunately, on large datasets it may be of no use because of its non-trivial computer time demand (weeks on a state-of-the-art PC). FOM is the second-best performer although, quite surprisingly, it may not be competitive in this scenario: it has essentially the same predictive power as WCSS but it is from 6 to 100 times slower in time
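WCSS and the KL index are the two classic measures in this comparison. A short Python sketch of both follows. It assumes the usual Krzanowski-Lai definitions: DIFF(k) = (k-1)^(2/p) W(k-1) - k^(2/p) W(k) and KL(k) = |DIFF(k) / DIFF(k+1)|, where W(k) is the WCSS with k clusters and p is the number of variables. The function names and the input layout (a list indexed by k) are illustrative assumptions. Clest, Consensus, FOM, Gap and ME are not shown.

```python
def wcss(points, labels):
    """Within-cluster sum of squares of a given partition."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    total = 0.0
    for members in groups.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, centroid)) for p in members)
    return total


def krzanowski_lai(w, p):
    """KL index from WCSS values: w[k] is the WCSS with k clusters (w[0] is
    unused) and p is the number of variables. Returns {k: KL(k)}; the k with
    the largest value is the suggested number of clusters."""
    def diff(k):
        return (k - 1) ** (2 / p) * w[k - 1] - k ** (2 / p) * w[k]

    kl = {}
    for k in range(2, len(w) - 1):
        d_next = diff(k + 1)
        if d_next != 0:
            kl[k] = abs(diff(k) / d_next)
    return kl
```

In practice `w[k]` would come from clustering the same data once for each candidate k. The KL index then rewards a sharp drop in WCSS followed by a flat tail.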
Cluster analysis of multiple planetary flow regimes
Mo, Kingtse; Ghil, Michael
1988-01-01
A modified cluster analysis method is described. It was developed for classifying quasi-stationary events into a few planetary flow regimes and for examining transitions between these regimes. The method was applied first to a simple deterministic model and then to a 500-mbar data set for the Northern Hemisphere (NH), for which cluster analysis was carried out in the subspace of the first seven empirical orthogonal functions (EOFs). Stationary clusters were found in the low-frequency band (periods of more than 10 days), while transient clusters were found in the band-pass frequency window between 2.5 and 6 days. In the low-frequency band, three pairs of clusters determined EOFs 1, 2, and 3, respectively; they exhibited well-known regional features, such as blocking, the Pacific/North American pattern, and wave trains. Both model and low-pass data exhibited strong bimodality.
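The clustering here is done in a reduced EOF subspace. The Python sketch below shows only that projection step: it computes leading EOFs by power iteration with deflation and projects each anomaly map onto them. The resulting principal-component coordinates are what a clustering step would then operate on. This is an illustrative assumption, not the authors' modified method. The function name, random initialization and iteration count are hypothetical choices.

```python
import math
import random


def leading_eofs(data, n_modes=2, n_iter=500, seed=0):
    """Leading EOFs of a list of observation vectors (rows = times,
    columns = grid points) via power iteration with deflation.
    Returns (eofs, pcs): unit-length spatial patterns and, for every
    observation, its projection onto each pattern."""
    n, m = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    anom = [[x - mu for x, mu in zip(row, mean)] for row in data]
    # Sample covariance matrix of the anomalies.
    cov = [[sum(a[i] * a[j] for a in anom) / (n - 1) for j in range(m)]
           for i in range(m)]
    rng = random.Random(seed)
    eofs = []
    for _ in range(n_modes):
        v = [rng.random() for _ in range(m)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in this subspace
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(m) for j in range(m))
        eofs.append(v)
        # Deflation: remove this mode so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
    # PC coordinates: projection of each anomaly onto each EOF.
    pcs = [[sum(a[i] * e[i] for i in range(m)) for e in eofs] for a in anom]
    return eofs, pcs
```

With `n_modes=7`, the rows of `pcs` would be seven-dimensional points comparable to the subspace described in the abstract. For realistic grid sizes, a numerical library's eigen-solver would replace this loop.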
Single-cluster dynamics for the random-cluster model
Deng, Y.; Qian, X.; Blöte, H.W.J.
2009-01-01
We formulate a single-cluster Monte Carlo algorithm for the simulation of the random-cluster model. This algorithm is a generalization of the Wolff single-cluster method for the q-state Potts model to noninteger values q>1. Its results for static quantities are in a satisfactory agreement with those
Hasler, Nicole; Bulbul, Esra; Bonamente, Massimiliano; Carlstrom, John E.; Culverhouse, Thomas L.; Gralla, Megan; Greer, Christopher; Lamb, James W.; Hawkins, David; Hennessy, Ryan;
2012-01-01
We perform a joint analysis of X-ray and Sunyaev-Zel'dovich effect data using an analytic model that describes the gas properties of galaxy clusters. The joint analysis allows the measurement of the cluster gas mass fraction profile and Hubble constant independent of cosmological parameters. Weak cosmological priors are used to calculate the overdensity radius within which the gas mass fractions are reported. Such an analysis can provide direct constraints on the evolution of the cluster gas mass fraction with redshift. We validate the model and the joint analysis on high signal-to-noise data from the Chandra X-ray Observatory and the Sunyaev-Zel'dovich Array for two clusters, A2631 and A2204.
Modelling the Milky Way's globular cluster system
Binney, James; Wong, Leong Khim
2017-05-01
We construct a model for the Galactic globular cluster system based on a realistic gravitational potential and a distribution function (DF) analytic in the action integrals. The DF comprises disc and halo components whose functional forms resemble those recently used to describe the stellar discs and stellar halo. We determine the posterior distribution of our model parameters using a Bayesian approach. This gives us an understanding of how well the globular cluster data constrain our model. The favoured parameter values of the disc and halo DFs are similar to values previously obtained from fits to the stellar disc and halo, although the cluster halo system shows clearer rotation than does the stellar halo. Our model reproduces the generic features of the globular cluster system, namely the density profile, the mean rotation velocity and the fraction of metal-rich clusters. However, the data indicate either incompatibility between catalogued cluster distances and current estimates of distance to the Galactic Centre, or failure to identify clusters behind the bulge. As the data for our Galaxy's components increase in volume and precision over the next few years, it will be rewarding to revisit the present analysis.
Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.
Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K
2013-03-01
Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
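HACA builds on kernel k-means. A minimal Python sketch of plain kernel k-means follows. It uses the feature-space distance ||φ(x_i) - μ_c||² = K_ii - (2/|c|) Σ_{j∈c} K_ij + (1/|c|²) Σ_{j,l∈c} K_jl. As a stated simplification, it substitutes an RBF kernel on fixed-length vectors for the generalized dynamic time alignment kernel on variable-length segments. It also omits HACA's segmentation and dynamic-programming search. The names `rbf_kernel` and `kernel_kmeans` and the explicit initial labels are assumptions for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel, standing in for a time-alignment kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(points, k, init_labels, gamma=1.0, max_iter=50):
    """Kernel k-means: assign each point to the cluster whose feature-space
    mean is nearest, using only kernel evaluations."""
    n = len(points)
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    labels = list(init_labels)
    for _ in range(max_iter):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster term (1/|c|^2) * sum of K over pairs in the cluster.
        self_term = [sum(K[j][l] for j in mem for l in mem) / len(mem) ** 2
                     if mem else 0.0 for mem in clusters]
        new_labels = []
        for i in range(n):
            best_c, best_d = None, math.inf
            for c, mem in enumerate(clusters):
                if not mem:
                    continue  # skip clusters that have become empty
                d = K[i][i] - 2 * sum(K[i][j] for j in mem) / len(mem) + self_term[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels
```

Because the update uses only the kernel matrix, any positive semi-definite similarity between segments can be swapped in without changing the loop.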
Angular momentum in cluster Spherical Collapse Model
Cupani, Guido; Mardirossian, Fabio
2011-01-01
Our new formulation of the Spherical Collapse Model (SCM-L) takes into account the presence of angular momentum associated with the motion of galaxy groups falling in towards the centre of galaxy clusters. The angular momentum introduces an additional term in the dynamical equation. This term is useful for describing the evolution of clusters in the non-equilibrium region, which is the region investigated in the present paper. Our SCM-L can be used to predict the profiles of several key dynamical quantities, such as the radial and tangential velocities of member galaxies and the total cluster mass. A good understanding of the non-equilibrium region is important because it is the natural setting in which to study infall onto galaxy clusters and the accretion phenomena present in these objects. Our results corroborate previous estimates and are in very good agreement with the analysis of recent observations and of simulated clusters.
Filtering Genes for Cluster and Network Analysis
Directory of Open Access Journals (Sweden)
Parkhomenko Elena
2009-06-01
Background: Prior to cluster analysis or genetic network analysis, it is customary to filter genes, that is, to remove genes considered irrelevant from the set to be analyzed. Often, genes whose variation across samples is less than an arbitrary threshold value are deleted. This can improve interpretability and reduce bias. Results: This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks. Conclusion: The methods introduced apply very generally, to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.
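The customary variance-threshold filter described above takes only a few lines of Python. The sketch below assumes expression data as a mapping from gene id to per-sample values. It also shows a common alternative that keeps a fixed number of the most variable genes. The function names and the threshold are hypothetical, and the paper's own SUMCOV method is not reproduced because the abstract does not define it.

```python
import statistics


def variance_filter(expr, threshold):
    """Keep genes whose sample variance across samples is at least threshold.
    expr maps a gene id to its list of expression values (one per sample)."""
    return {g: vals for g, vals in expr.items()
            if statistics.variance(vals) >= threshold}


def top_variance(expr, n):
    """Alternative rule: keep the n genes with the largest sample variance."""
    return sorted(expr, key=lambda g: statistics.variance(expr[g]), reverse=True)[:n]
```

Both rules look at each gene in isolation. That is one reason the abstract reports that they can distort downstream cluster and principal-component results.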
Cluster Analysis and Clinical Asthma Phenotypes
Shaw, Dominic E.; Berry, Michael A.; Thomas, Michael; Brightling, Christopher E.; Wardlaw, Andrew J.
2014-01-01
Rationale: Heterogeneity in asthma expression is multidimensional, including variability in clinical, physiologic, and pathologic parameters. Classification requires consideration of these disparate domains in a unified model. Objectives: To explore the application of a multivariate mathematical technique, k-means cluster analysis, for identifying distinct phenotypic groups. Methods: We performed k-means cluster analysis in three independent asthma populations. Clusters from a population managed in primary care (n = 184), with predominantly mild to moderate disease, were compared with those from a refractory asthma population managed in secondary care (n = 187). We then compared differences in asthma outcomes (exacerbation frequency and change in corticosteroid dose at 12 mo) between clusters in a third population of 68 subjects with predominantly refractory asthma. These subjects were clustered at entry into a randomized trial comparing a strategy of minimizing eosinophilic inflammation (inflammation-guided strategy) with standard care. Measurements and Main Results: Two clusters (early-onset atopic and obese, noneosinophilic) were common to both asthma populations. Two clusters characterized by marked discordance between symptom expression and eosinophilic airway inflammation (early-onset symptom predominant and late-onset inflammation predominant) were specific to refractory asthma. Inflammation-guided management was superior for both discordant subgroups. It led to a reduction in exacerbation frequency in the inflammation-predominant cluster (3.53 [SD, 1.18] vs. 0.38 [SD, 0.13] exacerbation/patient/yr, P = 0.002) and a dose reduction of inhaled corticosteroid in the symptom-predominant cluster (mean difference, 1,829 μg beclomethasone equivalent/d [95% confidence interval, 307–3,349 μg]; P = 0.02). Conclusions: Cluster analysis offers a novel multidimensional approach for identifying asthma phenotypes that exhibit differences in clinical response to treatment algorithms. PMID:18480428
Scoring methods used in cluster analysis
Sirota, Sergej
2014-01-01
The aim of the thesis is to compare how well cluster analysis methods classify the objects in a dataset into groups that are already known. The theoretical part first describes the steps needed to prepare a data file for cluster analysis. The next theoretical section is dedicated to cluster analysis itself. It describes ways of measuring the similarity of objects and clusters, and the cluster analysis methods used in the practical part of this thesis. In the practical part a...
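The abstract is truncated before it names the scoring method. One common way to score a clustering against known groups is the Rand index: the fraction of object pairs on which the two partitions agree, either placed together in both or apart in both. The Python sketch below assumes that choice for illustration only; it is not necessarily the score used in the thesis.

```python
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Fraction of object pairs on which two partitions agree
    (the pair is together in both, or apart in both)."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

The index ignores how clusters are named, so a clustering that only permutes the group labels still scores 1.0.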
Nonlinear analysis of EAS clusters
Zotov, M. Yu.; Fomin, Yu. A.
2002-01-01
We apply certain methods of nonlinear time series analysis to the extensive air shower clusters found earlier in the data set obtained with the EAS-1000 Prototype array. In particular, we use the Grassberger-Procaccia algorithm to compute the correlation dimension of samples in the vicinity of the clusters. The validity of the results is checked by surrogate data tests and some additional quantities. We compare our conclusions with the results of similar investigations performed by the EAS-TOP and LAAS groups.
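The Grassberger-Procaccia approach estimates the correlation dimension in three steps. First, it estimates the correlation sum C(r), the fraction of pairs of embedded points closer than r. It then takes the slope of log C(r) against log r. A minimal Python sketch follows. It assumes a time-delay embedding, the maximum norm and a plain least-squares slope, all of which are illustrative choices. Surrogate-data tests are not shown, and the function names are hypothetical.

```python
import math


def delay_embed(series, dim, lag=1):
    """Time-delay embedding of a scalar series into dim-dimensional vectors."""
    n = len(series) - (dim - 1) * lag
    return [tuple(series[i + j * lag] for j in range(dim)) for i in range(n)]


def correlation_sum(vectors, r):
    """C(r): fraction of distinct pairs closer than r (maximum norm)."""
    n = len(vectors)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if max(abs(a - b) for a, b in zip(vectors[i], vectors[j])) < r
    )
    return 2 * close / (n * (n - 1))


def correlation_dimension(vectors, radii):
    """Least-squares slope of log C(r) versus log r over the given radii."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_sum(vectors, r)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

The radii must lie in the scaling region, where C(r) is neither dominated by noise nor saturated. Choosing them is usually the delicate part of the analysis.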
The Baltimore and Utrecht models for cluster dissolution
Lamers, H.J.G.L.M.
2009-01-01
The analysis of the age distributions of star cluster samples of different galaxies has resulted in two very different empirical models for the dissolution of star clusters: the Baltimore model and the Utrecht model. I describe these two models and their differences. The Baltimore model implies that
Directory of Open Access Journals (Sweden)
Marzanna Witek-Hajduk
2017-06-01
The aim of this study is to examine whether, among consumer durable goods manufacturers operating in Poland, clusters can be distinguished in terms of the strength of the benefits obtained from their cooperation with their key retailer. This article also aims to verify whether these clusters can be differentiated according to the business models employed by the two parties. Using the CATI method, data were collected from 613 respondents, who were clustered into 5 groups. The established clusters proved to differ significantly in terms of the manufacturer's business model. From the manufacturer's perspective, however, these differences proved to be poor predictors of the overall level of benefits obtained.
Supermodel Analysis of Galaxy Clusters
Fusco-Femiano, R; Lapi, A
2009-01-01
[abridged] We present the analysis of the X-ray brightness and temperature profiles for six clusters belonging to both the Cool Core and Non Cool Core classes, in terms of the Supermodel (SM) developed by Cavaliere, Lapi & Fusco-Femiano (2009). Based on the gravitational wells set by the dark matter halos, the SM straightforwardly expresses the equilibrium of the IntraCluster Plasma (ICP) modulated by the entropy deposited at the boundary by standing shocks from gravitational accretion, and injected at the center by outgoing blastwaves from mergers or from outbursts of Active Galactic Nuclei. The cluster set analyzed here highlights not only how simply the SM represents the main dichotomy Cool vs. Non Cool Core clusters in terms of a few ICP parameters governing the radial entropy run, but also how accurately it fits even complex brightness and temperature profiles. For Cool Core clusters like A2199 and A2597, the SM with a low level of central entropy straightforwardly yields the characteristic peaked pr...
Clustering analysis of seismicity and aftershock identification.
Zaliapin, Ilya; Gabrielov, Andrei; Keilis-Borok, Vladimir; Wong, Henry
2008-07-01
We introduce a statistical methodology for clustering analysis of seismicity in the time-space-energy domain and use it to establish the existence of two statistically distinct populations of earthquakes: clustered and nonclustered. This result can be used, in particular, for nonparametric aftershock identification. The proposed approach expands the analysis of Baiesi and Paczuski [Phys. Rev. E 69, 066106 (2004), doi:10.1103/PhysRevE.69.066106] based on the space-time-magnitude nearest-neighbor distance eta between earthquakes. We show that for a homogeneous Poisson marked point field with exponential marks, the distance eta has the Weibull distribution, which bridges our results with classical correlation analysis for point fields. The joint 2D distribution of spatial and temporal components of eta is used to identify the clustered part of a point field. The proposed technique is applied to several seismicity models and to the observed seismicity of southern California.
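The abstract does not spell out the distance eta. The Python sketch below assumes the form commonly attributed to Baiesi and Paczuski: η_ij = t_ij · r_ij^d_f · 10^(-b·m_i), for an earlier event i and a later event j. For each event, it finds the earlier event that minimizes η, its nearest-neighbour "parent". Several simplifications are illustrative assumptions. It uses planar Euclidean distance instead of epicentral distance on the sphere, and default values b = 1 and d_f = 1.6. The function name is hypothetical.

```python
import math


def nearest_neighbor_eta(events, b=1.0, df=1.6):
    """events: list of (time, x, y, magnitude).
    For each event j returns (eta_j, parent_index), where
    eta_ij = t_ij * r_ij**df * 10**(-b * m_i) is minimized over earlier
    events i; the first event gets (inf, None)."""
    result = []
    for tj, xj, yj, _ in events:
        best = (math.inf, None)
        for i, (ti, xi, yi, mi) in enumerate(events):
            dt = tj - ti
            if dt <= 0:
                continue  # only earlier events can be parents
            r = math.hypot(xj - xi, yj - yi)  # planar distance (assumption)
            eta = dt * r ** df * 10 ** (-b * mi)
            if eta < best[0]:
                best = (eta, i)
        result.append(best)
    return result
```

The bimodality the abstract reports shows up when the resulting eta values (or their time and space components) are histogrammed on a log scale. Small values mark clustered events and large values mark background events.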
CNEM: Cluster Based Network Evolution Model
Directory of Open Access Journals (Sweden)
Sarwat Nizamani
2015-01-01
This paper presents a network evolution model based on the clustering approach. The proposed approach depicts network evolution, showing how a network forms from individual nodes into a fully evolved network. An agglomerative hierarchical clustering method is applied to evolve the network. In the paper, we present three case studies that show the evolution of networks from scratch. These are the terrorist network of the 9/11 incidents, the terrorist network of the WMD (Weapons of Mass Destruction) plot against France, and a network of tweets discussing a topic. The 9/11 network is also used for evaluation with other social network analysis methods. These show that the clusters created using the proposed model of network evolution are of good quality. Thus, the proposed method can be used by law enforcement agencies to further investigate criminal networks.
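Agglomerative hierarchical clustering starts from singleton nodes and repeatedly merges the two closest clusters. The sequence of merges can therefore be read as an evolution from individual nodes to a single connected structure. The naive Python sketch below records that merge history. The abstract does not state the paper's node distance or linkage. Single linkage (`min`) is used here as an assumption, with complete linkage (`max`) as an option. The function name is hypothetical.

```python
def agglomerative(dist, linkage=min):
    """Naive agglomerative clustering on a symmetric distance matrix.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are frozensets of item indices. linkage=min gives
    single linkage; linkage=max gives complete linkage."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        # Replace the two merged clusters with their union.
        merged = clusters[a] | clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
    return history
```

Each entry of the history can be treated as one step of network growth, linking the members of the two merged clusters. This O(n³) loop is meant for small illustrative networks only.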
Jang, Jinwoo; Smyth, Andrew W.
2017-01-01
The objective of structural model updating is to reduce inherent modeling errors in Finite Element (FE) models due to simplifications, idealized connections, and uncertainties in material properties. Updated FE models, which have fewer discrepancies with real structures, give more precise predictions of dynamic behaviors for future analyses. However, model updating becomes more difficult when applied to civil structures with a large number of structural components and complicated connections. In this paper, a full-scale FE model of a major long-span bridge has been updated for improved consistency with real measured data. Two methods are applied to improve the model updating process. The first method focuses on improving the agreement of the updated mode shapes with the measured data. A nonlinear inequality constraint equation is added to an optimization procedure, which makes it possible to keep the updated mode shapes in reasonable agreement with those observed. An interior point algorithm handles the nonlinearity in the objective function and constraints. The second method finds very efficient updating parameters in a more systematic way. The selection of updating parameters in FE models is essential for a successful updating result because the parameters are directly related to the modal properties of dynamic systems. An in-depth sensitivity analysis is carried out in an effort to precisely understand the effects of physical parameters in the FE model on natural frequencies. Based on the sensitivity analysis, cluster analysis is conducted to find a very efficient set of updating parameters.
Cluster models and other topics
Akaishi, Yoshinori; Horiuchi, Hisashi; Ikeda, Kiyomi
1986-01-01
This volume consists of contributions from some of Japan's most eminent nuclear theorists. The cluster model of the nucleus is discussed pedagogically and the current status of the field is surveyed. A contribution on Monte Carlo Methods and Lattice Gauge Theories gives nuclear theorists a glimpse of related developments in QCD and Gauge Theories. Few Body Systems are reviewed by Y Akaishi, paying special attention to the ATMS Multiple Scattering Method.
SPATIO-TEMPORAL CLUSTER ANALYSIS OF DISEASE
Directory of Open Access Journals (Sweden)
M. S. Abramovich
2014-01-01
A robust version of the spatial scan statistic for clustering is proposed. Spatio-temporal cluster analysis algorithms were used to detect clusters in the incidence of thyroid carcinoma. Methods and algorithms for detecting and building disease clusters in the study territories are considered.
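The abstract does not describe its robust variant. For reference, here is a hedged Python sketch of the classical Poisson spatial scan statistic that such variants start from. For a candidate zone with c observed cases and E expected cases out of C total, the log-likelihood ratio is c·ln(c/E) + (C−c)·ln((C−c)/(C−E)) when c > E, and 0 otherwise. Circles are grown around each region centroid until they hold half the population. Several parts are illustrative assumptions: the input format, the 50% population cap and the function names. Monte Carlo significance testing and the temporal dimension are not shown.

```python
import math


def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for a zone with c observed and e expected
    cases out of `total` cases; 0 unless the zone has an excess of cases."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside


def circular_scan(regions, max_pop_frac=0.5):
    """regions: list of (x, y, cases, population). Returns (llr, zone) for the
    most likely cluster among circles centred on each region."""
    total_cases = sum(r[2] for r in regions)
    total_pop = sum(r[3] for r in regions)
    best = (0.0, [])
    for cx, cy, _, _ in regions:
        # Add regions in order of distance from this centre.
        order = sorted(range(len(regions)),
                       key=lambda k: math.hypot(regions[k][0] - cx, regions[k][1] - cy))
        zone, c, pop = [], 0, 0
        for k in order:
            pop += regions[k][3]
            if pop > max_pop_frac * total_pop:
                break  # circle would exceed the population cap
            zone.append(k)
            c += regions[k][2]
            e = total_cases * pop / total_pop  # expected cases under the null
            llr = poisson_llr(c, e, total_cases)
            if llr > best[0]:
                best = (llr, list(zone))
    return best
```

In a full analysis, the maximum LLR would be compared against its distribution under random relabelling of cases to obtain a p-value.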
Research on a Meta-Search Engine Model Based on Cluster Analysis
Institute of Scientific and Technical Information of China (English)
余肖生; 司新霞
2011-01-01
Starting from the concept of cluster analysis, this paper establishes a meta-search engine model based on cluster analysis. An example (Clusty.com) shows that the model is feasible. It also shows that clustering the results returned by a meta-search engine helps users find the information they need more accurately and saves them time.
On the Modeling and Analysis of Heterogeneous Radio Access Networks using a Poisson Cluster Process
DEFF Research Database (Denmark)
Suryaprakash, Vinay; Møller, Jesper; Fettweis, Gerhard P.
Future mobile networks are envisioned as networks consisting of more than one type of base station in order to cope with rising user demands. Such networks are referred to as heterogeneous networks. There have been various attempts at modeling and optimizing such networks using spatial point proces...
A PAC-Bayesian Analysis of Graph Clustering and Pairwise Clustering
Seldin, Yevgeny
2010-01-01
We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering...
Single-cluster dynamics for the random-cluster model
Deng, Youjin; Qian, Xiaofeng; Blöte, Henk W. J.
2009-09-01
We formulate a single-cluster Monte Carlo algorithm for the simulation of the random-cluster model. This algorithm is a generalization of the Wolff single-cluster method for the q-state Potts model to noninteger values q > 1. Its results for static quantities are in satisfactory agreement with those of the existing Swendsen-Wang-Chayes-Machta (SWCM) algorithm, which involves a full-cluster decomposition of random-cluster configurations. We explore the critical dynamics of this algorithm for several two-dimensional Potts and random-cluster models. For integer q, the single-cluster algorithm can be reduced to the Wolff algorithm. In that case we find that the autocorrelation functions decay almost purely exponentially, with dynamic exponents z_exp = 0.07(1), 0.521(7), and 1.007(9) for q = 2, 3, and 4, respectively. For noninteger q, the dynamical behavior of the single-cluster algorithm appears to be very dissimilar to that of the SWCM algorithm. For large critical systems, the autocorrelation function displays a range of power-law behavior as a function of time. The dynamic exponents are relatively large. We provide an explanation for this peculiar dynamic behavior.
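The paper's contribution is the extension of Wolff's method to noninteger q, which this sketch does not attempt. For orientation only, here is a hedged Python sketch of the ordinary Wolff update for the integer-q Potts model, the case to which the algorithm reduces. It runs on an L x L periodic square lattice and assumes the Boltzmann weight exp(K Σ δ(s_i, s_j)). Under that weight, a neighbour in the same state joins the growing cluster with probability 1 − exp(−K), and the whole cluster is then recoloured to a different state. The function name and lattice layout are illustrative assumptions.

```python
import math
import random


def wolff_step(spins, L, q, K, rng):
    """One Wolff single-cluster update for the q-state Potts model on an
    L x L periodic square lattice with weight exp(K * sum delta(s_i, s_j)).
    Modifies spins in place and returns the cluster size."""
    p_add = 1.0 - math.exp(-K)
    seed = (rng.randrange(L), rng.randrange(L))
    old = spins[seed[0]][seed[1]]
    new = (old + rng.randrange(1, q)) % q  # a uniformly chosen different state
    cluster = {seed}
    stack = [seed]
    while stack:
        x, y = stack.pop()
        neighbours = [((x + 1) % L, y), ((x - 1) % L, y),
                      (x, (y + 1) % L), (x, (y - 1) % L)]
        for nx, ny in neighbours:
            # Each bond to a like-valued site is activated independently.
            if (nx, ny) not in cluster and spins[nx][ny] == old and rng.random() < p_add:
                cluster.add((nx, ny))
                stack.append((nx, ny))
    for x, y in cluster:
        spins[x][y] = new
    return len(cluster)
```

On the square lattice the Potts critical coupling is K_c = ln(1 + √q). That is a natural value at which to measure cluster sizes and autocorrelation times with a loop over `wolff_step`.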
Analysis of the crystal lattice instability for cage–cluster systems using the superatom model
Energy Technology Data Exchange (ETDEWEB)
Serebrennikov, D. A., E-mail: dserebrennikov@innopark.kantiana.ru, E-mail: dimafania@mail.ru; Clementyev, E. S. [I. Kant Baltic Federal University, “Functional Nanomaterials” Scientific–Educational Center (Russian Federation); Alekseev, P. A. [“Kurchatov Institute” National Research Center (Russian Federation)
2016-09-15
We have investigated the lattice dynamics for a number of rare-earth hexaborides based on the superatom model, within which the boron octahedron is substituted by one superatom with a mass equal to the mass of six boron atoms. Phenomenological models have been constructed for the acoustic and low-energy optical phonon modes in RB₆ (R = La, Gd, Tb, Dy) compounds. Using DyB₆ as an example, we have studied the anomalous softening of longitudinal acoustic phonons in several crystallographic directions, an effect that is also typical of GdB₆ and TbB₆. The softening of the acoustic branches is shown to be achieved through the introduction of negative interatomic force constants between rare-earth ions. We discuss the structural instability of hexaborides based on 4f elements, the role of valence instability in the lattice dynamics, and the influence of the number of f electrons on the degree of softening of phonon modes.
Cluster Analysis of Ranunculus Species
Directory of Open Access Journals (Sweden)
SURANTO
2002-01-01
The aim of the experiment was to examine whether the morphological characters of eleven species of Ranunculus collected from a number of populations were in agreement with the genetic (isozyme) data. The method used in this study was polyacrylamide gel electrophoresis using peroxidase, esterase, malate dehydrogenase, and acid phosphatase enzymes. The results showed that cluster analysis based on isozyme data gave good support to the classification of the eleven species into morphological groups. This study concluded that in certain species each morphological variation proved to be genetically based.
Fuzzy Clustering Using the Convex Hull as Geometrical Model
Directory of Open Access Journals (Sweden)
Luca Liparulo
2015-01-01
A new approach to fuzzy clustering is proposed in this paper. It aims to relax some constraints imposed by known algorithms using a generalized geometrical model for clusters that is based on the convex hull computation. A method is also proposed in order to determine suitable membership functions and hence to represent fuzzy clusters based on the adopted geometrical model. The convex hull is not only used at the end of clustering analysis for the geometric data interpretation but also used during the fuzzy data partitioning within an online sequential procedure in order to calculate the membership function. Consequently, a pure fuzzy clustering algorithm is obtained where clusters are fitted to the data distribution by means of the fuzzy membership of patterns to each cluster. The numerical results reported in the paper show the validity and the efficacy of the proposed approach with respect to other well-known clustering algorithms.
Co-clustering models, algorithms and applications
Govaert, Gérard
2013-01-01
Cluster or co-cluster analyses are important tools in a variety of scientific areas. The introduction of this book presents a state of the art of already well-established, as well as more recent methods of co-clustering. The authors mainly deal with the two-mode partitioning under different approaches, but pay particular attention to a probabilistic approach. Chapter 1 concerns clustering in general and the model-based clustering in particular. The authors briefly review the classical clustering methods and focus on the mixture model. They present and discuss the use of different mixture
Clusters of classical water models
Kiss, Péter T.; Baranyai, András
2009-11-01
The properties of clusters can be used as tests of models constructed for molecular simulation of water. We searched for configurations with minimal energies for a small number of molecules. We identified topologically different structures close to the absolute energy minimum of the system by calculating overlap integrals and enumerating hydrogen bonds. Starting from the dimer, we found an increasing number of topologically different, low-energy arrangements for the trimer (3), the tetramer (6), the pentamer (6), and the hexamer (9). We studied simple models with a polarizable point dipole. These were the BSV model [J. Brodholt et al., Mol. Phys. 86, 149 (1995)], the DC model [L. X. Dang and T. M. Chang, J. Chem. Phys. 106, 8149 (1997)], and the GCP model [P. Paricaud et al., J. Chem. Phys. 122, 244511 (2005)]. As an alternative, the SWM4-DP and the SWM4-NDP charge-on-spring models [G. Lamoureux et al., Chem. Phys. Lett. 418, 245 (2006)] were also investigated. To study the impact of polarizability restricted to the plane of the molecule, we also carried out calculations for the SPC-FQ and TIP4P-FQ models [S. W. Rick et al., J. Chem. Phys. 101, 6141 (1994)]. In addition, justified by their widespread use even for near-critical or surface behavior calculations, we identified clusters for five nonpolarizable models of ambient water: SPC/E [H. J. C. Berendsen et al., J. Phys. Chem. 91, 6269 (1987)], TIP4P [W. L. Jorgensen et al., J. Chem. Phys. 79, 926 (1983)], TIP4P-EW [H. W. Horn et al., J. Chem. Phys. 120, 9665 (2004)], and TIP4P/2005 [J. L. F. Abascal and C. Vega, J. Chem. Phys. 123, 234505 (2005)]. The fifth was a five-site model named TIP5P [M. W. Mahoney and W. L. Jorgensen, J. Chem. Phys. 112, 8910 (2000)]. To see the impact of the vibrations, we studied the flexible SPC model [K. Toukan and A. Rahman, Phys. Rev. B 31, 2643 (1985)]. We evaluated the results by comparing them with experimental data and quantum chemical calculations. The position of the negative
AMOEBA clustering revisited. [cluster analysis, classification, and image display program]
Bryant, Jack
1990-01-01
A description of the clustering, classification, and image display program AMOEBA is presented. Using a difficult high resolution aircraft-acquired MSS image, the steps the program takes in forming clusters are traced. A number of new features are described here for the first time. Usage of the program is discussed. The theoretical foundation (the underlying mathematical model) is briefly presented. The program can handle images of any size and dimensionality.
Fuzzy Clustering Methods and their Application to Fuzzy Modeling
DEFF Research Database (Denmark)
Kroszynski, Uri; Zhou, Jianjun
1999-01-01
Fuzzy modeling techniques based upon the analysis of measured input/output data sets result in a set of rules that allow one to predict system outputs from given inputs. Fuzzy clustering methods for system modeling and identification result in relatively small rule-bases, allowing fast, yet accurate...... prediction of outputs. This article presents an overview of some of the most popular clustering methods, namely Fuzzy C-Means (FCM) and its generalizations to Fuzzy C-Lines and Elliptotypes. The algorithms for computing cluster centers and principal directions from a training data set are described....... A method to obtain an optimized number of clusters is outlined. Based upon the clusters' characteristics, a behavioural model is formulated in terms of a rule-base and an inference engine. The article reviews several variants for the model formulation. Some limitations of the methods are listed...
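The FCM iteration mentioned above alternates between two updates: cluster centres are means of the points weighted by membership raised to the fuzzifier m, and memberships are recomputed from relative distances to all centres. The following is a minimal pure-Python sketch of standard FCM, assuming Euclidean distance, m = 2 and random initial memberships; the C-Lines and Elliptotype generalizations and the rule-base construction are not shown, and the function name is hypothetical.

```python
import random

def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-Means clustering of a list of equal-length numeric tuples.

    Returns (centers, u) where u[i][j] is the membership of point i in
    cluster j; every row of u sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[i] * points[i][d] for i in range(n)) / sw
                                 for d in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [max(1e-12, sum((points[i][d] - centers[j][d]) ** 2
                                    for d in range(dim)) ** 0.5)
                     for j in range(c)]
            new_u.append([1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c))
                          for j in range(c)])
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u
```

A hard label, if one is needed, is the index of the largest membership in each row; the soft memberships themselves are what fuzzy rule-bases are typically built from.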
Melton, Joe R.; Sospedra-Alfonso, Reinel; McCusker, Kelly E.
2017-07-01
We investigate the application of clustering algorithms to represent sub-grid scale variability in soil texture for use in a global-scale terrestrial ecosystem model. Our model, the coupled Canadian Land Surface Scheme - Canadian Terrestrial Ecosystem Model (CLASS-CTEM), is typically implemented at a coarse spatial resolution (approximately 2.8° × 2.8°) due to its use as the land surface component of the Canadian Earth System Model (CanESM). CLASS-CTEM can, however, be run with tiling of the land surface as a means to represent sub-grid heterogeneity. We first determined, via an idealized test case, that the model was sensitive to tiling of the soil textures before attempting to cluster soil textures globally. To cluster a high-resolution soil texture dataset onto our coarse model grid, we use two linked algorithms - the Ordering Points to Identify the Clustering Structure (OPTICS) algorithm (Ankerst et al., 1999; Daszykowski et al., 2002) and the algorithm of Sander et al. (2003) - to provide tiles of representative soil textures for use as CLASS-CTEM inputs. The clustering process results in, on average, about three tiles per CLASS-CTEM grid cell, with most cells having four or fewer tiles. Results from CLASS-CTEM simulations conducted with the tiled inputs (Cluster) versus those using a simple grid-mean soil texture (Gridmean) show that CLASS-CTEM, at least on a global scale, is relatively insensitive to the tiled soil textures; however, differences can be large in arid or peatland regions. The Cluster simulation has generally lower soil moisture and lower overall vegetation productivity than the Gridmean simulation, except in arid regions, where plant productivity increases. In these dry regions, the influence of the tiling is stronger due to the general state of vegetation moisture stress, which allows a single tile whose soil texture retains more plant-available water to yield much higher productivity. Although the use of clustering analysis appears promising as a
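OPTICS, used above to group soil textures, produces an ordering of the points together with a reachability distance for each; valleys in the resulting reachability plot are clusters. The sketch below is a minimal brute-force OPTICS in pure Python, assuming Euclidean distance. The paper follows OPTICS with the cluster-extraction algorithm of Sander et al. (2003); that step is not reproduced here and is replaced by a simpler fixed-threshold cut of the reachability plot. The parameter values in the example are hypothetical.

```python
import heapq
import math

def optics(points, eps, min_pts):
    """OPTICS ordering with a brute-force eps-neighbourhood search.

    Returns (ordering, reach, core): reach[i] and core[i] are None where the
    reachability or core distance is undefined.
    """
    n = len(points)
    processed = [False] * n
    reach = [None] * n
    core = [None] * n
    ordering = []

    def neighbours(i):
        # eps-neighbourhood as (distance, index) pairs, including i itself.
        out = []
        for j in range(n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                out.append((d, j))
        return out

    def expand(i, seeds):
        nbrs = neighbours(i)
        processed[i] = True
        ordering.append(i)
        if len(nbrs) < min_pts:
            return  # not a core point: it does not expand the ordering
        core[i] = sorted(d for d, _ in nbrs)[min_pts - 1]
        for d, j in nbrs:
            if processed[j]:
                continue
            new_r = max(core[i], d)
            if reach[j] is None or new_r < reach[j]:
                reach[j] = new_r
                heapq.heappush(seeds, (new_r, j))  # outdated entries are skipped on pop

    for p in range(n):
        if processed[p]:
            continue
        seeds = []
        expand(p, seeds)
        while seeds:
            _, q = heapq.heappop(seeds)
            if not processed[q]:
                expand(q, seeds)
    return ordering, reach, core

def extract_clusters(ordering, reach, core, eps_cut):
    """Flat clusters from a fixed reachability threshold; -1 marks noise."""
    labels, current = {}, -1
    for i in ordering:
        if reach[i] is None or reach[i] > eps_cut:
            if core[i] is not None and core[i] <= eps_cut:
                current += 1          # a new valley starts here
                labels[i] = current
            else:
                labels[i] = -1
        else:
            labels[i] = current
    return labels
```

The threshold cut recovers only one density level; hierarchical extraction methods such as that of Sander et al. exist precisely because a single cut can miss nested clusters.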
Cluster analysis in phenotyping a Portuguese population.
Loureiro, C C; Sa-Couto, P; Todo-Bom, A; Bousquet, J
2015-09-03
Unbiased cluster analysis using clinical parameters has identified asthma phenotypes. Adding inflammatory biomarkers to this analysis provided better insight into the disease mechanisms. This approach has not yet been applied to Portuguese asthmatic patients. The aim of this study was to identify asthma phenotypes using cluster analysis in a Portuguese asthmatic population treated in secondary medical care. Consecutive patients with asthma were recruited from the outpatient clinic. Patients were optimally treated according to GINA guidelines and enrolled in the study. Procedures were performed according to a standard evaluation of asthma. Phenotypes were identified by cluster analysis using Ward's clustering method. Of the 72 patients enrolled, 57 had complete data and were included in the cluster analysis. Patients were distributed into five clusters, described as follows: cluster (C) 1, early-onset mild allergic asthma; C2, moderate allergic asthma with long evolution, female prevalence and mixed inflammation; C3, allergic brittle asthma in young females with early disease onset and no evidence of inflammation; C4, severe asthma in obese females with late disease onset, highly symptomatic despite low Th2 inflammation; C5, severe asthma with chronic airflow obstruction, late disease onset and eosinophilic inflammation. In our study population, the identified clusters largely coincided with those of other, larger-scale cluster analyses. Variables such as age at disease onset, obesity, lung function, FeNO (a Th2 biomarker) and disease severity were important for cluster distinction. Copyright © 2015. Published by Elsevier España, S.L.U.
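Ward's method, used above, merges at each step the pair of clusters whose union increases the total within-cluster sum of squares the least; for clusters A and B with sizes n_A, n_B and centroids c_A, c_B that increase is n_A·n_B/(n_A+n_B)·‖c_A − c_B‖². The following naive pure-Python sketch applies that rule directly. It assumes the clinical variables have already been coded numerically and standardized, which the abstract does not describe, and it is O(n³), which is fine for a few dozen patients.

```python
def ward_clustering(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    Each cluster is tracked as (member indices, size, centroid). At every
    step the pair whose merge increases the within-cluster sum of squares
    the least is merged:
        delta = n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2
    """
    clusters = [([i], 1, tuple(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                _, na, ca = clusters[a]
                _, nb, cb = clusters[b]
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
                delta = na * nb / (na + nb) * sq
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        _, a, b = best
        ma, na, ca = clusters[a]
        mb, nb, cb = clusters[b]
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        del clusters[b]  # b > a, so index a is unaffected
        clusters[a] = (ma + mb, na + nb, centroid)
    return [sorted(members) for members, _, _ in clusters]
```

In practice the number of clusters (five in the study) is usually chosen by inspecting the dendrogram or the jumps in merge cost rather than fixed in advance.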
The applicability and effectiveness of cluster analysis
Ingram, D. S.; Actkinson, A. L.
1973-01-01
An insight into the characteristics that determine the performance of a clustering algorithm is presented. For the techniques examined to cluster data accurately, two conditions must be satisfied simultaneously: first, the data must have a particular structure, and second, the parameters chosen for the clustering algorithm must be correct. Examining the structure of the data from the Cl flight line makes it clear that no single set of parameters can be used to accurately cluster all the different crops. The effectiveness of either a noniterative or an iterative clustering algorithm in accurately clustering data representative of the Cl flight line is questionable. Thus, extensive a priori knowledge is required to use cluster analysis in its present form for applications such as assisting in the definition of field boundaries and evaluating the homogeneity of a field. New or modified techniques are necessary for clustering to be a reliable tool.
Web clustering based on hybrid probabilistic latent semantic analysis model
Institute of Scientific and Technical Information of China (English)
王治和; 王凌云; 党辉; 潘丽娜
2012-01-01
In e-commerce, to better understand the inherent characteristics of user access and to make better marketing strategies, this paper proposes a Web clustering algorithm based on a Hybrid Probabilistic Latent Semantic Analysis (H-PLSA) model. Probabilistic Latent Semantic Analysis (PLSA) models were built separately on user browsing data, page content information and content-enhanced user transaction data. Using the log-likelihood function, the three PLSA models were merged to obtain a user-clustering H-PLSA model and a page-clustering H-PLSA model. In the cluster analysis, similarity was computed from the conditional probabilities between latent topics and users, pages and sites, and the distance-based k-medoids algorithm was used for clustering. The H-PLSA model was designed and constructed, and the Web clustering algorithm was validated on it, showing that the algorithm is feasible.
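The final clustering step above is distance-based k-medoids: every item is assigned to its nearest medoid, and each medoid is then replaced by the cluster member with the smallest total distance to the others. The sketch below is a minimal alternating (Voronoi-iteration) k-medoids over a precomputed distance matrix. It assumes the topic-based similarities have already been converted into distances (for example 1 − similarity, a hypothetical choice), and its farthest-first initialization is our own assumption rather than the paper's.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed n x n distance matrix.

    Returns (medoids, labels), where labels[i] indexes into medoids.
    """
    n = len(dist)
    # Farthest-first initialisation (an assumption; PAM's BUILD step also works).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # Update step: the member with the smallest total distance becomes the medoid.
        new_medoids = [min(members or [m], key=lambda c: sum(dist[c][j] for j in members))
                       for m, members in clusters.items()]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids

    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels
```

Because medoids are actual users or pages rather than averages, k-medoids works directly on any dissimilarity, which is what makes it a natural fit for probability-derived similarities.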
The Baltimore and Utrecht models for cluster dissolution
Lamers, Henny J G L M
2008-01-01
The analysis of the age distributions of star cluster samples of different galaxies has resulted in two very different empirical models for the dissolution of star clusters: the Baltimore model and the Utrecht model. I describe these two models and their differences. The Baltimore model implies that the dissolution of star clusters is mass independent and that about 90% of the clusters are destroyed each age dex, up to an age of about a Gyr, after which mass-dependent dissolution from two-body relaxation becomes the dominant mechanism. In the Utrecht model, cluster dissolution occurs in three stages: (i) mass-independent infant mortality due to the expulsion of gas up to about 10 Myr; (ii) a phase of slow dynamical evolution with strong evolutionary fading of the clusters lasting up to about a Gyr; and (iii) a phase dominated by mass-dependent dissolution, as predicted by dynamical models. I describe the cluster age distributions for mass-limited and magnitude-limited cluster samples for both models. I ...
3D simulation of the Cluster-Cluster Aggregation model
Li, Chao; Xiong, Hailing
2014-12-01
We wrote a program in the Java programming language to implement the Cluster-Cluster Aggregation (CCA) model. Using the simulation program, the fractal aggregation growth process can be displayed dynamically as a three-dimensional (3D) figure, while the related kinetic data of the aggregation simulation are recorded dynamically. Compared with traditional programs, the program has better real-time performance and makes the fractal growth process easier to observe, which supports scientific study of fractal aggregation. In addition, because it is written in Java, the program has very good cross-platform portability.
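The CCA model described above can be illustrated without the 3D display: particles start at random lattice sites, whole clusters take random nearest-neighbour steps, and clusters that come into contact stick together irreversibly. The following pure-Python sketch is an on-lattice CCA in a periodic cubic box, not the authors' Java program. It assumes every cluster is equally likely to move regardless of size (physical CCA variants often scale mobility with cluster size), and all parameter values are hypothetical.

```python
import random

# Six nearest-neighbour steps on a simple cubic lattice.
DIRS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def simulate_cca(n_particles=30, box=8, max_steps=20_000, seed=1):
    """On-lattice cluster-cluster aggregation in a periodic cubic box.

    Returns (clusters, history): clusters maps a cluster id to its set of
    lattice sites; history records the number of clusters after each
    accepted move.
    """
    rng = random.Random(seed)
    clusters, owner = {}, {}
    for cid, s in enumerate(rng.sample(range(box ** 3), n_particles)):
        pos = (s % box, (s // box) % box, s // box ** 2)
        clusters[cid] = {pos}
        owner[pos] = cid

    def shift(p, d):
        return tuple((p[i] + d[i]) % box for i in range(3))

    def merge_contacts(cid):
        # Absorb every cluster that touches `cid` at a nearest-neighbour site.
        merged = True
        while merged:
            merged = False
            for p in list(clusters[cid]):
                for d in DIRS:
                    other = owner.get(shift(p, d))
                    if other is not None and other != cid:
                        for q in clusters.pop(other):
                            owner[q] = cid
                            clusters[cid].add(q)
                        merged = True

    for cid in list(clusters):  # particles that already start in contact
        if cid in clusters:
            merge_contacts(cid)

    history = [len(clusters)]
    for _ in range(max_steps):
        if len(clusters) == 1:
            break
        cid = rng.choice(list(clusters))  # every cluster equally mobile (assumption)
        d = rng.choice(DIRS)
        moved = {shift(p, d) for p in clusters[cid]}
        if any(owner.get(q, cid) != cid for q in moved):
            continue  # reject a move that would overlap another cluster
        for p in clusters[cid]:
            del owner[p]
        for q in moved:
            owner[q] = cid
        clusters[cid] = moved
        merge_contacts(cid)
        history.append(len(clusters))
    return clusters, history
```

The `history` list plays the role of the kinetic data mentioned in the abstract: plotting it against the step count shows how quickly the cluster population coarsens.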
Directory of Open Access Journals (Sweden)
A. V. Brykin
2013-01-01
The development of the cluster principle in the global electronics industry is one of the most effective examples of a high-tech industry. The author considers the possibility of using clusters to modernize the Russian economy.
Nisius, Britta; Göller, Andreas H; Bajorath, Jürgen
2009-01-01
Blockade of the human ether-a-go-go related gene potassium channel is regarded as a major cause of drug toxicity and is associated with severe cardiac side effects. A variety of in silico models have been reported to aid in the identification of compounds blocking the human ether-a-go-go related gene channel. Herein, we present a classification approach for the detection of diverse human ether-a-go-go related gene blockers that combines cluster analysis of training data, feature selection and support vector machine learning. Compound learning sets are first divided into clusters of similar molecules. For each cluster, independent support vector machine models are generated using preselected MACCS structural keys as descriptors. These models are combined to predict human ether-a-go-go related gene inhibition for our large compound data set, which has consistent experimental measurements (i.e. only patch clamp measurements on mammalian cell lines). Our combined support vector machine model achieves a prediction accuracy of 85% on this data set and performs better than the alternative methods used for comparison. We also find that structural keys selected on the basis of statistical criteria are associated with molecular substructures implicated in human ether-a-go-go related gene channel binding.
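The first step above, dividing compounds into clusters of similar molecules before training one model per cluster, can be sketched with binary structural keys and Tanimoto similarity. The abstract does not name the clustering method or its threshold, so the sketch below assumes simple leader clustering with a hypothetical threshold of 0.6, and represents each MACCS key fingerprint as the set of its "on" bit positions. The per-cluster support vector machines are not shown; `route` only illustrates how a new compound could be sent to the model of its most similar cluster.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of 'on' bit positions."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def leader_cluster(fingerprints, threshold=0.6):
    """Assign each compound to the first leader it resembles, else start a cluster.

    Returns (leaders, members): leader indices and the member indices of each cluster.
    """
    leaders, members = [], []
    for idx, fp in enumerate(fingerprints):
        for c, lead in enumerate(leaders):
            if tanimoto(fp, fingerprints[lead]) >= threshold:
                members[c].append(idx)
                break
        else:  # no sufficiently similar leader: this compound leads a new cluster
            leaders.append(idx)
            members.append([idx])
    return leaders, members

def route(fp, fingerprints, leaders):
    """Index of the cluster whose leader is most similar to fp."""
    return max(range(len(leaders)), key=lambda c: tanimoto(fp, fingerprints[leaders[c]]))
```

Leader clustering depends on input order; order-independent alternatives such as Butina-style clustering are common choices when reproducibility matters.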
Fan, Zhou; Chen, Bingqiu; Jiang, Linhua; Bian, Fuyan; Li, Zhongmu
2016-01-01
Application of fitting techniques to obtain physical parameters of stellar populations, such as ages, metallicities, and $\alpha$-element to iron ratios, is an important approach to understanding the nature of both galaxies and globular clusters (GCs). In fact, fitting methods based on different underlying models may yield different results, with varying precision. In this paper, we selected 22 confirmed M31 GCs for which no spectroscopic metallicities were previously known. Most are located approximately one degree (in projection) from the galactic center. We performed spectroscopic observations with the 6.5 m MMT telescope, equipped with its Red Channel Spectrograph. Lick/IDS absorption-line indices, radial velocities, ages, and metallicities were derived using the $\rm EZ\_Ages$ stellar population parameter calculator. We also applied full spectral fitting with the ULySS code to constrain the parameters of our sample star clusters. In addition, we performed $\chi^2_{\rm min}$...
Robust cluster analysis and variable selection
Ritter, Gunter
2014-01-01
Clustering remains a vibrant area of research in statistics. Although there are many books on this topic, there are relatively few that are well founded in the theoretical aspects. In Robust Cluster Analysis and Variable Selection, Gunter Ritter presents an overview of the theory and applications of probabilistic clustering and variable selection, synthesizing the key research results of the last 50 years. The author focuses on the robust clustering methods he found to be the most useful on simulated data and real-time applications. The book provides clear guidance for the varying needs of bot
ASteCA - Automated Stellar Cluster Analysis
Perren, Gabriel I; Piatti, Andrés E
2014-01-01
We present ASteCA (Automated Stellar Cluster Analysis), a suite of tools designed to fully automate the standard tests applied to stellar clusters to determine their basic parameters. The set of functions included in the code makes use of positional and photometric data to obtain precise and objective values for a given cluster's center coordinates, radius, luminosity function and integrated color magnitude, and to characterize, through a statistical estimator, its probability of being a true physical cluster rather than a random overdensity of field stars. ASteCA incorporates a Bayesian field star decontamination algorithm capable of assigning membership probabilities using photometric data alone. An isochrone fitting process, based on the generation of synthetic clusters from theoretical isochrones and the selection of the best fit through a genetic algorithm, is also included, which allows ASteCA to provide accurate estimates of a cluster's metallicity, age, extinction and distance values along with its unce...
Cluster analysis for computer workload evaluation
Landau, K
1976-01-01
An introduction to computer workload analysis is given, showing its range of application in computer centre management and in system and application programming. Cluster methods that can be used in conjunction with workload data are discussed, and cluster algorithms are adapted to the specific problem. Several samples of CDC 7600 accounting data collected at CERN (the European Organization for Nuclear Research) underwent cluster analysis to determine job groups. The conclusions drawn from the resource usage of typical job groups are discussed in relation to computer workload analysis. (17 refs).
[Cluster analysis and its application].
Půlpán, Zdenĕk
2002-01-01
The study exploits knowledge-oriented and context-based modifications of well-known (fuzzy) clustering algorithms. Fuzzy sets are inherently suited to handling linguistic domain knowledge as well. We aim to extract new information about the environment being explored from rich and diverse data and knowledge.
Cluster Analysis of Adolescent Blogs
Liu, Eric Zhi-Feng; Lin, Chun-Hung; Chen, Feng-Yi; Peng, Ping-Chuan
2012-01-01
Emerging web applications and networking systems such as blogs have become popular, and they offer unique opportunities and environments for learners, especially for adolescent learners. This study attempts to explore the writing styles and genres used by adolescents in their blogs by employing content, factor, and cluster analyses. Factor…
Analysis of New Energy Industry Cluster Competitiveness Based on the GEM Model
Institute of Scientific and Technical Information of China (English)
代学钢
2013-01-01
The development of the new energy industry is of great significance for promoting economic development in Hebei Province. A new energy industry cluster has begun to take shape in Hebei Province, and enhancing the competitiveness of industrial clusters has become a key factor in promoting the rapid development of the province's new energy industry. Taking the GEM model as the analytical framework, this paper analyzes the related factors in order to identify the "bottleneck" problems constraining the sustainable development of new energy enterprises in Hebei Province, and puts forward countermeasures and suggestions for scientifically and effectively enhancing the competitiveness of the new energy industry cluster in Hebei Province.
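The abstract does not give the paper's scoring details, so the following is only a sketch of the GEM score in the form in which it is commonly cited (Padmore and Gibson): six factors, each scored from 1 to 10, are averaged in three pairs (Groundings: resources and infrastructure; Enterprises: supplying and related industries, and firm structure, strategy and rivalry; Markets: local and external markets), and the pair scores are combined geometrically. The parameter names are our own, and any factor scores passed in are hypothetical.

```python
def gem_score(resources, infrastructure, supplying_industries,
              firm_structure, local_markets, external_markets):
    """GEM cluster competitiveness score in its commonly cited form.

    Each of the six factors is scored from 1 to 10. Factors are averaged in
    pairs (Groundings, Enterprises, Markets) and the three pair scores are
    combined as 2.5 * (G * E * M) ** (2 / 3), giving a maximum of 250.
    """
    factors = (resources, infrastructure, supplying_industries,
               firm_structure, local_markets, external_markets)
    if not all(1 <= f <= 10 for f in factors):
        raise ValueError("each GEM factor must be scored between 1 and 10")
    groundings = (resources + infrastructure) / 2
    enterprises = (supplying_industries + firm_structure) / 2
    markets = (local_markets + external_markets) / 2
    return 2.5 * (groundings * enterprises * markets) ** (2 / 3)
```

Because the pair scores are multiplied rather than added, a single weak pair lowers the total more than it would under an additive score, which is consistent with reading low-scoring factors as "bottlenecks".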
Clues on the Evolution of Cluster Galaxies From The Analysis of Their Orbital Anisotropies
Biviano, A.; Katgert, P.; Thomas, T; Mazure, A.
2003-01-01
We study the evolution of galaxies in clusters by the analysis of a sample of about 3000 galaxies, members of 59 clusters from the ESO Nearby Abell Cluster Survey (ENACS). We distinguish four cluster galaxy populations, based on their radial and velocity distributions within the clusters. Using the class of ellipticals and S0s (excluding the very bright ellipticals), we determine the average cluster mass profile, which we compare with mass models available from numerical simulations. We then ...
Cluster Analysis of the Malaysian Hipposideros
Sazali, Siti Nurlydia; Laman, Charlie J.; Abdullah, M. T.
2008-01-01
A preliminary study of the morphometric variation among species in the genus Hipposideros was conducted using voucher specimens from the Universiti Malaysia Sarawak (UNIMAS) Zoological Museum and the Department of Wildlife and National Park (DWNP) Kuala Lumpur. A total of 24 individuals from six species of this genus were studied morphologically, and all relevant body, skull and dental measurements were taken and recorded. Cluster analysis of these data shows that the genus Hipposideros is divided into two major clusters, in which each species was clearly separated. Cluster analysis of Hipposideros species is useful as an aid to species identification.
Using cluster analysis to explore survey data.
Spencer, Llinos; Roberts, Gwerfyl; Irvine, Fiona; Jones, Peter; Baker, Colin
2007-01-01
Llinos Haf Spencer reports on the use of the cluster analysis statistical technique in nursing research and uses data from the Welsh Language Awareness in Healthcare Provision in Wales survey as an exemplar. She concludes that cluster analysis is a valuable tool for teasing out patterns in data that are not initially evident in bivariate analyses and should therefore be considered a viable option for nursing research.
Nursing home care quality: a cluster analysis.
Grøndahl, Vigdis Abrahamsen; Fagerli, Liv Berit
2017-02-13
Purpose: The purpose of this paper is to explore potential differences in how nursing home residents rate care quality and to explore cluster characteristics. Design/methodology/approach: A cross-sectional design was used, with one questionnaire including questions from Quality from Patients' Perspective and on the Big Five personality traits, together with questions related to socio-demographic aspects and health condition. Residents (n = 103) from four Norwegian nursing homes participated (74.1 per cent response rate). Hierarchical cluster analysis identified clusters with respect to care quality perceptions. χ² tests and one-way between-groups ANOVA were performed to characterise the clusters (p … clusters were identified; Cluster 1 residents (28.2 per cent) had the best care quality perceptions and Cluster 2 residents (67.0 per cent) had the worst perceptions. The clusters were statistically significant and were characterised by personal conditions (gender, psychological well-being, preferences, admission, satisfaction with staying in the nursing home, emotional stability and agreeableness) and by external objective care conditions (healthcare personnel and registered nurses). Research limitations/implications: Only residents assessed as having no cognitive impairments were included, thus excluding the largest group. Choosing a questionnaire design and structured interviews may increase the number of residents able to participate. Practical implications: The findings may give healthcare personnel and managers more knowledge on which to develop strategies to improve specific care quality perceptions. Originality/value: Cluster analysis can be an effective tool for differentiating between nursing home residents' care quality perceptions.
Cancer incidence in men: a cluster analysis of spatial patterns
Directory of Open Access Journals (Sweden)
D'Alò Daniela
2008-11-01
Background: Spatial clustering of different diseases has received much less attention than single disease mapping. Besides chance or artifact, clustering of different cancers in a given area may depend on exposure to a shared risk factor or to multiple correlated factors (e.g. cigarette smoking and obesity in a deprived area). Models developed so far to investigate co-occurrence of diseases are not well suited to analyzing many cancers simultaneously. In this paper we propose a simple two-step exploratory method for screening clusters of different cancers in a population. Methods: Cancer incidence data were derived from the regional cancer registry of Umbria, Italy. A cluster analysis was performed on smoothed and non-smoothed standardized incidence ratios (SIRs) of the 13 most frequent cancers in males. The Besag, York and Mollié model (BYM) and Poisson kriging were used to produce smoothed SIRs. Results: Cluster analysis on non-smoothed SIRs was poorly informative, both in terms of clustering of different cancers (only larynx and oral cavity were grouped) and of characteristic patterns of cancer incidence in specific geographical areas. On the other hand, BYM and Poisson kriging gave similar results, showing that cancers of the oral cavity, larynx, esophagus, stomach and liver formed a main cluster. Lung and urinary bladder cancers clustered together, but not with the cancers mentioned above. Both methods, particularly the BYM model, identified distinct geographic clusters of adjacent areas. Conclusion: As in single disease mapping, non-smoothed SIRs do not provide reliable estimates of cancer risks because of small area variability. The BYM model produces smooth risk surfaces which, when entered into a cluster analysis, identify well-defined geographical clusters of adjacent areas. It probably enhances or amplifies the signal arising from exposure of more areas (statistical units) to shared risk factors that are associated with different cancers. In
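The input to the cluster analysis above is, for each cancer, a vector of area-level SIRs (observed cases divided by expected cases). One way to turn those vectors into something a hierarchical clustering can use is a correlation distance, 1 − r, between the SIR profiles of each pair of cancers. The sketch below shows that step in pure Python; the choice of 1 − r is our assumption (the abstract does not state the distance used), the BYM and Poisson kriging smoothing that the paper shows to be essential is not reproduced, and the example counts are invented for illustration.

```python
import math

def sir(observed, expected):
    """Standardized incidence ratio per area: observed / expected cases."""
    return [o / e for o, e in zip(observed, expected)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distances(sir_by_cancer):
    """1 - r between the area-level SIR profiles of every pair of cancers."""
    names = sorted(sir_by_cancer)
    return {(a, b): 1 - pearson(sir_by_cancer[a], sir_by_cancer[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

Cancers whose SIRs rise and fall together across areas get distances near 0, and opposite patterns approach 2; a linkage method would then group the cancers from this distance table.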
Performance Analysis of Hierarchical Clustering Algorithm
Directory of Open Access Journals (Sweden)
K.Ranjini
2011-07-01
Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters) so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Data clustering is a common technique for statistical data analysis, used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. This paper explains the implementation of agglomerative and divisive clustering algorithms applied to various types of data. Details of the victims of the 2004 tsunami in Thailand were used as the test data. Visual programming is used for the implementation, and the running times of the algorithms using different linkages (agglomerative) on different types of data are analyzed.
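The linkage comparison above can be reproduced in miniature: agglomerative clustering repeatedly merges the closest pair of clusters, where "closest" depends on the linkage (single: minimum pairwise distance; complete: maximum; average: mean). The following naive pure-Python sketch implements the three linkages and times them with `time.perf_counter`. It is an illustration under our own assumptions, not the paper's visual-programming implementation; the divisive algorithm and the tsunami data are not included.

```python
import math
import time

def agglomerative(points, k, linkage="single"):
    """Naive agglomerative clustering down to k clusters.

    linkage: 'single' (min), 'complete' (max) or 'average' pairwise distance.
    """
    combine = {"single": min, "complete": max,
               "average": lambda ds: sum(ds) / len(ds)}[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = combine([math.dist(points[i], points[j])
                             for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is still valid
        clusters[a].extend(merged)
    return sorted(sorted(c) for c in clusters)

def time_linkages(points, k):
    """Wall-clock seconds taken by each linkage on the same data."""
    timings = {}
    for name in ("single", "complete", "average"):
        start = time.perf_counter()
        agglomerative(points, k, name)
        timings[name] = time.perf_counter() - start
    return timings
```

This version recomputes all pairwise distances at every merge, so it is far slower than library implementations; it is meant only to make the linkage definitions and the timing comparison concrete.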
Clustering analysis of telecommunication customers
Institute of Scientific and Technical Information of China (English)
REN Hong; ZHENG Yan; WU Ye-rong
2009-01-01
In this article, a clustering method based on a genetic algorithm (GA) for telecommunication customer subdivision is presented. First, the features of telecommunication customers (such as calling behavior and consuming behavior) are extracted. Second, the similarities between the multidimensional feature vectors of telecommunication customers are computed and mapped to distances between samples on a two-dimensional plane. Finally, the GA gradually adjusts these distances so that they approximate the similarities. One advantage of this method is that it is independent of the distribution of the sample space. The experiments demonstrate the feasibility of the proposed method.
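The mapping step above, placing customers on a plane so that their distances approximate the computed dissimilarities, can be posed as minimizing a stress function, Σ (dᵢⱼ − δᵢⱼ)², with a GA. The sketch below is a minimal pure-Python GA under our own assumptions: each individual is a full 2D layout, the best quarter survives unchanged (elitism), and children come from uniform crossover plus Gaussian mutation. The operators, rates and population sizes are hypothetical, not the paper's.

```python
import math
import random

def stress(coords, target):
    """Sum of squared differences between layout distances and target dissimilarities."""
    n = len(target)
    return sum((math.dist(coords[i], coords[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def ga_embed(target, pop_size=40, generations=300, sigma=0.3, seed=0):
    """Evolve 2-D coordinates whose pairwise distances approximate `target`."""
    rng = random.Random(seed)
    n = len(target)

    def random_layout():
        return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]

    def crossover(a, b):
        # Uniform crossover: each point's position is taken from one parent.
        return [a[i] if rng.random() < 0.5 else b[i] for i in range(n)]

    def mutate(layout):
        # Jitter about one point in five with Gaussian noise.
        return [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                if rng.random() < 0.2 else (x, y) for x, y in layout]

    population = [random_layout() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: stress(c, target))
        elite = population[: pop_size // 4]  # best quarter survives unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return min(population, key=lambda c: stress(c, target))
```

Because the elite is copied forward unchanged, the best stress never increases between generations; customer groups can then be read off the resulting 2D layout.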
Cluster analysis of word frequency dynamics
Maslennikova, Yu S.; Bochkarev, V. V.; Belashova, I. A.
2015-01-01
This paper describes the analysis and modelling of word usage frequency time series. A previous study put forward the assumption that all word usage frequencies have uniform dynamics approaching the shape of a Gaussian function. This assumption can be checked using the frequency dictionaries of the Google Books Ngram database, which includes 5.2 million books published between 1500 and 2008. The corpus contains over 500 billion words in American English, British English, French, German, Spanish, Russian, Hebrew, and Chinese. We clustered time series of word usage frequencies using a Kohonen neural network. The similarity between input vectors was estimated using several algorithms. As a result of the neural network training procedure, more than ten different forms of time series were found. They describe the dynamics of word usage frequencies from the birth to the death of individual words. Different groups of word forms were found to have different dynamics of word usage frequency variations.
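A Kohonen network of the kind mentioned above clusters time series by moving a set of prototype vectors (nodes) toward each input, with the best-matching node and its map neighbours pulled hardest. The sketch below is a minimal one-dimensional self-organizing map in pure Python, assuming each frequency series has been rescaled to a fixed-length vector in [0, 1] (for example divided by its maximum, our assumption) and using Euclidean distance as the similarity measure. Map size, learning-rate schedule and neighbourhood width are hypothetical choices.

```python
import math
import random

def train_som(data, n_nodes=4, epochs=200, lr0=0.5, seed=0):
    """Train a 1-D Kohonen map; returns the list of node weight vectors."""
    rng = random.Random(seed)
    # Initialise the nodes from randomly chosen input vectors.
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    radius0 = max(1.0, n_nodes / 2)
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)                 # linearly decaying learning rate
        radius = radius0 * (1 - frac) + 1e-3  # shrinking neighbourhood on the map
        for x in rng.sample(data, len(data)):
            bmu = best_matching_unit(nodes, x)
            for k in range(n_nodes):
                # Gaussian neighbourhood in map (index) space around the winner.
                h = math.exp(-((k - bmu) ** 2) / (2 * radius ** 2))
                nodes[k] = [w + lr * h * (xi - w) for w, xi in zip(nodes[k], x)]
    return nodes

def best_matching_unit(nodes, x):
    """Index of the node closest (Euclidean) to the input vector x."""
    return min(range(len(nodes)), key=lambda k: math.dist(nodes[k], x))
```

After training, each series is labelled by its best-matching node, and the node weight vectors themselves are the prototypical frequency-curve shapes.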
Topics in modelling of clustered data
Aerts, Marc; Ryan, Louise M; Geys, Helena
2002-01-01
Many methods for analyzing clustered data exist, all with advantages and limitations in particular applications. Compiled from the contributions of leading specialists in the field, Topics in Modelling of Clustered Data describes the tools and techniques for modelling the clustered data often encountered in medical, biological, environmental, and social science studies. It focuses on providing a comprehensive treatment of marginal, conditional, and random effects models using, among others, likelihood, pseudo-likelihood, and generalized estimating equations methods. The authors motivate and illustrate all aspects of these models in a variety of real applications. They discuss several variations and extensions, including individual-level covariates and combined continuous and discrete outcomes. Flexible modelling with fractional and local polynomials, omnibus lack-of-fit tests, robustification against misspecification, exact, and bootstrap inferential procedures all receive extensive treatment. The application...
Rondeau, Virginie; Pignon, Jean-Pierre; Michiels, Stefan
2015-12-01
The observation of time to tumour progression (TTP) or progression-free survival (PFS) may be terminated by a terminal event. In this context, deaths may be due to tumour progression, and the time to the major failure event (death) may be correlated with the TTP. The usual assumption of independence between the TTP process and death, required by many commonly used statistical methods, can be violated. Furthermore, although the relationship between TTP and time to death is most relevant to anticancer drug development or to the evaluation of TTP as a surrogate endpoint, statistical models that try to describe the dependence structure between these two characteristics are not frequently used. We propose a joint frailty model for the analysis of two survival endpoints, TTP and time to death, or PFS and time to death, in the context of data clustering (e.g. at the centre or trial level). This approach allows us to simultaneously evaluate the prognostic effects of covariates on the two survival endpoints, while accounting both for the relationship between the outcomes and for data clustering. We show how maximum penalized likelihood estimation can be applied to the nonparametric estimation of the continuous hazard functions in a general joint frailty model with right censoring and delayed entry. The model was motivated by a large meta-analysis of randomized trials for head and neck cancers (Meta-Analysis of Chemotherapy in Head and Neck Cancers), in which the efficacy of chemotherapy on TTP or PFS and overall survival was investigated, as adjunct to surgery or radiotherapy or both.
Modeling the Tenuous Intracluster Medium in Globular Clusters
Naiman, J; Ramirez-Ruiz, E
2013-01-01
We employ hydrodynamical simulations to investigate the underlying mechanism responsible for the low levels of gas and dust in globular clusters. Our models examine the competing effects of mass supply from the evolved stellar population and energy injection from the main sequence stellar members for the globular clusters 47 Tucanae, M15, NGC 6440, and NGC 6752. Disregarding all other gas evacuation processes, we find that the energy output from the main sequence stellar population alone is capable of effectively clearing the evolved stellar ejecta and producing intracluster gas densities consistent with current observational constraints. This result identifies a viable, ubiquitous gas and dust evacuation mechanism for globular clusters. In addition, we extend our analysis to probe the efficiency of pulsar wind feedback in globular clusters. The detection of intracluster ionized gas in 47 Tucanae allows us to place particularly strict limits on pulsar wind thermalization efficiency, which must be extrem...
Assessment of cluster yield components by image analysis.
Diago, Maria P; Tardaguila, Javier; Aleixos, Nuria; Millan, Borja; Prats-Montalban, Jose M; Cubero, Sergio; Blasco, Jose
2015-04-01
Berry weight, berry number and cluster weight are key parameters for yield estimation in the wine and table grape industries. Current yield prediction methods are destructive, labour-demanding and time-consuming. In this work, a new methodology based on image analysis was developed to determine cluster yield components quickly and inexpensively. Clusters of seven different red grapevine varieties (Vitis vinifera L.) were photographed under laboratory conditions, and their cluster yield components were determined manually after image acquisition. Two algorithms, based on the Canny and the logarithmic image processing approaches, were tested to find the contours of the berries in the images prior to berry detection by means of the Hough transform. Results were obtained in two ways: by analysing either a single image of the cluster or four images per cluster taken from different orientations. The best results (R² between 69% and 95% in berry detection and between 65% and 97% in cluster weight estimation) were achieved using four images and the Canny algorithm. The capability of the image-analysis-based model to predict berry weight was 84%. The new, low-cost methodology presented here enabled the assessment of cluster yield components, saving time and providing inexpensive information in comparison with current manual methods. © 2014 Society of Chemical Industry.
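The berry-detection step above uses the Hough transform: each edge pixel votes for every possible circle centre lying one radius away from it, and true centres collect votes from many edge pixels. The following is a minimal pure-Python sketch of a fixed-radius circular Hough accumulator. It assumes edge pixel coordinates are already available (the Canny step is not shown) and uses a single known radius, whereas real berries vary in size, so a practical version would loop over a range of radii. All names and thresholds are hypothetical.

```python
import math
from collections import Counter

def hough_circles(edge_points, radius, n_angles=90, min_votes=30):
    """Vote for circle centres of a fixed radius from edge pixel coordinates.

    Returns integer centres (x, y) whose vote count reaches min_votes,
    strongest first.
    """
    acc = Counter()
    for x, y in edge_points:
        seen = set()
        for k in range(n_angles):
            theta = 2 * math.pi * k / n_angles
            cx = round(x - radius * math.cos(theta))
            cy = round(y - radius * math.sin(theta))
            if (cx, cy) not in seen:  # at most one vote per edge pixel per centre
                seen.add((cx, cy))
                acc[(cx, cy)] += 1
    return [c for c, v in acc.most_common() if v >= min_votes]
```

For example, the rasterized edge of a circle of radius 10 centred at (50, 40) produces its strongest peak at (50, 40); with touching berries, several peaks emerge, one per berry.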
Principal Component Clustering Approach to Teaching Quality Discriminant Analysis
Xian, Sidong; Xia, Haibo; Yin, Yubo; Zhai, Zhansheng; Shang, Yan
2016-01-01
Teaching quality is the lifeline of higher education. Many universities have made effective progress in evaluating teaching quality. In this paper, we establish a students' evaluation of teaching (SET) discriminant analysis model and algorithm based on principal component clustering analysis. Additionally, we classify the SET…
Enhancing Digital Book Clustering by LDAC Model
Wang, Lidong; Jie, Yuan
In Digital Library (DL) applications, digital book clustering is an important and urgent research task. However, it is difficult to perform effectively because of the great length of digital books. To cluster digital books correctly, a novel method based on a probabilistic topic model is proposed. First, we build a topic model named LDAC, whose main goal is to extract topics effectively from digital books. Gibbs sampling is then applied for parameter inference. Once the model parameters are learned, each book is assigned to the cluster that maximizes the posterior probability. Experimental results demonstrate that our LDAC-based approach achieves a significant improvement compared with related methods.
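The abstract does not describe how LDAC differs from standard topic models, so the sketch below swaps in plain LDA to illustrate the two steps it does describe: parameter inference by Gibbs sampling and assignment of each book to its most probable cluster. It is a minimal collapsed Gibbs sampler in pure Python, assuming documents are lists of integer word ids and treating each topic as a cluster; the hyperparameters are hypothetical.

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA.

    docs: list of documents, each a list of integer word ids.
    Returns theta, the per-document topic proportions (rows sum to 1).
    """
    rng = random.Random(seed)
    vocab = 1 + max(w for doc in docs for w in doc)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]          # document-topic counts
    nkw = [[0] * vocab for _ in topics]           # topic-word counts
    nk = [0] * n_topics                           # tokens per topic
    z = []                                        # topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | everything else), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]

def assign_clusters(theta):
    """Hard assignment: each document goes to its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]
```

Treating topics directly as clusters is the simplest reading of "assigned to the cluster which maximizes the posterior probability"; a dedicated model such as LDAC may tie clusters and topics together differently.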
Using Cluster Analysis to Examine Husband-Wife Decision Making
Bonds-Raacke, Jennifer M.
2006-01-01
Cluster analysis has a rich history in many disciplines; although it has been used in clinical psychology to identify types of disorders, it has been less widely used in other areas of psychology. The purpose of the current experiments was to use cluster analysis to investigate husband-wife decision making. Cluster analysis was…
Bridges in the random-cluster model
Directory of Open Access Journals (Sweden)
Eren Metin Elçi
2016-02-01
The random-cluster model, a correlated bond percolation model, unifies a range of important models of statistical mechanics in one description, including independent bond percolation, the Potts model and uniform spanning trees. By introducing a classification of edges based on their relevance to connectivity, we study the stability of clusters in this model. We prove several exact relations for general graphs that allow us to derive unambiguously the finite-size scaling behavior of the density of bridges and non-bridges. For percolation, we are also able to characterize the point at which clusters become maximally fragile and show that it is connected to the concept of the bridge load. Combining our exact treatment with further results from conformal field theory, we uncover a surprising behavior of the (normalized) variance of the number of (non-)bridges, showing that it diverges in two dimensions below the value 4cos²(π/√3) = 0.2315891… of the cluster coupling q. Finally, we show that a partial or complete pruning of bridges from clusters enables estimates of the backbone fractal dimension that are much less encumbered by finite-size corrections than more conventional approaches.
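For any single configuration, the classification of edges into bridges and non-bridges described above can be computed with a standard depth-first search (Tarjan's low-link method). The sketch below is illustrative only. `find_bridges` is a hypothetical helper that assumes the occupied edges are given as vertex pairs. It does not reproduce the paper's exact analytic treatment.

```python
from collections import defaultdict

def find_bridges(edges):
    """Return the bridges of an undirected graph given as a list of (u, v) pairs.

    A bridge is an edge whose removal disconnects its two endpoints; every other
    edge lies on a cycle (a non-bridge). Recursive DFS, so suited to small graphs.
    """
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc, low = {}, {}   # discovery time and low-link value of each vertex
    bridges = set()
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_edge:        # skip only the tree edge we came along
                continue
            if v in disc:                 # already visited: ancestor or finished descendant
                low[u] = min(low[u], disc[v])
            else:                         # tree edge: recurse first
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:      # v's subtree cannot reach u or above
                    bridges.add(edges[idx])

    for u in list(adj):
        if u not in disc:
            dfs(u, None)
    return bridges
```

For a triangle with a two-edge tail, `[(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]`, the result is `{(2, 3), (3, 4)}`. The bridge density of that configuration is then `len(bridges) / len(edges)`. Averaging such counts over sampled configurations is one way to estimate the densities studied in the paper numerically.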
Modeling the Blue Stragglers in Globular Clusters
Chatterjee, Sourav
2012-10-01
Blue stragglers (BSs) have been extensively observed in Galactic globular clusters (GGCs), primarily with HST. Many theoretical studies have identified BS formation channels, and it is understood that dynamics in GCs modifies the formation and distribution of BSs. Despite the wealth of observational data, comprehensive theoretical models including all relevant physical processes in dynamically evolving GCs do not exist. Our dynamical cluster modeling code, developed over the past decade, includes all relevant physical processes in a GC, including two-body relaxation, strong scattering, physical collisions, and stellar evolution (single and binary). We can model GCs with realistic N and provide star-by-star models directly comparable with the observed data. This proposed study will create realistic GC models with initial conditions drawn from a grid spanning a large range of the multidimensional parameter space, including cluster mass, binary fraction, concentration, and Galactic position. Combined with observational constraints from existing HST data, our numerical models will for the first time explain the observed trends in the BS populations in GGCs, identify the dominant formation channel for these BSs and their typical dynamical ages, and reconstruct detailed dynamical histories of the BSs in GGCs. These models will yield valuable insight into the correlations between BS properties and a number of cluster dynamical properties (central density, binary fraction, and binary orbital properties), which will potentially help constrain a GC's past evolutionary history. As a bonus, a large set of realistic theoretical GC models will be constructed.
Clustering Effects Within the Dinuclear Model
Adamian, Gurgen; Antonenko, Nikolai; Scheid, Werner
The clustering of two nuclei in a nuclear system creates configurations referred to in the literature as nuclear molecular structures. A nuclear molecule, or dinuclear system (DNS) as named by Volkov, consists of two touching nuclei (clusters) which retain their individuality. Such a system has two main degrees of freedom of collective motion which govern its dynamics: (i) the relative motion between the clusters, leading to molecular resonances in the internuclear potential and to the decay of the dinuclear system (separation of the clusters), which is called quasifission because, unlike in fission, no compound system is formed first; and (ii) the transfer of nucleons or light constituents between the two clusters of the dinuclear system, leading to a special dynamics of the mass and charge asymmetries between the clusters in fusion and fission reactions. In this article we discuss the essential aspects of the diabatic internuclear potential used by the dinuclear system concept and present applications to nuclear structure and reactions. We show applications of the dinuclear model to superdeformed and hyperdeformed bands. An extended discussion is devoted to the problems of fusion dynamics in the production of superheavy nuclei, to the quasifission process and to multi-nucleon transfer between nuclei. The binary and ternary fission processes are also discussed within the scission-point model and the dinuclear system concept.
Clustering of Galaxies in Brane World Models
Hameeda, Mir; Ali, Ahmed Farag
2015-01-01
In this paper, we analyze the clustering of galaxies using a modified Newtonian potential. This modification of the Newtonian potential occurs due to the existence of extra dimensions in brane world models. We will analyze a system of galaxies interacting with each other through this modified Newtonian potential. The partition function for this system of galaxies will be calculated, and this partition function will be used to calculate the free energy of this system of galaxies. The entropy and the chemical potential for this system will also be calculated. We will derive an explicit expression for the clustering parameter for this system. This parameter will determine the behavior of this system, and we will be able to express various thermodynamic quantities using this clustering parameter. Thus, we will be able to explicitly analyze the effect that modifying the Newtonian potential can have on the clustering of galaxies.
Clustering of galaxies in brane world models
Hameeda, Mir; Faizal, Mir; Ali, Ahmed Farag
2016-04-01
In this paper, we analyze the clustering of galaxies using a modified Newtonian potential. This modification of the Newtonian potential occurs due to the existence of extra dimensions in brane world models. We will analyze a system of galaxies interacting with each other through this modified Newtonian potential. The partition function for this system of galaxies will be calculated, and this partition function will be used to calculate the free energy of this system of galaxies. The entropy and the chemical potential for this system will also be calculated. We will derive an explicit expression for the clustering parameter for this system. This parameter will determine the behavior of this system, and we will be able to express various thermodynamic quantities using this clustering parameter. Thus, we will be able to explicitly analyze the effect that modifying the Newtonian potential can have on the clustering of galaxies. We also analyse the effect of extra dimensions on the two-point functions between galaxies.
Toward optimal cluster power spectrum analysis
Smith, Robert E
2014-01-01
The power spectrum of galaxy clusters is an important probe of the cosmological model. In this paper we determine the optimal weighting scheme for maximizing the signal-to-noise ratio for such measurements. We find a closed form analytic expression for the optimal weights. Our expression takes into account: cluster mass, finite survey volume effects, survey masking, and a flux limit. The implementation of this weighting scheme requires knowledge of the measured cluster masses, and analytic models for the bias and space-density of clusters as a function of mass and redshift. Recent studies have suggested that the optimal method for reconstruction of the matter density field from a set of clusters is mass-weighting (Seljak et al 2009, Hamaus et al 2010, Cai et al 2011). We compare our optimal weighting scheme with this approach and also with the original power spectrum scheme of Feldman et al (1994). We show that our optimal weighting scheme outperforms these approaches for both volume- and flux-limited cluster...
Deke, John
2016-10-25
Cluster randomized controlled trials (CRCTs) often require a large number of clusters in order to detect small effects with high probability. However, there are contexts where it may be possible to design a CRCT with a much smaller number of clusters (10 or fewer) and still detect meaningful effects. The objective is to offer recommendations for best practices in design and analysis for small CRCTs. I use simulations to examine alternative design and analysis approaches. Specifically, I examine (1) which analytic approaches control Type I errors at the desired rate, (2) which design and analytic approaches yield the most power, (3) what is the design effect of spurious correlations, and (4) examples of specific scenarios under which impacts of different sizes can be detected with high probability. I find that (1) mixed effects modeling and using Ordinary Least Squares (OLS) on data aggregated to the cluster level both control the Type I error rate, (2) randomization within blocks is always recommended, but how best to account for blocking through covariate adjustment depends on whether the precision gains offset the degrees of freedom loss, (3) power calculations can be accurate when design effects from small sample, spurious correlations are taken into account, and (4) it is very difficult to detect small effects with just four clusters, but with six or more clusters, there are realistic circumstances under which small effects can be detected with high probability. © The Author(s) 2016.
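One of the analyses reported above as controlling the Type I error rate is OLS on data aggregated to the cluster level. The sketch below assumes a single binary treatment, no blocking and no covariates. It averages each cluster's outcomes and regresses the cluster means on the treatment indicator, so the degrees of freedom count clusters rather than individuals. `cluster_level_ols` is a hypothetical helper, and a p-value would be read from a t distribution with the returned degrees of freedom.

```python
from statistics import mean

def cluster_level_ols(outcomes, cluster_ids, treated_clusters):
    """Impact estimate from OLS on cluster means: ybar_j = a + b * T_j + e_j.

    Returns (b, standard error of b, residual degrees of freedom).
    """
    groups = {}
    for y, c in zip(outcomes, cluster_ids):
        groups.setdefault(c, []).append(y)
    ys = [mean(v) for v in groups.values()]           # one observation per cluster
    ts = [1.0 if c in treated_clusters else 0.0 for c in groups]
    t_bar, y_bar = mean(ts), mean(ys)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    b = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sxx
    a = y_bar - b * t_bar
    resid = [y - a - b * t for t, y in zip(ts, ys)]
    df = len(ys) - 2                                  # clusters minus 2 parameters
    se = (sum(r * r for r in resid) / df / sxx) ** 0.5
    return b, se, df
```

With four clusters (two treated), the impact estimate is the difference between the mean of the treated cluster means and the mean of the control cluster means, with only 2 degrees of freedom. This illustrates why power is so limited with very few clusters. Weighting every cluster equally is a design choice of this sketch.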
Walthouwer, Michel Jean Louis; Oenema, Anke; Soetens, Katja; Lechner, Lilian; de Vries, Hein
2014-11-01
Developing nutrition education interventions based on clusters of dietary patterns can only be done adequately when it is clear whether distinctive clusters of dietary patterns can be derived and reproduced over time, whether cluster membership is stable, and whether it is predictable which type of people belong to a certain cluster. Hence, this study aimed to: (1) identify clusters of dietary patterns among Dutch adults, (2) test the reproducibility of these clusters and stability of cluster membership over time, and (3) identify sociodemographic predictors of cluster membership and cluster transition. This study had a longitudinal design with online measurements at baseline (N=483) and 6 months follow-up (N=379). Dietary intake was assessed with a validated food frequency questionnaire. A hierarchical cluster analysis was performed, followed by a K-means cluster analysis. Multinomial logistic regression analyses were conducted to identify the sociodemographic predictors of cluster membership and cluster transition. At baseline and follow-up, a comparable three-cluster solution was derived, distinguishing a healthy, moderately healthy, and unhealthy dietary pattern. Male and lower educated participants were significantly more likely to have a less healthy dietary pattern. Further, 251 (66.2%) participants remained in the same cluster, 45 (11.9%) participants changed to an unhealthier cluster, and 83 (21.9%) participants shifted to a healthier cluster. Men and people living alone were significantly more likely to shift toward a less healthy dietary pattern. Distinctive clusters of dietary patterns can be derived. Yet, cluster membership is unstable and only a few sociodemographic factors were associated with cluster membership and cluster transition. These findings imply that clusters based on dietary intake may not be suitable as a basis for nutrition education interventions. Copyright © 2014 Elsevier Ltd. All rights reserved.
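A hierarchical analysis followed by K-means, as in this study, is often implemented by passing the hierarchical solution's cluster means to K-means as starting centroids. The abstract does not say how the two stages were linked, so this is an assumption. The sketch below shows only the K-means refinement step with plain Euclidean distances. `kmeans` is a hypothetical helper.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's K-means started from given centroids (e.g. hierarchical-cluster means).

    Returns (labels, centroids).
    """
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point
        new_labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:          # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                   # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Started from the poor seeds `[(0, 0), (0, 1)]`, the points `[(0, 0), (0, 1), (10, 10), (10, 11)]` end up as labels `[0, 0, 1, 1]` with centroids `[[0.0, 0.5], [10.0, 10.5]]`. The refinement step reassigns the point that the initial seeding had misplaced.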
Cluster analysis of activity-time series in motor learning
DEFF Research Database (Denmark)
Balslev, Daniela; Nielsen, Finn Årup; Frutiger, Sally A.
2002-01-01
Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time series. The optimal number of clusters was chosen using a cross-validated likelihood method, which highlights the clustering pattern that generalizes best over the subjects. Data were acquired with PET at different time points during practice of a visuomotor task. The results from cluster analysis show practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing. Hum. Brain Mapping 15...
Cluster analysis of Southeastern U.S. climate stations
Stooksbury, D. E.; Michaels, P. J.
1991-09-01
A two-step cluster analysis of 449 Southeastern climate stations is used to objectively determine general climate clusters (groups of climate stations) for eight southeastern states. The purpose is to objectively define regions of climatic homogeneity that should perform more robustly in subsequent climatic impact models. This type of analysis has been successfully used in many related climate research problems, including the determination of corn/climate districts in Iowa (Ortiz-Valdez, 1985) and the classification of synoptic climate types (Davis, 1988). These general climate clusters may be more appropriate for climate research than the standard climate division (CD) groupings of climate stations, which are modifications of the agro-economic United States Department of Agriculture crop reporting districts. Unlike the CDs, these objectively determined climate clusters are not restricted by state borders and thus have reduced multicollinearity, which makes them more appropriate for the study of the impact of climate and climatic change.
A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis
Directory of Open Access Journals (Sweden)
Shaoning Li
2017-01-01
In the fields of geographic information systems (GIS) and remote sensing (RS), clustering algorithms have been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited by their computational complexity, noise resistance and robustness. Furthermore, traditional methods focus mainly on the adjacent spatial context, which makes them hard to apply to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters”) if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the termination condition is satisfied. The experimental results show that CDHC separates multi-density objects from noise sufficiently well and also reduces complexity compared with the traditional agglomerative hierarchical clustering algorithm.
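The first CDHC step above builds a convex hull around the geospatial objects. The abstract does not specify how the hull is computed or how the retraction and splitting work, so the sketch below covers only a generic hull construction. It uses Andrew's monotone chain algorithm on 2D point objects as a stand-in. `convex_hull` is a hypothetical helper.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order.

    Collinear points on the boundary are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left (counter-clockwise) turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                         # build the lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):               # build the upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]        # last point of each chain repeats the other's first
```

For the corners of a 2 × 2 square plus an interior point and a point on one edge, only the four corners are returned, starting from the lowest-leftmost one: `[(0, 0), (2, 0), (2, 2), (0, 2)]`. The algorithm runs in O(n log n) time because of the initial sort.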
Climate Modeling with a Linux Cluster
Renold, M.; Beyerle, U.; Raible, C. C.; Knutti, R.; Stocker, T. F.; Craig, T.
2004-08-01
Until recently, computationally intensive calculations in many scientific disciplines have been limited to institutions which have access to supercomputing centers. Today, the computing power of PC processors permits the assembly of inexpensive PC clusters that nearly approach the power of supercomputers. Moreover, the combination of inexpensive network cards and Open Source software provides an easy linking of standard computer equipment to enlarge such clusters. Universities and other institutions have taken this opportunity and built their own mini-supercomputers on site. Computing power is a particular issue for the climate modeling and impacts community. The purpose of this article is to make available a Linux cluster version of the Community Climate System Model developed by the National Center for Atmospheric Research (NCAR; http://www.cgd.ucar.edu/csm).
Dynamic exponents for Potts model cluster algorithms
Coddington, Paul D.; Baillie, Clive F.
We have studied the Swendsen-Wang and Wolff cluster update algorithms for the Ising model in 2, 3 and 4 dimensions. The data indicate simple relations between the specific heat and the Wolff autocorrelations, and between the magnetization and the Swendsen-Wang autocorrelations. This implies that the dynamic critical exponents are related to the static exponents of the Ising model. We also investigate the possibility of similar relationships for the Q-state Potts model.
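For readers unfamiliar with the Wolff algorithm studied here, the following is a minimal sketch of one Wolff single-cluster update for the 2D Ising model. It assumes coupling J = 1, periodic boundaries and lattice size L ≥ 3. A bond from the cluster to an aligned neighbour is added with the standard probability p = 1 - exp(-2β), and the whole cluster is flipped. `wolff_step` and `magnetization` are illustrative names. Measuring the autocorrelation times analysed in the paper is not shown.

```python
import math
import random

def wolff_step(spins, beta, rng):
    """One Wolff single-cluster update of a 2D Ising model (J = 1, periodic L x L, L >= 3).

    spins: list of lists holding +1/-1, modified in place.
    Returns the number of spins in the flipped cluster.
    """
    L = len(spins)
    p_add = 1.0 - math.exp(-2.0 * beta)   # bond probability between aligned spins
    i, j = rng.randrange(L), rng.randrange(L)
    seed_spin = spins[i][j]
    spins[i][j] = -seed_spin              # flip on entry, so a site joins at most once
    stack = [(i, j)]
    size = 1
    while stack:
        x, y = stack.pop()
        neighbours = [((x + 1) % L, y), ((x - 1) % L, y),
                      (x, (y + 1) % L), (x, (y - 1) % L)]
        for nx, ny in neighbours:
            # Each bond to a still-unflipped, aligned neighbour gets one chance
            if spins[nx][ny] == seed_spin and rng.random() < p_add:
                spins[nx][ny] = -seed_spin
                stack.append((nx, ny))
                size += 1
    return size

def magnetization(spins):
    """Magnetization per site."""
    L = len(spins)
    return sum(map(sum, spins)) / (L * L)
```

Deep in the ordered phase (for example β = 1.0 on an 8 × 8 lattice), a step typically flips almost the whole lattice, so |m| stays close to 1 while its sign changes frequently. The Swendsen-Wang variant instead builds all clusters at once and flips each with probability 1/2.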
Cluster and constraint analysis in tetrahedron packings.
Jin, Weiwei; Lu, Peng; Liu, Lufeng; Li, Shuixiang
2015-04-01
The disordered packings of tetrahedra often show no obvious macroscopic orientational or positional order over a wide range of packing densities, and it has been found that the local order in particle clusters is the main form of order in tetrahedron packings. Therefore, a cluster analysis is carried out in this work to investigate the local structures and properties of tetrahedron packings. We obtain a distribution of differently sized clusters, and peaks are observed at two special clusters, i.e., the dimer and the wagon wheel. We then calculate the numbers of dimers and wagon wheels, which are observed to have linear or approximately linear correlations with packing density. Following our previous work, the number of particles participating in dimers is used as an order metric to evaluate the degree of order of the hierarchical packing structure of tetrahedra, and an order map is constructed accordingly. Furthermore, a constraint analysis is performed to determine the isostatic or hyperstatic region in the order map. We employ a Monte Carlo algorithm to test jamming and then suggest a new maximally random jammed packing of hard tetrahedra from the order map, with a packing density of 0.6337.
Identifying Peer Institutions Using Cluster Analysis
Boronico, Jess; Choksi, Shail S.
2012-01-01
The New York Institute of Technology's (NYIT) School of Management (SOM) wishes to develop a list of peer institutions for the purpose of benchmarking and monitoring/improving performance against other business schools. The procedure utilizes relevant criteria for the purpose of establishing this peer group by way of a cluster analysis. The…
Managing Clustered Data Using Hierarchical Linear Modeling
Warne, Russell T.; Li, Yan; McKyer, E. Lisako J.; Condie, Rachel; Diep, Cassandra S.; Murano, Peter S.
2012-01-01
Researchers in nutrition research often use cluster or multistage sampling to gather participants for their studies. These sampling methods often produce violations of the assumption of data independence that most traditional statistics share. Hierarchical linear modeling is a statistical method that can overcome violations of the independence…
Aerosol cluster impact and break-up: II. Atomic and Cluster Scale Models.
Energy Technology Data Exchange (ETDEWEB)
Lechman, Jeremy B.; Takato, Yoichi (State University of New York at Buffalo, Buffalo, NY)
2010-09-01
Understanding the interaction of aerosol particle clusters/flocs with surfaces is an area of interest for a number of processes in chemical, pharmaceutical, and powder manufacturing as well as in steam-tube rupture in nuclear power plants. Developing predictive capabilities for these applications involves coupled phenomena on multiple length and timescales, from the process macroscopic scale (~1 m) to the multi-cluster interaction scale (1 mm-0.1 m) to the single cluster scale (~1000-10000 particles) to the particle scale (10 nm-10 µm) interactions, and on down to the sub-particle, atomic scale interactions. The focus of this report is on the single cluster scale, although work directed toward developing better models of particle-particle interactions by considering sub-particle scale interactions and phenomena is also described. In particular, results of mesoscale (i.e., particle to single cluster scale) discrete element method (DEM) simulations for aerosol cluster impact with rigid walls are presented. The particle-particle interaction model is based on JKR adhesion theory and is implemented as an enhancement to the granular package in the LAMMPS code. The theory behind the model is outlined and preliminary results are shown. Additionally, as mentioned, results from atomistic classical molecular dynamics simulations are also described as a means of developing higher fidelity models of particle-particle interactions. Ultimately, the results from these and other studies at various scales must be collated to provide systems level models with accurate 'sub-grid' information for design, analysis and control of the underlying systems processes.
Towards Realistic Modeling of Massive Star Clusters
Gnedin, O.; Li, H.
2016-06-01
Cosmological simulations of galaxy formation are rapidly advancing towards smaller scales. Current models can now resolve giant molecular clouds in galaxies and predict basic properties of star clusters forming within them. I will describe new theoretical simulations of the formation of the Milky Way throughout cosmic time, with the adaptive mesh refinement code ART. However, many challenges - physical and numerical - still remain. I will discuss how observations of massive star clusters and star forming regions can help us overcome some of them. Video of the talk is available at https://goo.gl/ZoZOfX
[Visual field progression in glaucoma: cluster analysis].
Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M
2012-11-01
Visual field progression analysis is one of the key points in glaucoma monitoring, but distinguishing true progression from random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. Trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients showing no significant change in global indices or worsening on pointwise linear regression analysis. We analyzed the visual fields of 162 eyes (100 patients: 58 women, 42 men; average age 66.8 ± 10.91 years) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of the visual field global indices (MD and sLV) had to show no significant progression. Analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty-four eyes (33.33%) had significant worsening in some clusters while their global indices remained stable over time. In this group of patients, glaucoma was more advanced than in the stable group (MD 6.41 dB vs. 2.87 dB); however, 64.82% (35/54) of the eyes in which the clusters progressed had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices or point-by-point linear regression. This study shows the potential role of cluster trend analysis. However, for best
Gravothermal Star Clusters - Theory and Computer Modelling
Spurzem, Rainer
2010-11-01
In the George Darwin lecture, delivered to the British Royal Astronomical Society in 1960, Viktor A. Ambartsumian wrote that the evolution of stellar systems can be described by the "dynamic evolution of a gravitating gas" complemented by "a statistical description of the changes in the physical states of stars". This talk will show how this physical concept has inspired theoretical modeling of star clusters in the following decades up to the present day. The application of the principles of thermodynamics shows, as Ambartsumian argued in his 1960 lecture, that a gravitating star cluster has no stable state of equilibrium. The trend toward local thermodynamic equilibrium is always disturbed by escaping stars (Ambartsumian), as well as by gravothermal and gravogyro instabilities, as was discovered later. Here the state of the art of modeling the evolution of dense stellar systems based on the principles of thermodynamics and statistical mechanics (Fokker-Planck approximation) will be reviewed. Recent progress, including rotation and internal correlations (primordial binaries), is presented. The models have also been used very successfully to study dense star clusters around massive black holes in galactic nuclei and even (in a few cases) relativistic supermassive dense objects in the centres of galaxies (here again briefly touching on one of the many research fields of V.A. Ambartsumian). For the present era of high-speed supercomputing, in which we are tackling direct N-body simulations of star clusters, we will show that such direct modeling supports and confirms the concept of the statistical models based on Fokker-Planck theory, and that theoretical concepts and direct computer simulations are both necessary to support each other and make scientific progress in the study of star cluster evolution.
Modelling nano-clusters and nucleation.
Catlow, C Richard A; Bromley, Stefan T; Hamad, Said; Mora-Fonz, Miguel; Sokol, Alexey A; Woodley, Scott M
2010-01-28
We review the growing role of computational techniques in modelling the structures and properties of nano-particulate oxides and sulphides. We describe the main methods employed, including those based on both electronic structure and interatomic potential approaches. Particular attention is paid to the techniques used in searching for global minima in the energy landscape defined by the nano-particle cluster. We summarise applications to the widely studied ZnO and ZnS systems, to silica nanochemistry and to group IV oxides including TiO₂. We also consider the special case of silica cluster chemistry in solution and its importance in understanding the hydrothermal synthesis of microporous materials. The work summarised, together with related experimental studies, demonstrates a rich and varied nano-cluster chemistry for these materials.
Modeling blue stragglers in young clusters
Institute of Scientific and Technical Information of China (English)
Pin Lu; Li-Cai Deng; Xiao-Bin Zhang
2011-01-01
A grid of binary evolution models is calculated for the study of blue straggler (BS) populations in intermediate-age (log Age = 7.85-8.95) star clusters. BS formation via mass transfer and merging is studied systematically using our models. Both Case A and Case B close binary evolutionary tracks are calculated for a large range of parameters. The results show that BSs formed via Case B are generally bluer and even more luminous than those produced by Case A. Furthermore, the larger range of orbital separations in the Case B models makes it possible for Case B to produce more BSs than Case A. Based on the grid of models, several Monte Carlo simulations of BS populations in clusters in this age range are carried out. The results show that BSs formed via different channels populate different areas in the color-magnitude diagram (CMD). The locations of BSs in the CMD for a number of clusters are also compared to our simulations. In order to investigate the influence of mass transfer efficiency in the models and simulations, a set of models is also calculated by implementing a constant mass transfer efficiency, β = 0.5, during Roche lobe overflow (Case A binary evolution excluded). The result shows that BSs can be formed via mass transfer at any given age in both cases. However, the distributions of the BS populations on the CMD are different.
Wind farms model aggregation using probabilistic clustering
Fernandes, Paula Odete; Ferreira, Ángela Paula
2013-10-01
The main objective of this research is the identification of homogeneous groups within the wind farms of a major operator in the Portuguese energy sector, based on two multivariate analyses, Hierarchical Cluster Analysis and Discriminant Analysis, using two independent variables: annual liquid hours and net production. Three homogeneous groups of wind farms were identified from the outputs: (1) medium Installed Capacity and Induction Generator based Technology, (2) high Installed Capacity and Synchronous Generator based Technology, and (3) medium Installed Capacity and Synchronous Generator based Technology, which includes the wind farms with higher annual liquid hours. The groups obtained by cluster analysis were found to be well classified, with a total percentage of correct classification of 97.1%, which can be considered excellent.
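The abstract does not say which linkage its hierarchical cluster analysis used. For illustration only, the sketch below assumes Ward's minimum-variance criterion. At each step it merges the pair of clusters whose union increases the total within-cluster sum of squares the least, which for clusters a and b is n_a·n_b/(n_a + n_b)·‖c_a - c_b‖². `ward_clusters` is a hypothetical helper. Its naive O(n³) search is fine for a few dozen wind farms. In practice, variables on different scales (hours vs. production) would typically be standardized first.

```python
import math
from itertools import combinations

def _ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    na, nb = len(a["members"]), len(b["members"])
    return na * nb / (na + nb) * math.dist(a["centroid"], b["centroid"]) ** 2

def ward_clusters(points, n_clusters):
    """Agglomerative minimum-variance (Ward) clustering; returns a label per point."""
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        # Cheapest merge over all pairs (ties go to the first pair found)
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: _ward_cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            # Size-weighted centroid of the union
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    labels = [0] * len(points)
    for label, c in enumerate(clusters):
        for m in c["members"]:
            labels[m] = label
    return labels
```

For the points `[(0, 0), (0, 1), (5, 5), (5, 6), (20, 0)]` with three clusters requested, the result is `[1, 1, 2, 2, 0]`. The two close pairs are merged and the outlier stays alone. Cluster label numbers follow merge order and carry no meaning.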
Cluster analysis of obesity and asthma phenotypes.
Directory of Open Access Journals (Sweden)
E Rand Sutherland
BACKGROUND: Asthma is a heterogeneous disease with variability among patients in characteristics such as lung function, symptoms and control, body weight, markers of inflammation, and responsiveness to glucocorticoids (GC). Cluster analysis of well-characterized cohorts can advance understanding of disease subgroups in asthma and point to unsuspected disease mechanisms. We used a hypothesis-free cluster analytical approach to define the contribution of obesity and related variables to asthma phenotype. METHODOLOGY AND PRINCIPAL FINDINGS: In a cohort of clinical trial participants (n = 250), minimum-variance hierarchical clustering was used to identify clinical and inflammatory biomarkers important in determining disease cluster membership in mild and moderate persistent asthmatics. In a subset of participants, GC sensitivity was assessed via expression of GC receptor alpha (GCRα) and induction of MAP kinase phosphatase-1 (MKP-1) expression by dexamethasone. Four asthma clusters were identified, with body mass index (BMI, kg/m²) and severity of asthma symptoms (AEQ score) the most significant determinants of cluster membership (F = 57.1, p<0.0001 and F = 44.8, p<0.0001, respectively). Two clusters were composed of predominantly obese individuals; these two obese asthma clusters differed from one another with regard to age of asthma onset, measures of asthma symptoms (AEQ) and control (ACQ), exhaled nitric oxide concentration (FENO) and airway hyperresponsiveness (methacholine PC20), but were similar with regard to measures of lung function (FEV1 (%) and FEV1/FVC), airway eosinophilia, IgE, leptin, adiponectin and C-reactive protein (hsCRP). Members of obese clusters demonstrated evidence of reduced expression of GCRα, a finding which was correlated with a reduced induction of MKP-1 expression by dexamethasone. CONCLUSIONS AND SIGNIFICANCE: Obesity is an important determinant of asthma phenotype in adults. There is heterogeneity in
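Minimum-variance (Ward) hierarchical clustering repeatedly merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares; for clusters a and b that increase is n_a·n_b/(n_a + n_b) times the squared distance between their centroids. The sketch below is a naive pure-Python illustration of that merge rule, not the study's software, and the two standardized variables standing in for BMI and symptom score are invented.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    squared_gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_gap


def ward_clustering(points, k):
    """Merge clusters until k remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def cost(pair):
        i, j = pair
        return ward_cost([points[t] for t in clusters[i]],
                         [points[t] for t in clusters[j]])

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2), key=cost)
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Invented standardized (BMI, symptom score) values for six participants.
points = [[-1.0, -1.0], [-1.1, -0.9], [-0.9, -1.1],
          [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
print(ward_clustering(points, 2))
```

The naive search is cubic in the number of points, which is fine for illustration; library implementations use update formulas to avoid recomputing centroids.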
Corrigan, Neil; Bankart, Michael J G; Gray, Laura J; Smith, Karen L
2014-05-24
There are many methodological challenges in the conduct and analysis of cluster randomised controlled trials, but one that has received little attention is post-randomisation change to cluster composition. To illustrate this, we focus on cluster merging, considering its impact on the design, analysis and interpretation of trial outcomes. We explored the effects of merging clusters on study power using standard methods of power calculation. We assessed by simulation the potential impacts on study findings of both homogeneous cluster merges (involving clusters randomised to the same arm of a trial) and heterogeneous merges (involving clusters randomised to different arms of a trial). To determine the impact on the bias and precision of treatment effect estimates, we applied standard methods of analysis to different analysis populations. Cluster merging produced a systematic reduction in study power. This effect depended on the number of merges and was most pronounced when variability in cluster size was greatest. Simulations demonstrated that the impact on analysis was minimal when cluster merges were homogeneous, with the impact on study power being balanced by a change in the observed intracluster correlation coefficient (ICC). We found a decrease in study power when cluster merges were heterogeneous, and the estimate of treatment effect was attenuated. Cluster merges found in previously published reports of cluster randomised trials were typically homogeneous rather than heterogeneous. Simulations demonstrated that trial findings in such cases would be unbiased. However, simulations also showed that any heterogeneous cluster merges would introduce bias that would be hard to quantify, as well as reducing the precision of the estimates obtained. Further methodological development is warranted to determine how to analyse such trials appropriately. Interim recommendations include avoidance of cluster merges where
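One way to see why merging tends to cost power is through the design effect. A commonly used approximation for unequal cluster sizes is DE = 1 + ((cv² + 1)·m̄ − 1)·ρ, where m̄ is the mean cluster size, cv the coefficient of variation of cluster sizes and ρ the ICC; the effective sample size is the total number of participants divided by DE. The sketch holds ρ fixed, which is an assumption: the abstract reports that the observed ICC itself changed after homogeneous merges. The cluster sizes and ICC are hypothetical.

```python
from statistics import mean, pstdev


def design_effect(cluster_sizes, icc):
    """Approximate design effect for unequal cluster sizes."""
    m = mean(cluster_sizes)
    cv = pstdev(cluster_sizes) / m  # coefficient of variation of cluster sizes
    return 1 + ((cv ** 2 + 1) * m - 1) * icc


def effective_sample_size(cluster_sizes, icc):
    """Total participants deflated by the design effect."""
    return sum(cluster_sizes) / design_effect(cluster_sizes, icc)


icc = 0.05                   # hypothetical intracluster correlation
before = [20] * 10           # ten equal clusters of 20
after = [40] + [20] * 8      # two same-arm clusters merged into one
for label, sizes in (("before", before), ("after", after)):
    print(label, round(design_effect(sizes, icc), 3),
          round(effective_sample_size(sizes, icc), 1))
```

Merging keeps the 200 participants but raises the design effect (from 1.95 to 2.15 here), so the effective sample size falls.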
Semi-supervised consensus clustering for gene expression data analysis
Wang, Yunli; Pan, Youlian
2014-01-01
Background: Simple clustering methods such as hierarchical clustering and k-means are widely used for gene expression data analysis, but they cannot cope with the noise and high dimensionality associated with microarray gene expression data. Consensus clustering appears to improve the robustness and quality of clustering results. Incorporating prior knowledge in the clustering process (semi-supervised clustering) has been shown to improve the consistency between the data partitioning and do...
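Consensus clustering typically summarizes many clustering runs as a consensus matrix whose entry (i, j) is the fraction of runs that place items i and j in the same cluster; a final partition is then derived from that matrix. A minimal sketch, assuming every run labels all items (resampling-based variants instead divide by the number of runs in which both items were sampled). The semi-supervised use of prior knowledge described above is not shown.

```python
from itertools import combinations


def consensus_matrix(partitions):
    """Entry [i][j] = fraction of partitions placing items i and j in one cluster."""
    n = len(partitions[0])
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        together = sum(p[i] == p[j] for p in partitions)
        matrix[i][j] = matrix[j][i] = together / len(partitions)
    return matrix


# Labels from three hypothetical clustering runs over five genes.
# Label values are arbitrary per run; only co-membership matters.
runs = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
]
for row in consensus_matrix(runs):
    print([round(v, 2) for v in row])
```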
Ionisation clusters at DNA level: experimental modelling
Energy Technology Data Exchange (ETDEWEB)
Pszona, S.; Kula, J
2002-07-01
The importance of initial clustered damage to DNA is a hypothesis that must also be approached through physical modelling of the initial products of a single charged particle's interaction with DNA. A new tool for such studies, presented here, is based on modelling the ionisation patterns produced by a single charged particle crossing a nanometre-sized nitrogen cavity. Nanometre-sized sites equivalent, at unit density, to DNA and the nucleosome have been modelled in a device called a Jet Counter, which consists of a pulse-operated valve that injects nitrogen as an expansion jet into an interaction chamber. The distributions of the number of ions per cluster created by a single 4.6 MeV alpha particle have been measured in nitrogen for site sizes from 0.15 nm to 13 nm. A new descriptor of radiation action at the DNA level is proposed. (author)
Geographic atrophy phenotype identification by cluster analysis.
Monés, Jordi; Biarnés, Marc
2017-07-20
To identify ocular phenotypes in patients with geographic atrophy secondary to age-related macular degeneration (GA) using a data-driven cluster analysis. This was a retrospective analysis of data from a prospective, natural history study of patients with GA who were followed for ≥6 months. Cluster analysis was used to identify subgroups within the population based on the presence of several phenotypic features: soft drusen, reticular pseudodrusen (RPD), primary foveal atrophy, increased fundus autofluorescence (FAF), greyish FAF appearance and subfoveal choroidal thickness (SFCT). A comparison of features between the subgroups was conducted, and a qualitative description of the new phenotypes was proposed. The atrophy growth rate between phenotypes was then compared. Data were analysed from 77 eyes of 77 patients with GA. Cluster analysis identified three groups: phenotype 1 was characterised by high soft drusen load, foveal atrophy and slow growth; phenotype 3 showed high RPD load, extrafoveal and greyish FAF appearance and thin SFCT; the characteristics of phenotype 2 were midway between phenotypes 1 and 3. Phenotypes differed in all measured features (p≤0.013), with decreases in the presence of soft drusen, foveal atrophy and SFCT seen from phenotypes 1 to 3 and corresponding increases in high RPD load, high FAF and greyish FAF appearance. Atrophy growth rate differed between phenotypes 1, 2 and 3 (0.63, 1.91 and 1.73 mm(2)/year, respectively, p=0.0005). Cluster analysis identified three distinct phenotypes in GA. One of them showed a particularly slow growth pattern. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Benson, Charles; Watson, Philip; Taylor, Garth; Cook, Philip; Hollenhorst, Steve
2013-10-01
Yellowstone National Park visitor data were obtained from a survey conducted for the National Park Service by the Park Studies Unit at the University of Idaho. Travel cost models have previously been estimated for national parks in the United States; this study builds on that work by investigating how benefits vary across types of visitors who participate in different activities while at the park. Visitor clusters were developed based on the activities in which each visitor participated while at the park. The clusters were analyzed and then incorporated into a travel cost model to determine the economic value (consumer surplus) that the different visitor groups received from visiting the park. The model was estimated using a zero-truncated negative binomial regression corrected for endogenous stratification. The travel cost price variable was estimated using both one-third and one-quarter of the wage rate to test sensitivity to the opportunity cost specification. The average benefit across all visitor cluster groups was estimated at between $235 and $276 per person per trip. However, per-trip benefits varied substantially across clusters: from $90-$103 for the "value picnickers," to $185-$263 for the "backcountry enthusiasts," $189-$278 for the "do it all adventurists," $204-$303 for the "windshield tourists," and $323-$714 for the "creature comfort" cluster group.
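In count-data travel cost models whose expected number of trips is exponential in the travel cost (as in negative binomial specifications), per-trip consumer surplus is commonly computed as −1/β, where β is the travel-cost coefficient. The travel cost itself adds the opportunity cost of travel time, valued at a fraction (1/3 or 1/4 above) of the wage rate. The sketch uses invented inputs and a hypothetical coefficient; it does not reproduce the study's estimates or its endogenous-stratification correction.

```python
def trip_price(out_of_pocket, travel_hours, hourly_wage, wage_fraction):
    """Travel cost = out-of-pocket expenses + valued travel time."""
    return out_of_pocket + wage_fraction * hourly_wage * travel_hours


def consumer_surplus_per_trip(beta_travel_cost):
    """Per-trip consumer surplus, -1/beta, for an exponential-mean count model."""
    if beta_travel_cost >= 0:
        raise ValueError("the travel-cost coefficient is expected to be negative")
    return -1.0 / beta_travel_cost


# Hypothetical visitor: $300 out of pocket, 12 hours of travel, $30/hour wage.
for fraction in (1 / 3, 1 / 4):
    print(round(trip_price(300, 12, 30, fraction), 2))

# Hypothetical coefficient, not an estimate from the study.
print(round(consumer_surplus_per_trip(-0.004), 2))
```

A larger wage fraction raises every visitor's trip price, which changes the estimated β and hence the surplus; that is why the study reports a range for each cluster.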
Wagner-Kaiser, R; Sarajedini, A; von Hippel, T; van Dyk, D A; Robinson, E; Stein, N; Jefferys, W H
2016-01-01
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic globular clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences between the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in CNO are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster, and we also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed g...
Statistical analysis of bound companions in the Coma cluster
Mendelin, Martin; Binggeli, Bruno
2017-08-01
Aims: The rich and nearby Coma cluster of galaxies is known to have substructure. We aim to create a more detailed picture of this substructure by searching directly for bound companions around individual giant members. Methods: We have used two catalogs of Coma galaxies, one covering the cluster core for a detailed morphological analysis, another covering the outskirts. The separation limit between possible companions (secondaries) and giants (primaries) is chosen as MB = -19 and MR = -20, respectively, for the two catalogs. We have created pseudo-clusters by shuffling the positions or velocities of the primaries and searched for significant over-densities of possible companions around giants by comparison with the data. This method was first developed for and applied to the Virgo cluster. In a second approach, we introduced a modified nearest-neighbor analysis using several interaction parameters for all galaxies. Results: We find evidence for some excesses due to possible companions in both catalogs. Satellites are typically found among the faintest dwarfs (MB type giants (spirals) in the outskirts, which is expected in an infall scenario of cluster evolution. A rough estimate for an upper limit of bound galaxies within Coma is 2-4%, to be compared with 7% for Virgo. Conclusions: The results agree well with the expected low frequency of bound companions in a regular cluster such as Coma. To exploit the data more fully and gain more detailed insight into the physics of cluster evolution, we suggest also applying the method to model clusters created by N-body simulations for comparison.
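The pseudo-cluster idea amounts to a permutation test. Count possible companions near each giant, then recount after shuffling the giants' velocities (or positions). Shuffling breaks any physical association while keeping the overall distributions. A minimal sketch, assuming a companion is any secondary within a projected radius r_max and a velocity difference dv_max of a primary. These thresholds, the units and the toy catalogue are hypothetical, and the study's nearest-neighbour variant is not shown.

```python
import math
import random


def count_companions(primaries, secondaries, r_max, dv_max):
    """Count primary-secondary pairs close in projected position and velocity."""
    count = 0
    for px, py, pv in primaries:
        for sx, sy, sv in secondaries:
            if math.hypot(px - sx, py - sy) <= r_max and abs(pv - sv) <= dv_max:
                count += 1
    return count


def shuffle_test(primaries, secondaries, r_max, dv_max, n_shuffles=1000, seed=0):
    """Observed pair count and a permutation p-value from velocity-shuffled primaries."""
    rng = random.Random(seed)
    observed = count_companions(primaries, secondaries, r_max, dv_max)
    velocities = [p[2] for p in primaries]
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(velocities)
        pseudo = [(x, y, v) for (x, y, _v), v in zip(primaries, velocities)]
        if count_companions(pseudo, secondaries, r_max, dv_max) >= observed:
            exceed += 1
    # The add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_shuffles + 1)


# Toy catalogue: (x, y, velocity); each giant has one close, co-moving dwarf.
primaries = [(0.0, 0.0, 7000.0), (5.0, 5.0, 6500.0), (10.0, 0.0, 7500.0)]
secondaries = [(0.1, 0.0, 7010.0), (5.0, 5.1, 6490.0), (10.1, 0.0, 7505.0)]
print(shuffle_test(primaries, secondaries, r_max=0.5, dv_max=50.0))
```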
MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS
Directory of Open Access Journals (Sweden)
Jana Halčinová
2014-06-01
The aim of the present article is to show how cluster analysis methods can be used to classify stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The sorting of finished-product stocks by cluster is described using a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.
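The abstract does not name the clustering algorithm, so the following is only one plausible option: k-means on a small demand profile per product. Lloyd's algorithm alternates between assigning each product to the nearest centre and moving each centre to the mean of its products. The features and values are hypothetical. Using the first k products as starting centres is a simplification, since k-means results depend on initialization.

```python
import math


def nearest(point, centres):
    """Index of the centre closest to point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda c: math.dist(point, centres[c]))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; the first k points are used as initial centres."""
    centres = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centres) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


# Hypothetical demand profile per product: (orders per month, customers per month).
products = [[120, 30], [60, 15], [5, 2], [115, 28], [62, 14], [8, 3], [6, 1]]
labels, centres = kmeans(products, k=3)
print(labels)
```

In a warehouse layout, the high-demand cluster would be the natural candidate for the most accessible storage locations.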
Magnetic susceptibilities of cluster-hierarchical models
McKay, Susan R.; Berker, A. Nihat
1984-02-01
The exact magnetic susceptibilities of hierarchical models are calculated near and away from criticality, in both the ordered and disordered phases. The mechanism and phenomenology are discussed for models with susceptibilities that are physically sensible, e.g., nondivergent away from criticality. Such models are found based upon the Niemeijer-van Leeuwen cluster renormalization. A recursion-matrix method is presented for the renormalization-group evaluation of response functions. Diagonalization of this matrix at fixed points provides simple criteria for well-behaved densities and response functions.
Mapping Cigarettes Similarities using Cluster Analysis Methods
Directory of Open Access Journals (Sweden)
Lorentz Jäntschi
2007-09-01
The aim of the research was to investigate the relationships and co-occurrences within and between chemical composition information (tar, nicotine, carbon monoxide), market information (brand, manufacturer, price) and public health information (class, health warning), as well as the clustering of a sample of cigarette data. Thirty cigarette brands were analyzed. Six categorical (cigarette brand, manufacturer, health warnings, class) and four continuous (tar, nicotine and carbon monoxide concentrations, and package price) variables were collected to investigate chemical composition, market information and public health information. Multiple linear regression and two clustering techniques were applied. The study yielded several notable findings. The carbon monoxide concentration proved to be linked with tar and nicotine concentrations. The clustering methods identified groups of cigarette brands that showed similar characteristics. The tar and carbon monoxide concentrations were the main criteria used in clustering. An analysis of a larger sample could reveal more relevant and useful information regarding the similarities between cigarette brands.
Warda, Alicja K.; Xiao, Yinghua; Boekhorst, Jos; Wells-Bennik, Marjon H.J.; Nierop Groot, Masja N.; Abee, Tjakko
2017-01-01
Spore germination of 17 Bacillus cereus food isolates and reference strains was evaluated using flow cytometry analysis in combination with fluorescent staining at a single-spore level. This approach allowed for rapid collection of germination data under more than 20 conditions, including heat ac
Outcome-Driven Cluster Analysis with Application to Microarray Data.
Directory of Open Access Journals (Sweden)
Jessie J Hsu
One goal of cluster analysis is to sort characteristics into groups (clusters) so that those in the same group are more highly correlated with each other than with those in other groups. An example is the search for groups of genes whose RNA expression is correlated in a population of patients. These genes would be of greater interest if their common level of RNA expression were also predictive of clinical outcome. This issue arose in the context of a study of trauma patients for whom RNA samples were available. The question of interest was whether there were groups of genes that behaved similarly, and whether each gene in a cluster would have a similar effect on who would recover. To address this, we develop an algorithm that simultaneously assigns characteristics (genes) to groups of highly correlated genes that have the same effect on the outcome (recovery). We propose a random effects model in which each gene within a group (cluster) equals the sum of a random effect, specific to the observation and cluster, and an independent error term. The outcome variable is a linear combination of the random effects of each cluster. To fit the model, we implement a Markov chain Monte Carlo algorithm based on the likelihood of the observed data. We evaluate the effect of including the outcome in the model through simulation studies and describe a strategy for prediction. These methods are applied to trauma data from the Inflammation and Host Response to Injury research program, revealing a clustering of the genes that is informed by the recovery outcome.
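The generative model described above can be simulated directly, which makes its implications easy to check. Genes sharing a cluster share the random effect u_k and are therefore correlated, and the outcome depends on the genes only through the u_k. A sketch assuming unit variances for the effects and gene errors, a noise SD of 0.5 on the outcome, and hypothetical cluster sizes and coefficients; the MCMC fitting step is not shown.

```python
import random


def simulate(n_patients, cluster_sizes, betas, seed=0):
    """Gene g in cluster k: x = u_k + e;  outcome: y = sum_k beta_k * u_k + noise."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n_patients):
        u = [rng.gauss(0.0, 1.0) for _ in cluster_sizes]  # one effect per cluster
        x.append([u[k] + rng.gauss(0.0, 1.0)
                  for k, size in enumerate(cluster_sizes)
                  for _ in range(size)])
        y.append(sum(b * uk for b, uk in zip(betas, u)) + rng.gauss(0.0, 0.5))
    return x, y


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


def column(matrix, g):
    return [row[g] for row in matrix]


# Two clusters of three genes; only the first cluster affects the outcome.
x, y = simulate(2000, cluster_sizes=[3, 3], betas=[1.0, 0.0])
print(round(pearson(column(x, 0), column(x, 1)), 2))  # same cluster (population value 0.5)
print(round(pearson(column(x, 0), column(x, 3)), 2))  # different clusters (population value 0)
```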
Cluster infall in the concordance LCDM model
Pivato, Maximiliano C.; Padilla, Nelson D.; Lambas, Diego G.
2005-01-01
We perform statistical analyses of the infall of dark matter onto clusters in numerical simulations within the concordance LCDM model. By studying the infall profile around clusters of different masses, we find a linear relation between maximum infall velocity and mass, with the maximum infall velocity reaching 900 km/s for the most massive groups. The maximum infall velocity and the group mass are well fitted by a power law of the form V_{inf}^{max} = (M/m_0)^{gamma}. By comparing the measured infall velocity to the linear infall model with an exponential cutoff introduced by Croft et al., we find that the best agreement is obtained for a critical overdensity delta_c = 45. We study the dependence of the direction of infall with respect to the cluster centres, and find that for massive groups the maximum alignment occurs at scales r ~ 6 Mpc/h. We obtain a logarithmic power-law relation between the average infall angle and the group mass. We also study the dependence of the results on the local dark-matter density, finding a r...
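Because V = (M/m_0)^γ becomes ln V = γ ln M − γ ln m_0 after taking logarithms, γ and m_0 can be estimated by ordinary least squares on log-transformed values. A sketch on synthetic, noise-free values generated with γ = 0.3 and m_0 = 10^5 in arbitrary units; it is not the paper's fitting procedure or its data.

```python
import math


def fit_power_law(masses, velocities):
    """Least-squares fit of ln V = gamma * ln M - gamma * ln m0."""
    xs = [math.log(m) for m in masses]
    ys = [math.log(v) for v in velocities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    gamma = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - gamma * mx          # equals -gamma * ln(m0)
    m0 = math.exp(-intercept / gamma)
    return gamma, m0


# Synthetic, noise-free data (arbitrary units).
masses = [1e12, 1e13, 1e14, 1e15]
velocities = [(m / 1e5) ** 0.3 for m in masses]
gamma, m0 = fit_power_law(masses, velocities)
print(round(gamma, 3), f"{m0:.3g}")
```

With noisy simulation data, the fit in log space weights relative rather than absolute deviations, which suits quantities spanning several orders of magnitude.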
Equivalent damage validation by variable cluster analysis
Drago, Carlo; Ferlito, Rachele; Zucconi, Maria
2016-06-01
The main aim of this work is to perform a cluster analysis of the damage surveyed in the old center of L'Aquila after the earthquake of April 6, 2009, and to validate an Equivalent Damage indicator (ED) that summarizes the information reported on the AeDES survey form regarding the level of damage and its extent over the building surfaces. In particular, we used a sample of 13442 masonry buildings located in an area characterized by a Macroseismic Intensity of 8 [1]. The aim is to ensure coherence between the clusters, and their hierarchy, identified in the surveyed damage data and in the computed ED data.
Learning From Hidden Traits: Joint Factor Analysis and Latent Clustering
Yang, Bo; Fu, Xiao; Sidiropoulos, Nicholas D.
2017-01-01
Dimensionality reduction techniques play an essential role in data analytics, signal processing and machine learning. Dimensionality reduction is usually performed in a preprocessing stage that is separate from subsequent data analysis, such as clustering or classification. Finding reduced-dimension representations that are well-suited for the intended task is more appealing. This paper proposes a joint factor analysis and latent clustering framework, which aims at learning cluster-aware low-dimensional representations of matrix and tensor data. The proposed approach leverages matrix and tensor factorization models that produce essentially unique latent representations of the data to unravel latent cluster structure -- which is otherwise obscured because of the freedom to apply an oblique transformation in latent space. At the same time, latent cluster structure is used as prior information to enhance the performance of factorization. Specific contributions include several custom-built problem formulations, corresponding algorithms, and discussion of associated convergence properties. Besides extensive simulations, real-world datasets such as Reuters document data and MNIST image data are also employed to showcase the effectiveness of the proposed approaches.
The Quantitative Analysis of Chennai Automotive Industry Cluster
Bhaskaran, Ethirajan
2016-07-01
Chennai, also called the Detroit of India, hosts an automotive industry that produces over 40% of India's vehicles and components. During 2001-2002, the Automotive Component Industries (ACI) in the Ambattur, Thirumalizai and Thirumudivakkam Industrial Estates, Chennai, faced problems with infrastructure, technology, procurement, production and marketing. The objective is to study the quantitative performance of the Chennai Automotive Industry Cluster before (2001-2002) and after (2008-2009) the Cluster Development Approach (CDA). The methodology adopted is the collection of primary data from 100 ACI using a quantitative questionnaire, analyzed using Correlation Analysis (CA), Regression Analysis (RA), the Friedman Test (FMT) and the Kruskal-Wallis Test (KWT). The CA computed for the different sets of variables reveals a high degree of relationship between the variables studied. The RA models establish a strong relationship between the dependent variable and a host of independent variables, and the proposed models approximate these relationships. The KWT shows no significant difference between the three location clusters with respect to Net Profit, Production Cost, Marketing Costs, Procurement Costs and Gross Output. This supports the view that each location has contributed uniformly to the development of the automobile component cluster. The FMT shows no significant difference between industrial units with respect to costs such as Production, Infrastructure, Technology and Marketing, or Net Profit. To conclude, the automotive industries have fully utilized the physical infrastructure and centralised facilities by adopting the CDA and now export their products to North America, South America, Europe, Australia, Africa and Asia. Value chain analysis models have been implemented in all the cluster units. This CDA model can be implemented in industries of underdeveloped and developing countries for cost reduction and productivity
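The Kruskal-Wallis test ranks all observations together and asks whether mean ranks differ between groups. Its statistic is H = 12/(N(N+1)) · Σ R_i²/n_i − 3(N+1), where R_i is the rank sum and n_i the size of group i out of N observations, and H is compared against a chi-square distribution with (number of groups − 1) degrees of freedom. A minimal sketch that averages tied ranks but omits the tie-correction factor; the three groups are toy values, not the survey data.

```python
def rank_with_ties(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for t in range(i, j + 1):
            ranks[order[t]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """H statistic without the tie-correction factor."""
    pooled = [v for g in groups for v in g]
    ranks = rank_with_ties(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Toy values for three locations.
print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 3))  # 7.2
```

With three groups the 5% chi-square critical value (2 degrees of freedom) is about 5.99, so this deliberately separated toy example would be significant; the study reports no significant difference for its variables.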
Hierarchical modeling of cluster size in wildlife surveys
Royle, J. Andrew
2008-01-01
Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicitly accounting for cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent, and the detectability of clusters is likely to increase with cluster size. This induces a cluster-size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and for the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
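The size bias follows from detecting clusters with a probability p(s) that increases with size s. Among detected clusters, size s appears with weight p(s)·f(s), so the observed mean size is Σ s·p(s)·f(s) / Σ p(s)·f(s). This exceeds the population mean Σ s·f(s) whenever p(s) increases with s. A small numeric sketch; the size distribution and the detection curve 1 − 0.7^s are invented for illustration and are not part of the hierarchical model described above.

```python
def observed_mean_size(size_probs, detect):
    """Mean cluster size among detected clusters, given detection probability by size."""
    num = sum(s * p * detect(s) for s, p in size_probs.items())
    den = sum(p * detect(s) for s, p in size_probs.items())
    return num / den


def population_mean_size(size_probs):
    return sum(s * p for s, p in size_probs.items())


def detect(s):
    """Assumed detection curve: larger flocks are easier to see."""
    return 1 - 0.7 ** s


# Hypothetical cluster-size distribution f(s).
sizes = {1: 0.4, 2: 0.3, 4: 0.2, 8: 0.1}
print(round(population_mean_size(sizes), 2), round(observed_mean_size(sizes, detect), 2))
```

Multiplying the number of detected clusters by the observed mean size would therefore overstate abundance, which is the bias the hierarchical model is designed to remove.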
Three-Dimensional Modeling of Fracture Clusters in Geothermal Reservoirs
Energy Technology Data Exchange (ETDEWEB)
Ghassemi, Ahmad [Univ. of Oklahoma, Norman, OK (United States)]
2017-08-11
The objective of this project is to develop a 3-D numerical model for simulating mode I, II, and III (tensile, shear, and out-of-plane) propagation of multiple fractures and fracture clusters to accurately predict geothermal reservoir stimulation using the virtual multi-dimensional internal bond (VMIB). Effective development of enhanced geothermal systems can significantly benefit from improved modeling of hydraulic fracturing. In geothermal reservoirs, where the temperature can reach or exceed 350 °C, thermal and poro-mechanical processes play an important role in fracture initiation and propagation. In this project, hydraulic fracturing of a hot subsurface rock mass will be modeled numerically by extending the virtual multiple internal bond theory and implementing it in WARP3D, a three-dimensional finite element code for solid mechanics. The new constitutive model, along with the poro-thermoelastic computational algorithms, will allow modeling of the initiation and propagation of clusters of fractures and the extension of pre-existing fractures. The work will enable the industry to realistically model stimulation of geothermal reservoirs. The project addresses the Geothermal Technologies Office objective of accurately predicting geothermal reservoir stimulation (GTO technology priority item). The project goal will be attained by: (i) development of the VMIB method for application to 3D analysis of fracture clusters; (ii) development of poro- and thermoelastic material sub-routines for use in the 3D finite element code WARP3D; (iii) implementation of VMIB and the new material routines in WARP3D to enable simulation of clusters of fractures while accounting for the effects of pore pressure, thermal stress and inelastic deformation; (iv) simulation of 3D fracture propagation and coalescence and formation of clusters, and comparison with laboratory compression tests; and (v) application of the model to interpretation of injection experiments (planned by our
Data Clustering Analysis Based on Wavelet Feature Extraction
Institute of Scientific and Technical Information of China (English)
Qian, Yuntao; Tang, Yuanyan
2003-01-01
A novel wavelet-based data clustering method is presented in this paper, which includes wavelet feature extraction and a cluster growing algorithm. The wavelet transform can provide rich and diversified information for representing the global and local inherent structures of a dataset; therefore, it is a very powerful tool for clustering feature extraction. As a form of unsupervised classification, the target of clustering analysis depends on the specific clustering criteria. Several criteria that should be considered for a general-purpose clustering algorithm are proposed, and a cluster growing algorithm is constructed to connect the clustering criteria with the wavelet features. Compared with other popular clustering methods, our approach provides multi-resolution clustering results, needs few prior parameters, correctly deals with irregularly shaped clusters, and is insensitive to noise and outliers. As this wavelet-based clustering method is aimed at solving two-dimensional data clustering problems, high-dimensional datasets are first transformed into two-dimensional Euclidean space using a self-organizing map and the U-matrix method, so that high-dimensional data can also be clustered. Results on some simulated data and standard test data are reported to illustrate the power of our method.
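A rough sketch of clustering on wavelet features of 2-D data, in the style of grid-based methods such as WaveCluster rather than the authors' own algorithm. The points are quantized into a grid, and only the low-pass band of one Haar level is kept, computed here as a plain 2x2 average, i.e. up to a constant scale factor. Cells above a threshold are marked, and clusters are grown as 4-connected components. The grid size, threshold and data are assumptions; points are assumed to be pre-scaled into the unit square and the grid size to be even.

```python
from collections import deque


def quantize(points, grid):
    """Count points per cell of a grid x grid lattice over the unit square."""
    counts = [[0] * grid for _ in range(grid)]
    cells = []
    for x, y in points:
        i, j = min(int(x * grid), grid - 1), min(int(y * grid), grid - 1)
        counts[i][j] += 1
        cells.append((i, j))
    return counts, cells


def haar_lowpass(counts):
    """One 2-D Haar level, low-pass band only (2x2 averages)."""
    half = len(counts) // 2
    return [[(counts[2 * i][2 * j] + counts[2 * i + 1][2 * j]
              + counts[2 * i][2 * j + 1] + counts[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(half)] for i in range(half)]


def grow_clusters(band, threshold):
    """Label 4-connected cells whose coefficient is at least threshold; -1 = noise."""
    n = len(band)
    labels = [[-1] * n for _ in range(n)]
    next_label = 0
    for i in range(n):
        for j in range(n):
            if band[i][j] < threshold or labels[i][j] != -1:
                continue
            labels[i][j] = next_label
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for u, v in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= u < n and 0 <= v < n and labels[u][v] == -1
                            and band[u][v] >= threshold):
                        labels[u][v] = next_label
                        queue.append((u, v))
            next_label += 1
    return labels


def wavelet_grid_cluster(points, grid=16, threshold=1.0):
    counts, cells = quantize(points, grid)
    labels = grow_clusters(haar_lowpass(counts), threshold)
    # Each point inherits the label of the coarse cell containing its fine cell.
    return [labels[i // 2][j // 2] for i, j in cells]


# Two invented dense patches and one isolated point in the unit square.
blob_a = [(0.20 + 0.01 * dx, 0.20 + 0.01 * dy) for dx in range(5) for dy in range(5)]
blob_b = [(0.80 + 0.01 * dx, 0.80 + 0.01 * dy) for dx in range(5) for dy in range(5)]
labels = wavelet_grid_cluster(blob_a + blob_b + [(0.5, 0.9)])
print(labels[0], labels[25], labels[50])
```

The low-pass step smooths isolated points below the threshold, which is one way such methods gain insensitivity to noise; applying further Haar levels gives coarser, multi-resolution views.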
Wagner-Kaiser, R.; Stenning, D. C.; Sarajedini, A.; von Hippel, T.; van Dyk, D. A.; Robinson, E.; Stein, N.; Jefferys, W. H.
2016-12-01
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of 30 Galactic globular clusters to characterize two distinct stellar populations. A sophisticated Bayesian technique is employed to simultaneously sample the joint posterior distribution of age, distance, and extinction for each cluster, as well as unique helium values for two populations within each cluster and the relative proportion of those populations. We find the helium differences among the two populations in the clusters fall in the range of ~0.04 to 0.11. Because adequate models varying in carbon, nitrogen, and oxygen are not presently available, we view these spreads as upper limits and present them with statistical rather than observational uncertainties. Evidence supports previous studies suggesting an increase in helium content concurrent with increasing mass of the cluster and we also find that the proportion of the first population of stars increases with mass as well. Our results are examined in the context of proposed globular cluster formation scenarios. Additionally, we leverage our Bayesian technique to shed light on the inconsistencies between the theoretical models and the observed data.
Initial magnetization analysis of iron cluster assemblies
Energy Technology Data Exchange (ETDEWEB)
Michele, Oliver; Hesse, Juergen; Bremers, Heiko [Technische Universitaet Braunschweig, Institut fuer Metallphysik und Nukleare Festkoerperphysik, Mendelssohnstrasse 3, 38106 Braunschweig (Germany); Peng, Dong-Lian; Sumiyama, Kenji; Hihara, Takehiko; Yamamuro, Saeki [Department of Materials Science and Engineering, Nagoya Institute of Technology, Nagoya 466-8555 (Japan)
2004-12-01
Nearly monodispersed oxide-coated Fe cluster assemblies were prepared using a plasma-gas-condensation-type cluster beam deposition apparatus (D. L. Peng et al. J. Appl. Phys. 92 3075 (2002)). The characterization of such assemblies is presented using SQUID magnetometry. The aim of this contribution is to interpret the initial magnetization curves instead of the usual presentation of hysteresis loops and coercivities. The description of the initial magnetization is based on a proposed vector model valid for Stoner-Wohlfarth particles. The model includes the particles' anisotropy and possible interactions, treating these influences as equivalent magnetic fields. The model extends the one described by Michele et al. (J. Phys.: Condens. Matter 16 427 (2004)) by taking into account that, in the completely demagnetized state of a sample consisting of a very large number of particles, equal anisotropy fields of opposite sign are always present. We measured the initial magnetization curves at different temperatures and present the temperature dependence of the model's parameters. (Abstract Copyright [2004], Wiley Periodicals, Inc.)
Latent cluster analysis of ALS phenotypes identifies prognostically differing groups.
Directory of Open Access Journals (Sweden)
Jeban Ganesalingam
BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a degenerative disease predominantly affecting motor neurons and manifesting as several different phenotypes. Whether these phenotypes correspond to different underlying disease processes is unknown. We used latent cluster analysis to identify groupings of clinical variables in an objective and unbiased way to improve phenotyping for clinical and research purposes. METHODS: Latent class cluster analysis was applied to a large database of 1467 records of people with ALS, using discrete variables that can be readily determined at the first clinic appointment. The model was tested for clinical relevance by survival analysis of the phenotypic groupings using the Kaplan-Meier method. RESULTS: The best model generated five distinct phenotypic classes that strongly predicted survival (p<0.0001). Eight variables were used for the latent class analysis, but a good estimate of the classification could be obtained using just two variables: site of first symptoms (bulbar or limb) and time from symptom onset to diagnosis (p<0.00001). CONCLUSION: The five phenotypic classes identified using latent cluster analysis can predict prognosis. They could be used to stratify patients recruited into clinical trials and to generate more homogeneous disease groups for genetic, proteomic and risk factor research.
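The Kaplan-Meier (product-limit) estimator multiplies, at each distinct death time t, the factor (1 − d_t/n_t), where d_t is the number of deaths at t and n_t the number still at risk. Censored records leave the risk set without changing S(t). A minimal pure-Python sketch with invented follow-up times; formally comparing classes (for example with a log-rank test) is not shown.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Records censored at t are still counted as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


# Invented follow-up times (months) for one phenotypic class.
curve = kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1])
print(curve)
```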
Constructing storyboards based on hierarchical clustering analysis
Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu
2005-07-01
There is a growing need for quick previews of video content, both to improve the accessibility of video archives and to reduce network traffic. In this paper, a storyboard containing a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors derived from the wavelet coefficients of video frames. Consistent reuse of the extracted feature vectors is the key to avoiding repeated, computationally intensive parsing of the same video sequence. Experimental results suggest that this strategy yields a significant reduction in computational time.
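The keyframe-selection step can be illustrated with a minimal Python sketch. It assumes the per-frame feature vectors have already been extracted (the wavelet step is not shown), uses average-linkage agglomerative clustering as one possible hierarchical method, and picks the frame nearest each cluster centroid as that cluster's keyframe. The function names are hypothetical, not the authors' code.

```python
import math
from itertools import combinations


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, feats):
    # Mean pairwise distance between the members of two clusters.
    return sum(euclid(feats[i], feats[j]) for i in c1 for j in c2) / (len(c1) * len(c2))


def storyboard(feats, k):
    """Return sorted indices of k keyframes chosen by agglomerative clustering."""
    clusters = [[i] for i in range(len(feats))]
    while len(clusters) > k:
        # Merge the closest pair of clusters (a < b, so deleting b keeps a valid).
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], feats))
        clusters[a].extend(clusters[b])
        del clusters[b]
    keyframes = []
    for c in clusters:
        dim = len(feats[c[0]])
        centroid = [sum(feats[i][d] for i in c) / len(c) for d in range(dim)]
        # Keyframe = member frame nearest the cluster centroid.
        keyframes.append(min(c, key=lambda i: euclid(feats[i], centroid)))
    return sorted(keyframes)
```

Because the clustering works only on the stored feature list, asking for a different number of keyframes means rerunning `storyboard(feats, k)`, not re-parsing the video.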
Vesselinov, V. V.; Alexandrov, B.
2014-12-01
The identification of the physical sources causing spatial and temporal fluctuations of state variables such as river stage levels and aquifer hydraulic heads is challenging. The fluctuations can be caused by variations in natural and anthropogenic sources such as precipitation events, infiltration, groundwater pumping, barometric pressures, etc. Source identification and separation can be crucial for conceptualizing the hydrological conditions and characterizing system properties. If the original signals that cause the observed state-variable transients can be successfully "unmixed", decoupled physics models may then be applied to analyze the propagation of each signal independently. We propose a new model-free inverse analysis of transient data based on the Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS) coupled with the k-means clustering algorithm, which we call NMFk. NMFk is capable of identifying a set of unique sources from a set of experimentally measured mixed signals, without any information about the sources, their transients, or the physical mechanisms and properties controlling signal propagation through the system. A classical BSS conundrum is the so-called "cocktail-party" problem, where several microphones record the sounds in a ballroom (music, conversations, noise, etc.). Each microphone records a mixture of the sounds. The goal of BSS is to "unmix" and reconstruct the original sounds from the microphone records. Similarly to the "cocktail-party" problem, our model-free analysis only requires information about state-variable transients at a number of observation points, m, where m > r, and r is the number of unknown unique sources causing the observed fluctuations. We apply the analysis to a dataset from the Los Alamos National Laboratory (LANL) site. We identify the sources, which are barometric pressure and water-supply pumping effects, and estimate their impact. We also estimate the
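The NMF half of NMFk can be sketched with the standard Lee-Seung multiplicative updates in plain Python. This is a minimal illustration, assuming an observation matrix X with one row per observation point (m rows) and one column per time step, factored into W (m x r, mixing weights) and H (r x n, source transients). The k-means step of NMFk is not shown, and all names are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def frob_error(X, W, H):
    # Squared Frobenius norm of X - WH.
    WH = matmul(W, H)
    return sum((X[i][j] - WH[i][j]) ** 2 for i in range(len(X)) for j in range(len(X[0])))


def nmf(X, r, iters=500, seed=0, eps=1e-12):
    """Factor non-negative X (m x n) as W (m x r) times H (r x n)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() for _ in range(r)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise; keeps H non-negative.
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)] for a in range(r)]
        # W <- W * (X H^T) / (W H H^T), elementwise.
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(r)] for i in range(m)]
    return W, H
```

With four observation points mixing two synthetic sources (m = 4 > r = 2), the updates drive the reconstruction error down from its random starting value while keeping every entry non-negative.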
Latent Clustering Models for Outlier Identification in Telecom Data
Directory of Open Access Journals (Sweden)
Ye Ouyang
2016-01-01
Collected telecom data traffic has boomed in recent years due to the development of 4G mobile devices and other similar high-speed machines. The ability to quickly identify unexpected traffic data in this stream is critical for mobile carriers, as such traffic can be caused by either fraudulent intrusion or technical problems. Clustering models can help identify issues by revealing patterns in network data, which makes it possible to catch anomalies quickly and highlight previously unseen outliers. In this article, we develop and compare clustering models for telecom data, focusing on those that incorporate time-stamp information. Two main models are introduced, solved in detail, and analyzed: Gaussian Probabilistic Latent Semantic Analysis (GPLSA) and time-dependent Gaussian Mixture Models (time-GMM). These models are then compared with other clustering models, such as the Gaussian model and the GMM (which do not contain time-stamp information). We perform computations on both sample and telecom traffic data to show that the efficiency and robustness of GPLSA make it the superior method for detecting outliers and providing results automatically, with few tuning parameters and little expertise required.
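GPLSA and time-GMM are not reproduced here. As a much simpler illustration of why time stamps matter, the sketch below fits one Gaussian per hour-of-day slot and flags points lying more than `z_thresh` standard deviations from the other points in the same slot (leave-one-out, so a single spike cannot inflate its own slot's variance). The record format, threshold and function name are assumptions.

```python
import math
from collections import defaultdict


def time_bucketed_outliers(records, z_thresh=3.0, eps=1e-9):
    """records: list of (hour_of_day, traffic_value). Returns sorted indices of outliers."""
    buckets = defaultdict(list)
    for idx, (hour, value) in enumerate(records):
        buckets[hour].append((idx, value))
    outliers = []
    for members in buckets.values():
        for idx, value in members:
            others = [v for j, v in members if j != idx]
            if len(others) < 2:
                continue  # too few points to fit a Gaussian for this time slot
            mu = sum(others) / len(others)
            sd = math.sqrt(sum((v - mu) ** 2 for v in others) / (len(others) - 1))
            if abs(value - mu) / max(sd, eps) > z_thresh:
                outliers.append(idx)
    return sorted(outliers)
```

A value that is normal at noon can be anomalous at midnight; a model without time stamps pools both slots and can miss it.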
Dunn, Heather; Quinn, Laurie; Corbridge, Susan J; Eldeirawi, Kamal; Kapella, Mary; Collins, Eileen G
2017-05-01
The use of cluster analysis in the nursing literature is limited to the creation of classifications of homogeneous groups and the discovery of new relationships. As such, it is important to provide clarity regarding its use and potential. The purpose of this article is to provide an introduction to distance-based, partitioning-based, and model-based cluster analysis methods commonly utilized in the nursing literature, provide a brief historical overview of the use of cluster analysis in the nursing literature, and provide suggestions for future research. An electronic search covered three bibliographic databases: PubMed, CINAHL and Web of Science. Key terms were cluster analysis and nursing. The use of cluster analysis in the nursing literature is increasing and expanding. This increased use positions the method to yield insights that have the potential to change clinical practice.
Dumenci, Levent; Windle, Michael
2001-01-01
Used Monte Carlo methods to evaluate the adequacy of cluster analysis for recovering group membership based on simulated latent growth curve (LGC) models. Cluster analysis failed to recover growth subtypes adequately when the growth curves differed only in shape. Discusses circumstances under which it was more successful.
Tassaing, T; Garrain, P A; Bégué, D; Baraille, I
2010-07-21
The present study aims at a detailed analysis of the structure of supercritical water, combining experimental vibrational spectra with molecular modeling calculations on isolated water clusters. We propose an equilibrium cluster composition model in which supercritical water is considered an ideal mixture of small water clusters (n=1-3) at chemical equilibrium, and the vibrational spectra are expected to result from the superposition of the spectra of the individual clusters. Thus, it was possible to extract, from the decomposition of the mid-infrared spectra, the evolution of the cluster populations in supercritical water as a function of density. The cluster composition predicted by this model was found to be quantitatively consistent with the near-infrared and Raman spectra of supercritical water analyzed using the same procedure. We emphasize that this methodology could be applied to determine the proportion of clusters in water over a wider thermodynamic range, as well as in more complex aqueous supercritical solutions.
Cluster-based exposure variation analysis.
Samani, Afshin; Mathiassen, Svend Erik; Madeleine, Pascal
2013-04-04
Static posture, repetitive movements and lack of physical variation are known risk factors for work-related musculoskeletal disorders, and thus need to be properly assessed in occupational studies. The aims of this study were (i) to investigate the effectiveness of a conventional exposure variation analysis (EVA) in discriminating exposure time lines and (ii) to compare it with a new cluster-based method for analysis of exposure variation. For this purpose, we simulated a repeated cyclic exposure varying within each cycle between "low" and "high" exposure levels in a "near" or "far" range, and with "low" or "high" velocities (exposure change rates). The duration of each cycle was also manipulated by selecting a "small" or "large" standard deviation of the cycle time. These parameters reflected three dimensions of exposure variation, i.e. range, frequency and temporal similarity. Each simulation trace included two realizations of 100 concatenated cycles with either low (ρ = 0.1), medium (ρ = 0.5) or high (ρ = 0.9) correlation between the realizations. These traces were analyzed by conventional EVA and by a novel cluster-based EVA (C-EVA). Principal component analysis (PCA) was applied to the marginal distributions of 1) the EVA of each of the realizations (univariate approach), 2) a combination of the EVA of both realizations (multivariate approach) and 3) C-EVA. The smallest number of principal components describing more than 90% of the variability in each case was selected, and the projection of the marginal distributions along the selected principal component was calculated. A linear classifier was then applied to these projections to discriminate between the simulated exposure patterns, and the accuracy of the classified realizations was determined. C-EVA classified exposures more accurately than the univariate and multivariate EVA approaches; classification accuracy was 49%, 47% and 52% for EVA (univariate and multivariate) and C-EVA, respectively (p analysis are the advantages
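For reference, conventional EVA is commonly computed by classifying each sample into an exposure-amplitude class, measuring the duration of each uninterrupted period spent in one class, and tabulating the percentage of total time falling into each (amplitude class, duration class) cell. Below is a minimal sketch of that matrix, assuming a uniformly sampled signal and user-chosen class boundaries. C-EVA, the PCA step and the classifier are not shown, and the function name is hypothetical.

```python
def eva_matrix(signal, amp_edges, dur_edges, dt=1.0):
    """Percent of total time in each (amplitude class, duration class) cell."""
    def bin_index(x, edges):
        # edges are ascending inner class boundaries; returns 0..len(edges)
        for i, e in enumerate(edges):
            if x < e:
                return i
        return len(edges)

    n_amp, n_dur = len(amp_edges) + 1, len(dur_edges) + 1
    matrix = [[0.0] * n_dur for _ in range(n_amp)]
    levels = [bin_index(x, amp_edges) for x in signal]
    start = 0
    for i in range(1, len(levels) + 1):
        # A period ends at the end of the signal or when the amplitude class changes.
        if i == len(levels) or levels[i] != levels[start]:
            duration = (i - start) * dt
            matrix[levels[start]][bin_index(duration, dur_edges)] += duration
            start = i
    total = len(signal) * dt
    return [[100.0 * v / total for v in row] for row in matrix]
```

For example, `eva_matrix([0.1, 0.1, 0.1, 0.9, 0.9, 0.1], [0.5], [2.0])` puts 50% of the time in the low-amplitude, long-period cell, about 33% in the high-amplitude, long-period cell and about 17% in the low-amplitude, short-period cell.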
Information Filtering via Collaborative User Clustering Modeling
Zhang, Chu-Xu; Yu, Lu; Liu, Chuang; Liu, Hao; Yan, Xiao-Yong
2013-01-01
The past few years have witnessed the great success of recommender systems, which can significantly help users find personalized items in the information era. One of the most widely applied recommendation methods is Matrix Factorization (MF). However, most research on this topic has focused on mining the direct relationships between users and items. In this paper, we optimize the standard MF by integrating a user clustering regularization term. Our model considers not only the user-item rating information but also user interest. We compared the proposed model with three other typical methods: User-Mean (UM), Item-Mean (IM) and standard MF. Experimental results on a real-world dataset, MovieLens, show that our method performs much better than the other three methods in recommendation accuracy.
Information filtering via collaborative user clustering modeling
Zhang, Chu-Xu; Zhang, Zi-Ke; Yu, Lu; Liu, Chuang; Liu, Hao; Yan, Xiao-Yong
2014-02-01
The past few years have witnessed the great success of recommender systems, which can significantly help users find personalized items in the information era. One of the most widely applied recommendation methods is Matrix Factorization (MF). However, most research on this topic has focused on mining the direct relationships between users and items. In this paper, we optimize the standard MF by integrating a user clustering regularization term. Our model considers not only the user-item rating information but also user information. In addition, we compared the proposed model with three other typical methods: User-Mean (UM), Item-Mean (IM) and standard MF. Experimental results on two real-world datasets, MovieLens 1M and MovieLens 100k, show that our method performs better than the other three methods in recommendation accuracy.
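Below is a minimal sketch of matrix factorization with a user-clustering regularizer, trained by stochastic gradient descent. The abstracts do not give the exact form of the regularization term. Here it is assumed to penalize the squared distance between each user's latent vector and the mean latent vector of that user's cluster, with cluster labels supplied in advance. Function names, hyperparameters and the toy data are hypothetical.

```python
import random


def train_cluster_mf(ratings, n_users, n_items, user_cluster, k=2, lr=0.02,
                     reg=0.02, beta=0.1, epochs=200, seed=0):
    """ratings: list of (user, item, rating); user_cluster[u] is u's cluster id."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        # Cluster centroids of the user factors, recomputed once per epoch.
        sums, counts = {}, {}
        for u, c in enumerate(user_cluster):
            s = sums.setdefault(c, [0.0] * k)
            for f in range(k):
                s[f] += P[u][f]
            counts[c] = counts.get(c, 0) + 1
        centroid = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        order = ratings[:]
        rng.shuffle(order)
        for u, i, r in order:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            cen = centroid[user_cluster[u]]
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on: squared error + L2 penalty + pull toward cluster centroid.
                P[u][f] += lr * (err * qi - reg * pu - beta * (pu - cen[f]))
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    se = sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

Setting `beta=0` recovers plain regularized MF, which makes it easy to compare the two variants on the same data.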
Yan, Donghui; Jordan, Michael I
2011-01-01
Inspired by Random Forests (RF) in the context of classification, we propose a new clustering ensemble method, Cluster Forests (CF). Geometrically, CF randomly probes a high-dimensional data cloud to obtain "good local clusterings" and then aggregates them via spectral clustering to obtain cluster assignments for the whole dataset. The search for good local clusterings is guided by a cluster quality measure $\kappa$. CF progressively improves each local clustering in a fashion that resembles tree growth in RF. Empirical studies on several real-world datasets under two different performance metrics show that CF compares favorably to its competitors. Theoretical analysis shows that the $\kappa$ criterion grows each local clustering in a desirable, "noise-resistant" way. A closed-form expression is obtained for the mis-clustering rate of spectral clustering under a perturbation model, which yields new insights into some aspects of spectral clustering.
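CF's aggregation step works on the agreement between many local clusterings. The sketch below builds the co-association matrix (the fraction of base clusterings that place two points together). For brevity, it then replaces the spectral clustering used by CF with a simple threshold plus connected components. The κ-guided growth of local clusterings is not shown, and the names and threshold are assumptions.

```python
def co_association(labelings):
    """M[i][j] = fraction of base clusterings that put points i and j together."""
    n = len(labelings[0])
    M = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    M[i][j] += 1.0 / len(labelings)
    return M


def consensus_clusters(M, threshold=0.5):
    """Link points whose co-association exceeds threshold; label connected components."""
    n = len(M)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        stack = [start]
        label[start] = current
        while stack:
            i = stack.pop()
            for j in range(n):
                if label[j] == -1 and M[i][j] > threshold:
                    label[j] = current
                    stack.append(j)
        current += 1
    return label
```

Three noisy base clusterings `[[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]` yield the consensus `[0, 0, 1, 1]`: label names differ between runs, but pairwise agreement does not depend on them.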
The Productivity Analysis of Chennai Automotive Industry Cluster
Bhaskaran, E.
2014-07-01
Chennai, also called the Detroit of India, is India's second fastest growing auto market and exports auto components and vehicles to the US, Germany, Japan and Brazil. For inclusive growth and sustainable development, 250 auto component industries in the Ambattur, Thirumalisai and Thirumudivakkam Industrial Estates in Chennai have adopted the Cluster Development Approach known as the Automotive Component Cluster. The objective is to study the value chain, correlation and Data Envelopment Analysis by determining the technical efficiency, peer weights, and input and output slacks of 100 auto component industries in the three estates. The methodology is Data Envelopment Analysis using the output-oriented Banker-Charnes-Cooper (BCC) model, with net worth, fixed assets and employment as inputs and gross output as the output. Non-zero peer weights identify the efficient units. Higher slacks reveal excess net worth, fixed assets and employment, and a shortage in gross output. To conclude, the variables are highly correlated, and the inefficient industries should increase their gross output or decrease their fixed assets or employment. Moreover, for sustainable development, the cluster should strengthen infrastructure, technology, procurement, production and marketing interrelationships to decrease costs and increase productivity and efficiency in order to compete in the domestic and export markets.
Pauling, L
1992-01-01
Analysis of the gamma-ray energies of 28 excited superdeformed bands of lanthanon nuclei by application of the two-revolving-cluster model yields the result that the central sphere for all 28 has the semimagic-magic composition p40n50, with the range p8n12 to p14n18 for the clusters and the radius of revolution increasing from 7.31 to 7.76 fm. Similar analysis of 28 excited bands of Hg, Tl, and Pb nuclei leads to p56n82 (semimagic-magic) for the central sphere of 24 bands, p64n82 (semimagic-magic) for 2, and p64n90 (doubly semimagic) for 2, with cluster range p8n12 to p14n16 and values of the radius of revolution from 8.70 to 8.92 fm for 26 bands and 9.2 fm for 2. PMID:11607313
Pauling, Linus
1992-08-01
Analysis of the γ-ray energies of 28 excited superdeformed bands of lanthanon nuclei by application of the two-revolving-cluster model yields the result that the central sphere for all 28 has the semimagic-magic composition p40n50, with the range p8n12 to p14n18 for the clusters and the radius of revolution increasing from 7.31 to 7.76 fm. Similar analysis of 28 excited bands of Hg, Tl, and Pb nuclei leads to p56n82 (semimagic-magic) for the central sphere of 24 bands, p64n82 (semimagic-magic) for 2, and p64n90 (doubly semimagic) for 2, with cluster range p8n12 to p14n16 and values of the radius of revolution from 8.70 to 8.92 fm for 26 bands and 9.2 fm for 2.
Update Legal Documents Using Hierarchical Ranking Models and Word Clustering
Pham, Minh Quang Nhat; Nguyen, Minh Le; Shimazu, Akira
2010-01-01
Our research addresses the task of updating legal documents when new information emerges. In this paper, we apply a hierarchical ranking model to the task of updating legal documents. Word clustering features are incorporated into the ranking models to exploit semantic relations between words. Experimental results on legal data built from the United States Code show that the hierarchical ranking model with word clustering outperforms baseline methods using the Vector Space Model, and word cluster-based ...
Towards Effective Clustering Techniques for the Analysis of Electric Power Grids
Energy Technology Data Exchange (ETDEWEB)
Hogan, Emilie A.; Cotilla Sanchez, Jose E.; Halappanavar, Mahantesh; Wang, Shaobu; Mackey, Patrick S.; Hines, Paul; Huang, Zhenyu
2013-11-30
Clustering is an important data analysis technique with numerous applications in the analysis of electric power grids. Standard clustering techniques are oblivious to the rich structural and dynamic information available for power grids. Therefore, by exploiting the inherent topological and electrical structure in the power grid data, we propose new methods for clustering with applications to model reduction, locational marginal pricing, phasor measurement unit (PMU or synchrophasor) placement, and power system protection. We focus our attention on model reduction for analysis based on time-series information from synchrophasor measurement devices, and spectral techniques for clustering. By comparing different clustering techniques on two instances of realistic power grids we show that the solutions are related and therefore one could leverage that relationship for a computational advantage. Thus, by contrasting different clustering techniques we make a case for exploiting structure inherent in the data with implications for several domains including power systems.
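As a generic example of the spectral techniques mentioned above, the sketch below splits a small graph in two using the sign pattern of the Fiedler vector (the Laplacian eigenvector for the second-smallest eigenvalue). The vector is found by power iteration on cI - L with the constant vector projected out. It ignores the electrical weighting the authors exploit; an admittance-weighted Laplacian would be the natural extension. The six-node graph is a toy assumption, not grid data.

```python
def laplacian(n, edges):
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def fiedler_bisection(n, edges, iters=2000):
    L = laplacian(n, edges)
    c = 2.0 * max(L[i][i] for i in range(n))  # upper bound on Laplacian eigenvalues
    x = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the constant (eigenvalue 0) vector
        # Multiply by (cI - L); in this subspace its dominant eigenvector is the Fiedler vector.
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return [0 if v < 0 else 1 for v in x]
```

For two triangles joined by a single edge, `fiedler_bisection(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])` separates nodes 0-2 from nodes 3-5, cutting only the bridging edge.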
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Emmons, Scott; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Blondel, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 o...
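The three stand-alone metrics named above can be computed directly from an edge list and a community assignment. Below is a minimal sketch, assuming an unweighted, undirected graph without self-loops and with at least two communities. Here the conductance of a community is its cut size divided by the smaller of its volume and the volume of the rest of the graph. The information recovery metrics (adjusted Rand score, NMI) are not shown.

```python
from collections import defaultdict


def quality_metrics(edges, community):
    """edges: undirected (u, v) pairs; community: dict node -> community id."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # number of edges inside each community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    volume = defaultdict(int)  # sum of degrees per community
    for node, d in degree.items():
        volume[community[node]] += d
    total_volume = 2 * m
    # Modularity: sum over communities of (internal edge fraction - expected fraction).
    modularity = sum(internal[c] / m - (volume[c] / total_volume) ** 2 for c in volume)
    coverage = sum(internal.values()) / m  # fraction of edges inside communities
    conductance = {}
    for c in volume:
        cut = volume[c] - 2 * internal[c]  # edges leaving community c
        conductance[c] = cut / min(volume[c], total_volume - volume[c])
    return modularity, coverage, conductance
```

For two triangles joined by one edge, split into the two triangles, this gives modularity 5/14, coverage 6/7 and conductance 1/7 for each community.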
Analysis and Prediction of Crimes by Clustering and Classification
Directory of Open Access Journals (Sweden)
Rasoul Kiani
2015-08-01
Crimes that occur frequently in a society affect its organizations and institutions. Thus, it seems necessary to study the reasons, factors and relations behind the occurrence of different crimes and to find the most appropriate ways to control and prevent further crimes. The main objective of this paper is to classify clustered crimes based on their occurrence frequency over different years. Data mining is used extensively for the analysis, investigation and discovery of patterns in the occurrence of different crimes. We applied a theoretical model based on data mining techniques such as clustering and classification to a real crime dataset recorded by the police in England and Wales from 1990 to 2011. We assigned weights to the features in order to improve the quality of the model and remove low-value features. A Genetic Algorithm (GA) is used to optimize the parameters of the Outlier Detection operator in the RapidMiner tool.
Topological Structures of Cluster Spins for Ising Models
Feng, You-gang
2010-01-01
We discuss hierarchies and the rescaling rule of the self-similar transformations in Ising models, and define a fractal dimension of an ordered cluster whose minimum corresponds to a fixed point of the transformations. Using these fractal structures, we divide the clusters into two types: irreducible and reducible. A relationship between a cluster's spin, its coordination number and its fractal dimension is obtained.
Enhancing Text Clustering Using Concept-based Mining Model
Directory of Open Access Journals (Sweden)
Lincy Liptha R.
2012-03-01
Text mining techniques are mostly based on statistical analysis of a word or phrase. Statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in the same document, while the meaning contributed by one term may be more appropriate than the meaning contributed by the other. Hence, terms that capture the semantics of the text should be given more importance. Here, a new concept-based mining model is introduced. It analyzes terms at the sentence, document and corpus levels. The model consists of sentence-based concept analysis, which calculates the conceptual term frequency (ctf); document-based concept analysis, which finds the term frequency (tf); corpus-based concept analysis, which determines the document frequency (df); and a concept-based similarity measure. The ctf, tf and df measures in a corpus are calculated by the proposed algorithm, called the Concept-Based Analysis Algorithm. In this way, web documents are clustered efficiently, and the quality of the clusters achieved by this model significantly surpasses that of traditional single-term-based approaches.
The cosmological analysis of X-ray cluster surveys; III. Bypassing cluster mass measurements
Pierre, M; Faccioli, L; Clerc, N; Gastaud, R; Koulouridis, E; Pacaud, F
2016-01-01
Despite strong theoretical arguments, the use of clusters as cosmological probes is, in practice, frequently questioned because of the many uncertainties impinging on cluster mass estimates. Our aim is to develop a fully self-consistent cosmological approach to X-ray cluster surveys, based exclusively on observable quantities rather than masses. This procedure is justified by the possibility of directly deriving the cluster properties via ab initio modelling, either analytically or by using hydrodynamical simulations. In this third paper, we evaluate the method on cluster toy-catalogues. We model the population of detected clusters in the four-dimensional space of count rate, hardness ratio, angular size and redshift, and compare the corresponding 4-dimensional diagram with theoretical predictions. The best cosmology+physics parameter configuration is determined using a simple minimisation procedure; errors on the parameters are derived by scanning the likelihood hyper-surfaces with a wide range of starting values. The metho...
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2017-01-18
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
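The phylogenetic Chinese restaurant process builds on the standard CRP, in which the next customer joins an existing table with probability proportional to its occupancy and opens a new table with probability proportional to the concentration parameter alpha. The sketch below shows only that standard process. The phylogenetic dependence between strains is not modeled, and the function names are hypothetical.

```python
import random


def crp_probabilities(table_counts, alpha):
    """Seating probabilities for the next customer in a standard Chinese restaurant process."""
    n = sum(table_counts)
    denom = n + alpha
    existing = [count / denom for count in table_counts]
    return existing + [alpha / denom]  # last entry: probability of opening a new table


def sample_crp(n_customers, alpha, seed=0):
    """Draw a random partition of n_customers; returns each customer's table index."""
    rng = random.Random(seed)
    assignments, counts = [], []
    for _ in range(n_customers):
        probs = crp_probabilities(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(0)  # open a new table (a new antigenic cluster, in this analogy)
        counts[k] += 1
        assignments.append(k)
    return assignments
```

With tables holding 3 and 1 customers and alpha = 1, the next customer joins them with probabilities 0.6 and 0.2 and opens a new table with probability 0.2. The number of clusters is therefore not fixed in advance, which is what makes the approach nonparametric.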
Somatotyping using 3D anthropometry: a cluster analysis.
Olds, Tim; Daniell, Nathan; Petkov, John; David Stewart, Arthur
2013-01-01
Somatotyping is the quantification of human body shape, independent of body size. Hitherto, somatotyping (including the most popular method, the Heath-Carter system) has been based on subjective visual ratings, sometimes supported by surface anthropometry. This study used data derived from three-dimensional (3D) whole-body scans as inputs for cluster analysis to objectively derive clusters of similar body shapes. Twenty-nine dimensions normalised for body size were measured on a purposive sample of 301 adults aged 17-56 years who had been scanned using a Vitus Smart laser scanner. K-means cluster analysis with v-fold cross-validation was used to determine shape clusters. Three male and three female clusters emerged, and were visualised using the scans closest to the cluster centroid and a caricature defined by doubling the difference between the average scan and the cluster centroid. The male clusters were decidedly endomorphic (high fatness), ectomorphic (high linearity), and endo-mesomorphic (a mixture of fatness and muscularity). The female clusters were clearly endomorphic, ectomorphic, and ecto-mesomorphic (a mixture of linearity and muscularity). An objective shape quantification procedure combining 3D scanning and cluster analysis yielded shape clusters strikingly similar to traditional somatotyping.
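The k-means step and the caricature construction can be sketched as follows. This assumes size-normalized measurements given as plain numeric vectors and a fixed number of clusters; the study's v-fold cross-validation for choosing that number is not shown. The caricature is read as average + 2 x (centroid - average), which is one interpretation of "doubling the difference between the average scan and the cluster centroid". Function names are hypothetical.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            groups[nearest].append(p)
        # New centroid = mean of assigned points (keep the old one if a cluster empties).
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[c]
               for c, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def caricature(centroid, average):
    # average + 2 * (centroid - average): exaggerates the cluster's deviation from the mean.
    return [a + 2.0 * (c - a) for c, a in zip(centroid, average)]
```

The caricature moves each measurement twice as far from the sample average as the centroid itself, so a cluster's distinguishing features are easier to see.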
Liu, Fang; Cao, San-xing; Lu, Rui
2012-04-01
This paper proposes a user credit assessment model based on clustering ensemble, aimed at solving the problem of users illegally spreading pirated and pornographic media content on user-self-service broadband network new media platforms. The idea is to assess new media users' credit by establishing an index system based on user credit behaviors; illegal users can then be identified from the credit assessment results, curbing the transmission of harmful video and audio on the network. The proposed model integrates the advantages of swarm intelligence clustering, which is suitable for analyzing user credit behavior, and K-means clustering, which can eliminate the scattered users left in the result of swarm intelligence clustering, thus classifying all users' credit automatically. Verification experiments were performed on a standard credit application dataset from the UCI Machine Learning Repository. Compared with a single swarm intelligence clustering model, the statistical results indicate that the clustering ensemble model has a stronger ability to distinguish creditworthiness, especially in identifying the user clusters with the best and worst credit, which will help operators take incentive or punitive measures accurately. In addition, compared with a logistic regression based model under the same conditions, this clustering ensemble model is more robust and has better prediction accuracy.
A model for globular cluster extreme anomalies
D'Antona, F.; Ventura, P.
2007-08-01
In spite of the efforts made in recent years, there is still no comprehensive explanation for the chemical anomalies of globular cluster (GC) stars. Among these anomalies, the most striking is oxygen depletion, which reaches values down to [O/Fe] ~ -0.4 in most clusters, but in M13 it goes down to less than [O/Fe] ~ -1. In this work we suggest that the anomalies are due to the superposition of two different events, as follows. (i) Primordial self-enrichment; this is required to explain the oxygen depletion down to a minimum value [O/Fe] ~ -0.4. (ii) Extra mixing in a fraction of the stars already born with anomalous composition; these objects, starting with already low [O/Fe], will reduce the oxygen abundance down to the most extreme values. Contrary to other models that invoke extra mixing to explain the chemical anomalies, we suggest that this mixing is active only if there is a fraction of the stars in which the primordial composition is not only oxygen-depleted, but also extremely helium-rich (Y ~ 0.4), as found in a few GCs from their main-sequence multiplicity. We propose that the rotational evolution (and an associated extra mixing) of extremely helium-rich stars may be affected by the fact that they develop a very small or non-existent molecular weight barrier during the evolution. We show that extra mixing in these stars, having initial chemistry that has already been CNO processed, affects mainly the oxygen abundance, as well as (to a much smaller extent) the sodium abundance. The model also predicts a large fluorine depletion concomitant with the oxygen depletion, and a further enhancement of the surface helium abundance, which reaches values close to Y = 0.5 in the computed models. We stress that, in this tentative explanation, those stars that are primordially oxygen-depleted, but are not extremely helium-rich, do not suffer deep extra mixing.
A hybrid monkey search algorithm for clustering analysis.
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm, based on the search operator of the artificial bee colony algorithm, for clustering analysis, and experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
A Hybrid Monkey Search Algorithm for Clustering Analysis
Directory of Open Access Journals (Sweden)
Xin Chen
2014-01-01
fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data.
Hung, Ling-Hong; Samudrala, Ram
2014-06-15
fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco. fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom streaming Single Instruction Multiple Data (SIMD) extensions and advanced vector extension intrinsics code accelerate CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the M.I.T. license. (http://software.compbio.washington.edu/fast_protein_cluster) © The Author 2014. Published by Oxford University Press.
APPLICATION OF CLUSTER-MODEL PARALLEL PROCESSING AS A WEB SERVER
Directory of Open Access Journals (Sweden)
Maman Somantri
2009-06-01
Parallel processing is a way of increasing data-processing speed by carrying out more than one data-processing operation simultaneously. One form of parallel processing is the cluster model. Here, cluster-model parallel processing is used to process Web data by building a clustered Web server. This Web server cluster uses Linux Virtual Server (LVS) technology, which can operate through NAT, IP tunneling, and direct routing and offers four scheduling algorithms. In this study, LVS technology is used to build a Web server cluster using NAT, with Network File System and Network Block Device technologies applied as networked storage media. In testing this cluster system, the network is first tested to determine system performance, and then the cluster system is tested in processing Web data using the WebBench software and a benchmark script.
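The abstract does not name the four scheduling algorithms. Round-robin and least-connection are two standard LVS schedulers, and the Python sketch below shows how they decide which real server receives the next request. These are hypothetical stand-in classes, not LVS code: LVS itself schedules inside the Linux kernel and is configured with `ipvsadm`.

```python
import itertools


class RoundRobin:
    """Hand each new request to the next real server in a fixed cycle."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnection:
    """Hand each new request to the real server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)  # ties go to the first-listed server
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a connection to this server closes.
        self.active[server] -= 1
```

Round-robin ignores load entirely, while least-connection adapts when some requests (for example, large file transfers) hold connections open longer than others.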
Detecting Clusters in Atom Probe Data with Gaussian Mixture Models.
Zelenty, Jennifer; Dahl, Andrew; Hyde, Jonathan; Smith, George D W; Moody, Michael P
2017-04-01
Accurately identifying and extracting clusters from atom probe tomography (APT) reconstructions is extremely challenging, yet critical to many applications. Currently, the most prevalent approach to detect clusters is the maximum separation method, a heuristic that relies heavily upon parameters manually chosen by the user. In this work, a new clustering algorithm, Gaussian mixture model Expectation Maximization Algorithm (GEMA), was developed. GEMA utilizes a Gaussian mixture model to probabilistically distinguish clusters from random fluctuations in the matrix. This machine learning approach maximizes the data likelihood via expectation maximization: given atomic positions, the algorithm learns the position, size, and width of each cluster. A key advantage of GEMA is that atoms are probabilistically assigned to clusters, thus reflecting scientifically meaningful uncertainty regarding atoms located near precipitate/matrix interfaces. GEMA outperforms the maximum separation method in cluster detection accuracy when applied to several realistically simulated data sets. Lastly, GEMA was successfully applied to real APT data.
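The abstract does not give GEMA's full details, such as its 3D positions or how it models the matrix. The core idea, expectation maximization with probabilistic assignments, can still be sketched in one dimension. The following pure-Python sketch fits a K-component Gaussian mixture. Several choices are assumptions for illustration: the function name, initialization from user-supplied means, and the variance floor. Unlike GEMA, it has no separate component for random matrix fluctuations.

```python
import math

def em_gmm_1d(xs, init_means, n_iter=200, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM; returns weights, means, variances, responsibilities."""
    k, n = len(init_means), len(xs)
    means = list(init_means)
    overall_mean = sum(xs) / n
    overall_var = max(min_var, sum((x - overall_mean) ** 2 for x in xs) / n)
    variances = [overall_var] * k      # start every component as wide as the data
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: soft (probabilistic) assignment of each point to each component,
        # computed in log space with the log-sum-exp trick to avoid underflow.
        resp = []
        for x in xs:
            logs = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            probs = [math.exp(l - top) for l in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component has (almost) no support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj)
    return weights, means, variances, resp
```

The returned responsibilities are the "probabilistic assignment" the abstract refers to. A point midway between two components receives a split membership rather than a hard label.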
Smartness and Italian Cities. A Cluster Analysis
Directory of Open Access Journals (Sweden)
Flavio Boscacci
2014-05-01
Smart cities have recently been recognized as the most pleasing and attractive places to live in; as a result, both scholars and policy-makers pay close attention to this topic. Specifically, urban “smartness” has been identified by many characteristics that can be grouped into six dimensions (Giffinger et al. 2007): smart Economy (competitiveness), smart People (social and human capital), smart Governance (participation), smart Mobility (both ICTs and transport), smart Environment (natural resources), and smart Living (quality of life). Using this analytical framework, the present paper investigates the relation between urban attractiveness and the “smart” characteristics in the 103 Italian NUTS3 province capitals in the year 2011. To this end, descriptive statistics were followed by a regression analysis (OLS), in which the dependent variable, urban attractiveness, was proxied by housing market prices. In addition, a Cluster Analysis (CA) was developed to find differences and commonalities among the province capitals. The OLS results indicate that living, people and economy are the key drivers of greater urban attractiveness, whereas environment continues to play a minor role. In addition, the CA groups the province capitals a
Instantaneous normal mode analysis of melting of finite dust clusters.
Melzer, André; Schella, André; Schablinski, Jan; Block, Dietmar; Piel, Alexander
2012-06-01
The experimental melting transition of finite two-dimensional dust clusters in a dusty plasma is analyzed using the method of instantaneous normal modes. In the experiment, dust clusters are heated in a thermodynamic equilibrium from a solid to a liquid state using a four-axis laser manipulation system. The fluid properties of the dust cluster, such as the diffusion constant, are measured from the instantaneous normal mode analysis. Thereby, the phase transition of these finite clusters is approached from the liquid phase. From the diffusion constants, unique melting temperatures have been assigned to dust clusters of various sizes that very well reflect their dynamical stability properties.
Cluster Analysis to Identify Possible Subgroups in Tinnitus Patients.
van den Berge, Minke J C; Free, Rolien H; Arnold, Rosemarie; de Kleine, Emile; Hofman, Rutger; van Dijk, J Marc C; van Dijk, Pim
2017-01-01
In tinnitus treatment, there is a tendency to shift from a "one size fits all" to a more individual, patient-tailored approach. Insight into the heterogeneity of the tinnitus spectrum might improve the management of tinnitus patients in terms of choice of treatment and identification of patients with severe mental distress. The goal of this study was to identify subgroups in a large group of tinnitus patients. Data were collected from patients with severe tinnitus complaints visiting our tertiary referral tinnitus care group at the University Medical Center Groningen. Patient-reported and physician-reported variables were collected during their visit to our clinic. Cluster analyses were used to characterize subgroups. For the selection of the right variables to enter in the cluster analysis, two approaches were used: (1) variable reduction with principal component analysis and (2) variable selection based on expert opinion. Various variables of 1,783 tinnitus patients were included in the analyses. Cluster analysis (1) included 976 patients and resulted in a four-cluster solution. The effect of external influences was the most discriminative between the groups, or clusters, of patients. The "silhouette measure" of the cluster outcome was low (0.2), indicating a "no substantial" cluster structure. Cluster analysis (2) included 761 patients and resulted in a three-cluster solution, comparable to the first analysis. Again, a "no substantial" cluster structure was found (0.2). Two cluster analyses on a large database of tinnitus patients revealed that clusters of patients are mostly formed by differing responses to external influences on their disease. However, both cluster outcomes based on this dataset showed poor stability, suggesting that our tinnitus population comprises a continuum rather than a number of clearly defined subgroups.
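The silhouette measure quoted above can be computed directly from the data and the cluster labels. For each point, a is the mean distance to the other members of its own cluster. b is the smallest mean distance to the members of any other cluster. The point's width is s = (b - a)/max(a, b), and the widths are averaged over all points. A common rule of thumb attributed to Kaufman and Rousseeuw reads averages of 0.25 or below as no substantial structure, consistent with the interpretation of 0.2 above. The pure-Python sketch below uses Euclidean distance, which is an assumption; the study's software and distance measure are not stated.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette width with Euclidean distance (needs at least two clusters)."""
    n = len(points)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i in range(n):
        own = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / n
```

Tight, well-separated groups score close to 1. Mislabeled points score below 0 because they sit closer, on average, to another cluster than to their own.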
PERFORMANCE ANALYSIS OF CLUSTERED RADIO INTERFEROMETRIC CALIBRATION
Kazemi, S.; Yatawatta, S.; Zaroubi, S.
2012-01-01
Subtraction of compact, bright sources is essential to produce high quality images in radio astronomy. It is recently proposed that 'clustered' calibration can perform better in subtracting fainter background sources. This is due to the fact that the effective power of a source cluster is greater th
The Psychology of Yoga Practitioners: A Cluster Analysis.
Genovese, Jeremy E C; Fondran, Kristine M
2017-03-30
Yoga practitioners (N = 261) completed the revised Expression of Spirituality Inventory (ESI) and the Multidimensional Body-Self Relations Questionnaire. Cluster analysis revealed three clusters: Cluster A scored high on all four spiritual constructs. They had high positive evaluations of their appearance, but a lower orientation towards their appearance. They tended to have a high evaluation of their fitness and health, and higher body satisfaction. Cluster B showed lower scores on the spiritual constructs. Like Cluster A, members of Cluster B tended to show high positive evaluations of appearance and fitness. They also had higher body satisfaction. Members of Cluster B had a higher fitness orientation and a higher appearance orientation than members of Cluster A. Members of Cluster C had low scores for all spiritual constructs. They had a low evaluation of, and unhappiness with, their appearance. They were unhappy with the size and appearance of their bodies. They tended to see themselves as overweight. There was a significant difference in years of practice between the three groups (Kruskal-Wallis, p = .0041). Members of Cluster A have the most years of yoga experience and members of Cluster B have more yoga experience than members of Cluster C. These results suggest the possible existence of a developmental trajectory for yoga practitioners. Such a developmental sequence may have important implications for yoga practice and instruction.
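The Kruskal-Wallis test reported above compares groups by ranks rather than raw scores. A minimal pure-Python sketch of the H statistic follows, with the usual correction for tied ranks; the function names are hypothetical. For three groups, as in this study, H is referred to a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exactly exp(-H/2). That shortcut does not apply to other numbers of groups, and the chi-square reference is itself only an approximation for small samples.

```python
import math

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction (average ranks for ties)."""
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1                      # extend the run of tied values
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the run
        for t in range(i, j + 1):
            rank_sums[pooled[t][1]] += avg_rank
        run = j - i + 1
        tie_term += run ** 3 - run
        i = j + 1
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups)) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction

def p_value_three_groups(h):
    """Upper-tail chi-square probability with 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2.0)
```

For example, the fully separated groups [1, 2, 3], [4, 5, 6] and [7, 8, 9] give H = 7.2.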
Directory of Open Access Journals (Sweden)
Amreen Khan,
2010-07-01
Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. Clustering aims to represent large datasets by a smaller number of prototypes or clusters. It simplifies data modeling and thus plays a central role in knowledge discovery and data mining. Data mining tasks require fast and accurate partitioning of huge datasets, which may come with a variety of attributes or features. This imposes severe computational requirements on the relevant clustering techniques. A family of bio-inspired algorithms known as Swarm Intelligence (SI) has recently emerged that meets these requirements and has been successfully applied to a number of real-world clustering problems. This paper looks into the use of Particle Swarm Optimization for cluster analysis. Combining it with Fuzzy C-means clustering enhances performance, maintains more diversity in the swarm, and allows the particles to robustly track a changing environment.
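The abstract does not spell out how PSO and Fuzzy C-means are combined. The Python sketch below therefore shows only the standard Fuzzy C-means iteration that such hybrids build on, with no swarm component. Each point receives a membership in every cluster, u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)). Each centre is the mean of all points weighted by u_ij^m. The fuzzifier m = 2 and the fixed iteration count are assumptions.

```python
import math

def fuzzy_c_means(points, init_centers, m=2.0, n_iter=100, eps=1e-12):
    """Standard Fuzzy C-means; returns (centers, memberships).

    memberships[i][j] is the degree to which points[i] belongs to cluster j,
    computed from the centers at the start of the final iteration.
    """
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: closer centers get larger, fractional memberships.
        memberships = []
        for p in points:
            d = [max(math.dist(p, c), eps) for c in centers]
            memberships.append([1.0 / sum((d[j] / dk) ** exponent for dk in d) for j in range(len(centers))])
        # Center update: mean of all points weighted by membership ** m.
        for j in range(len(centers)):
            w = [row[j] ** m for row in memberships]
            total = sum(w)
            centers[j] = [sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(dim)]
    return centers, memberships
```

Each point's memberships sum to 1 across clusters. This soft partition is what lets observations belong partially to several clusters.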
Cluster analysis for DNA methylation profiles having a detection threshold
Directory of Open Access Journals (Sweden)
Siegmund Kimberly D
2006-07-01
Abstract. Background: DNA methylation, a molecular feature used to investigate tumor heterogeneity, can be measured on many genomic regions using the MethyLight technology. Due to the combination of the underlying biology of DNA methylation and the MethyLight technology, the measurements, while generated on a continuous scale, have a large number of 0 values. This suggests that conventional clustering methodology may not perform well on these data. Results: We compare the performance of existing methodology (such as k-means) with two novel methods that explicitly allow for the preponderance of values at 0. We also consider how the ability to successfully cluster such data depends upon the number of informative genes for which methylation is measured and the correlation structure of the methylation values for those genes. We show that when data are collected for a sufficient number of genes, our models do improve clustering performance compared to methods, such as k-means, that do not explicitly respect the supposed biological realities of the situation. Conclusion: The performance of analysis methods depends upon how well the assumptions of those methods reflect the properties of the data being analyzed. Differing technologies will lead to data with differing properties, and such data should therefore be analyzed differently. Consequently, it is prudent to consider what the properties of the data are likely to be, and which analysis method is therefore likely to best capture those properties.
Using Cluster Analysis for Data Mining in Educational Technology Research
Antonenko, Pavlo D.; Toy, Serkan; Niederhauser, Dale S.
2012-01-01
Cluster analysis is a group of statistical methods that has great potential for analyzing the vast amounts of web server-log data to understand student learning from hyperlinked information resources. In this methodological paper we provide an introduction to cluster analysis for educational technology researchers and illustrate its use through…
A Survey of Popular R Packages for Cluster Analysis
Flynt, Abby; Dean, Nema
2016-01-01
Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…
Clustering Analysis for Credit Default Probabilities in a Retail Bank Portfolio
Directory of Open Access Journals (Sweden)
Elena ANDREI (DRAGOMIR)
2012-08-01
Methods underlying cluster analysis are very useful in data analysis, especially when the volume of data processed is so large that essential information cannot be extracted unless specific instruments are used to summarize and structure the raw information. In this context, cluster analysis techniques are used, particularly for systematic information analysis. The aim of this article is to build a useful model for the banking field, based on data mining techniques, by dividing borrowers into clusters in order to obtain a profile of the customers (debtors and good payers). We assume that a class is appropriate if its members have a high degree of similarity, that is, if the standard measure of within-group similarity shows the lowest variance. After clustering, data mining techniques are applied to the cluster of bad debtors, reaching very high accuracy. The paper is structured as follows: Section 2 describes the model for data analysis based on a specific scoring model that we propose. In Section 3, we present a cluster analysis using the K-means algorithm, and the DM models are applied to a specific cluster. Section 4 presents the conclusions.
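The abstract does not give the borrower features or the scoring model. The sketch below therefore illustrates only the K-means (Lloyd) iteration itself, on hypothetical two-dimensional feature vectors. Each point is assigned to its nearest centre, each centre moves to the mean of its points, and the loop stops when the centres no longer change. Initial centres are supplied by the caller here; in practice they are often chosen at random or with k-means++.

```python
import math

def kmeans(points, init_centers, max_iter=100):
    """Lloyd's K-means with Euclidean distance; returns (centers, labels)."""
    centers = [tuple(c) for c in init_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its previous center
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, labels
```

Lloyd's algorithm only reaches a local optimum that depends on the starting centres. Running it from several initializations and keeping the lowest within-cluster variance is a common safeguard.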
DEFF Research Database (Denmark)
Dashab, Golam Reza; Kadri, Naveen Kumar; Mahdi Shariati, Mohammad;
2012-01-01
) Mixed model analysis (MMA), 2) Random haplotype model (RHM), 3) Genealogy-based mixed model (GENMIX), and 4) Bayesian variable selection (BVS). The data consisted of phenotypes of 2000 animals from 20 sire families and were genotyped with 9990 SNPs on five chromosomes. Results: Out of the eight...
Alizadeh, Bahram; Najjari, Saeid; Kadkhodaie-Ilkhchi, Ali
2012-08-01
Intelligent and statistical techniques were used to extract the hidden organic facies from well log responses in the Giant South Pars Gas Field, Persian Gulf, Iran. Data from the Mid-Cretaceous Kazhdomi Formation and the Permo-Triassic Kangan-Dalan Formations were used for this purpose. Initially, GR, SGR, CGR, THOR, POTA, NPHI and DT logs were used to model the relationship between wireline logs and Total Organic Carbon (TOC) content using Artificial Neural Networks (ANN). The correlation coefficient (R²) between the measured and ANN-predicted TOC equals 89%. The performance of the model is measured by the Mean Squared Error function, which does not exceed 0.0073. Using the Cluster Analysis technique and creating a binary hierarchical cluster tree, the constructed TOC column of each formation was clustered into 5 organic facies according to their geochemical similarity. A second ANN model, with an accuracy of 84%, was then created to determine the specified clusters (facies) directly from well logs for quick cluster recognition in other wells of the studied field. Each created facies was correlated to its appropriate burial history curve. Hence each facies of a formation could be scrutinized separately and directly from its well logs, demonstrating the time and depth of oil or gas generation. Therefore the potential production zones of the Kazhdomi probable source rock and the Kangan-Dalan reservoir formation could be identified while well logging operations (especially in LWD cases) are in progress. This could reduce uncertainty, save considerable time and cost for the oil industry, and aid in the successful implementation of exploration and exploitation plans.
Whitehead, Alfred J; Vesperini, Enrico; Zwart, Simon Portegies
2013-01-01
We perform a series of simulations of evolving star clusters using AMUSE (the Astrophysical Multipurpose Software Environment), a new community-based multi-physics simulation package, and compare our results to existing work. These simulations model a star cluster beginning with a King model distribution and a selection of power-law initial mass functions, and contain a tidal cut-off. They are evolved using collisional stellar dynamics and include mass loss due to stellar evolution. After determining that the differences between AMUSE results and prior publications are understood, we explored the variation in cluster lifetimes due to the random realization noise introduced by transforming a King model to specific initial conditions. This random realization noise can affect the lifetime of a simulated star cluster by up to 30%. Two modes of star cluster dissolution were identified: a mass evolution curve that contains a run-away cluster dissolution with a sudden loss of mass, and a dissolution mode that does n...
Directory of Open Access Journals (Sweden)
Simon Benjaminsson
2010-08-01
Non-parametric data-driven analysis techniques can be used to study datasets with few assumptions about the data and underlying experiment. Variations of Independent Component Analysis (ICA) have been the most commonly used methods on fMRI data, e.g. for finding resting-state networks thought to reflect the connectivity of the brain. Here we present a novel data analysis technique and demonstrate it on resting-state fMRI data. It is a generic method with few underlying assumptions about the data. The results are built from the statistical relations between all input voxels, resulting in a whole-brain analysis at the voxel level. It has good scalability properties, and the parallel implementation is capable of handling large datasets and databases. From the mutual information between the activities of the voxels over time, a distance matrix is created for all voxels in the input space. Multidimensional scaling is used to place the voxels in a lower-dimensional space reflecting the dependency relations encoded in the distance matrix. By performing clustering in this space we can find the strong statistical regularities in the data, which for the resting-state data turn out to be the resting-state networks. The decomposition is performed in the last step of the algorithm and is computationally simple. This enables rapid analysis and visualization of the data at different spatial levels, as well as automatic selection of a suitable number of decomposition components.
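The mutual-information step can be sketched as follows. This pure-Python version discretizes each voxel time series into equal-width bins and computes a plug-in MI estimate in nats. Several choices are assumptions, because the abstract does not say how the authors estimate MI or turn it into distances: the bin count, the helper names, and the conversion of MI into a distance via 1 - normalized MI. The multidimensional scaling and clustering steps that follow are not shown.

```python
import math
from collections import Counter

def discretize(series, bins):
    """Map a real-valued time series to equal-width bin indices 0..bins-1."""
    lo, hi = min(series), max(series)
    width = (hi - lo) / bins or 1.0   # guard against a constant series
    return [min(int((v - lo) / width), bins - 1) for v in series]

def mutual_information(x, y):
    """Plug-in mutual information (nats) between two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())

def mi_distance_matrix(signals, bins=4):
    """Distance 1 - MI / sqrt(H_i * H_j) between all pairs of signals (assumed form)."""
    coded = [discretize(s, bins) for s in signals]
    entropy = [mutual_information(c, c) for c in coded]   # MI(X, X) = H(X)
    size = len(coded)
    dist = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            denom = math.sqrt(entropy[i] * entropy[j])
            nmi = mutual_information(coded[i], coded[j]) / denom if denom > 0 else 0.0
            dist[i][j] = 1.0 - nmi
    return dist
```

Because MI captures any statistical dependency, two voxels whose activities move together nonlinearly still end up close in this matrix, not only linearly correlated ones.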
The Evolution of Galaxy Clustering in Hierarchical Models
1999-01-01
The main ingredients of recent semi-analytic models of galaxy formation are summarised. We present predictions for the galaxy clustering properties of a well specified LCDM model whose parameters are constrained by observed local galaxy properties. We present preliminary predictions for evolution of clustering that can be probed with deep pencil beam surveys.
KMEANS CLUSTERING FOR HIDDEN MARKOV MODEL
Perrone, M.P.; Connell, S.D.
2004-01-01
An unsupervised kmeans clustering algorithm for hidden Markov models is described and applied to the task of generating subclass models for individual handwritten character classes. The algorithm is compared to a related clustering method and shown to give a relative change in the error rate of as
The quasicrystal model of cluster systems in condensed matter
Melnikov, G.
2017-01-01
The paper proposes a quasicrystal model of the structure of clusters. The model is based on the similarity of the structure of clusters and macroscopic structure of quasicrystals. It offers a formula to calculate the radii of successive coordination spheres in quasicrystalline films. The formula is based on the properties of Fibonacci sequence and characteristics of the power potential of interaction between particles.
Cluster analysis of activity-time series in motor learning
DEFF Research Database (Denmark)
Balslev, Daniela; Nielsen, Finn Å; Futiger, Sally A
2002-01-01
Neuroimaging studies of learning focus on brain areas where the activity changes as a function of time. To circumvent the difficult problem of model selection, we used a data-driven analytic tool, cluster analysis, which extracts representative temporal and spatial patterns from the voxel-time se...... practice-related activity in a fronto-parieto-cerebellar network, in agreement with previous studies of motor learning. These voxels were separated from a group of voxels showing an unspecific time-effect and another group of voxels, whose activation was an artifact from smoothing...
Cluster analysis of the hot subdwarfs in the PG survey
Thejll, Peter; Charache, Darryl; Shipman, Harry L.
1989-01-01
Application of cluster analysis to the hot subdwarfs in the Palomar Green (PG) survey of faint blue high-Galactic-latitude objects is assessed, with emphasis on data noise and the number of clusters to subdivide the data into. The data used in the study are presented, and cluster analysis, using the CLUSTAN program, is applied to it. Distances are calculated using the Euclidean formula, and clustering is done by Ward's method. The results are discussed, and five groups representing natural divisions of the subdwarfs in the PG survey are presented.
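Ward's method with Euclidean distances, as applied here through CLUSTAN, can be sketched in a few lines of Python. At each step it merges the two clusters whose union causes the smallest increase in the total within-cluster sum of squares. For clusters A and B that increase equals (|A||B| / (|A| + |B|)) · ‖c_A - c_B‖², where c denotes a centroid. This version is deliberately naive (it recomputes centroids for every candidate pair). It shows only the criterion, not CLUSTAN's implementation, and the function name is hypothetical.

```python
def ward_clustering(points, n_clusters):
    """Agglomerative clustering with Ward's criterion; returns lists of point indices."""
    dim = len(points[0])
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        return [sum(points[i][k] for i in members) / len(members) for k in range(dim)]

    def merge_cost(a, b):
        # Increase in total within-cluster sum of squares if clusters a and b merge.
        ca, cb = centroid(a), centroid(b)
        sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Choosing `n_clusters` is the "number of clusters to subdivide the data into" question the abstract raises. Recording the merge costs and looking for a large jump is one common heuristic.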
SUCCESSFUL INNOVATIVE CLUSTERS IN ROMANIA – A POSSIBLE MODEL
Directory of Open Access Journals (Sweden)
Liliana SCUTARU
2016-08-01
The present study proposes the construction of a successful innovative cluster model which will help in creating strategies and policies to support Romanian economic growth and development in the medium and long term. One such architecture designed to support innovative clusters, including by attracting foreign capital into clusters in order to increase their competitiveness, addresses some concrete measures in terms of the organizational system and management strategy as well as the funding system of clusters. The paper also emphasizes the multiplicity of factors that contribute to the creation, progressive development and success of clusters, the activities developed and the relationships established internationally, so as to ensure that the clusters remain on the market and have good visibility at national and international levels, essentially contributing to the success of the clusters.
Comparative Studies of Clustering Techniques for Real-Time Dynamic Model Reduction
Hogan, Emilie; Halappanavar, Mahantesh; Huang, Zhenyu; Lin, Guang; Lu, Shuai; Wang, Shaobu
2015-01-01
Dynamic model reduction in power systems is necessary for improving computational efficiency. Traditional model reduction using linearized models or offline analysis would not be adequate to capture power system dynamic behaviors, especially as the new mix of intermittent generation and intelligent consumption makes the power system more dynamic and nonlinear. Real-time dynamic model reduction has therefore emerged as an important need. This paper explores the use of clustering techniques to analyze real-time phasor measurements to determine generator groups and representative generators for dynamic model reduction. Two clustering techniques, graph clustering and evolutionary clustering, are studied in this paper. Various implementations of these techniques are compared with each other and with a previously developed Singular Value Decomposition (SVD)-based dynamic model reduction approach. The methods exhibit different levels of accuracy when the reduced model simulation is compared against the original model. But some ...
Microscopic three-cluster model of 10Be
Lashko, Yu. A.; Filippov, G. F.; Vasilevsky, V. S.
2017-02-01
We investigate the spectrum of bound and resonance states in 10Be, and the scattering of alpha particles on 6He. To this end, we use a three-cluster microscopic model. This model incorporates Gaussian and oscillator basis functions and reduces the three-cluster Schrödinger equation to a two-body-like many-channel problem, with the two-cluster subsystem being in a bound or a pseudo-bound state. Much attention is given to the effects of cluster polarization on the spectrum of bound and resonance states in 10Be, and on elastic and inelastic 6He + α scattering.
DiStefano, Christine; Kamphaus, R. W.
2006-01-01
Two classification methods, latent class cluster analysis and cluster analysis, are used to identify groups of child behavioral adjustment underlying a sample of elementary school children aged 6 to 11 years. Behavioral rating information across 14 subscales was obtained from classroom teachers and used as input for analyses. Both the procedures…
A Bayesian cluster analysis method for single-molecule localization microscopy data.
Griffié, Juliette; Shannon, Michael; Bromley, Claire L; Boelen, Lies; Burn, Garth L; Williamson, David J; Heard, Nicholas A; Cope, Andrew P; Owen, Dylan M; Rubin-Delanchy, Patrick
2016-12-01
Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM), for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.
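The generative model described above can be used to simulate test data. In it, Gaussian clusters are overlaid on a completely spatially random background, and every point is then displaced by its localization precision. The Python sketch below is illustrative only and is not the protocol's own simulation code. The parameter names are hypothetical, and units are whatever the caller uses (e.g. nm). All points share one localization precision rather than each having its own. The background uses a fixed number of uniform points (a binomial process) instead of a Poisson-distributed count.

```python
import random

def simulate_smlm(n_clusters, points_per_cluster, cluster_sd,
                  n_background, roi_size, loc_precision, seed=0):
    """Simulate 2-D localizations; returns (observed_points, labels).

    labels[i] is the index of the cluster that generated point i, or -1 for background.
    """
    rng = random.Random(seed)
    true_points, labels = [], []
    for c in range(n_clusters):
        # Cluster centres are placed uniformly at random in the square region of interest.
        cx, cy = rng.uniform(0, roi_size), rng.uniform(0, roi_size)
        for _ in range(points_per_cluster):
            true_points.append((rng.gauss(cx, cluster_sd), rng.gauss(cy, cluster_sd)))
            labels.append(c)
    for _ in range(n_background):
        # Uniform background points approximate complete spatial randomness.
        true_points.append((rng.uniform(0, roi_size), rng.uniform(0, roi_size)))
        labels.append(-1)
    # Every point is scrambled by Gaussian localization error.
    observed = [(x + rng.gauss(0, loc_precision), y + rng.gauss(0, loc_precision))
                for x, y in true_points]
    return observed, labels
```

Keeping the ground-truth labels alongside the observed points makes it possible to check how well a cluster analysis recovers the simulated cluster count and membership.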
Two-Way Regularized Fuzzy Clustering of Multiple Correspondence Analysis.
Kim, Sunmee; Choi, Ji Yeh; Hwang, Heungsun
2017-01-01
Multiple correspondence analysis (MCA) is a useful tool for investigating the interrelationships among dummy-coded categorical variables. MCA has been combined with clustering methods to examine whether there exist heterogeneous subclusters of a population, which exhibit cluster-level heterogeneity. These combined approaches aim to classify either observations only (one-way clustering of MCA) or both observations and variable categories (two-way clustering of MCA). The latter approach is favored because its solutions are easier to interpret by providing explicitly which subgroup of observations is associated with which subset of variable categories. Nonetheless, the two-way approach has been built on hard classification that assumes observations and/or variable categories to belong to only one cluster. To relax this assumption, we propose two-way fuzzy clustering of MCA. Specifically, we combine MCA with fuzzy k-means simultaneously to classify a subgroup of observations and a subset of variable categories into a common cluster, while allowing both observations and variable categories to belong partially to multiple clusters. Importantly, we adopt regularized fuzzy k-means, thereby enabling us to decide the degree of fuzziness in cluster memberships automatically. We evaluate the performance of the proposed approach through the analysis of simulated and real data, in comparison with existing two-way clustering approaches.
Electron-gas clusters: the ultimate jellium model
Koskinen, M.; Lipas, P. O.; Manninen, M.
1995-12-01
The local spin-density approximation is used to calculate ground- and isomeric-state geometries of jellium clusters with 2 to 22 electrons. The positive background charge of the model is completely deformable, both in shape and in density. The model has no input parameters. The resulting shapes of the clusters exhibit breaking of axial and inversion symmetries; in general the shapes are far from ellipsoidal. Those clusters which lack inversion symmetry are extremely soft against odd-multipole deformations. Some clusters can be interpreted as molecules built from magic clusters. The deformation produces a gap at the Fermi level. This results in a regular odd-even staggering of the total energy per electron and of the HOMO level. The strongly deformed 14-electron cluster is semimagic. Stable isomers are predicted. The splitting of the plasmon resonance due to deformation is estimated on a classical argument.
An Extended Clustering Algorithm for Statistical Language Models
Ueberla, J P
1994-01-01
Statistical language models frequently suffer from a lack of training data. This problem can be alleviated by clustering, because it reduces the number of free parameters that need to be trained. However, clustered models have the following drawback: if there is ``enough'' data to train an unclustered model, then the clustered variant may perform worse. On currently used language modeling corpora, e.g. the Wall Street Journal corpus, how do the performances of a clustered and an unclustered model compare? While trying to address this question, we develop the following two ideas. First, to get a clustering algorithm with potentially high performance, an existing algorithm is extended to deal with higher order N-grams. Second, to make it possible to cluster large amounts of training data more efficiently, a heuristic to speed up the algorithm is presented. The resulting clustering algorithm can be used to cluster trigrams on the Wall Street Journal corpus and the language models it produces can compete with exi...
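Clustering reduces the number of free parameters, as the standard class-based bigram factorization shows: P(w_i | w_{i-1}) ≈ P(C(w_i) | C(w_{i-1})) · P(w_i | C(w_i)), where C maps each word to its cluster. With |V| words and |C| classes this needs on the order of |C|² + |V| parameters instead of |V|². The Python sketch below estimates this model by maximum likelihood from a token list and a word-to-class map that is simply assumed to be given. It does not implement the paper's clustering algorithm or its trigram extension, and it applies no smoothing.

```python
from collections import Counter

def train_class_bigram(tokens, word_class):
    """Return prob(prev_word, word) for an unsmoothed class-based bigram model."""
    classes = [word_class[w] for w in tokens]
    class_bigrams = Counter(zip(classes, classes[1:]))
    class_history = Counter(classes[:-1])   # how often each class is followed by something
    class_counts = Counter(classes)
    word_counts = Counter(tokens)

    def prob(prev_word, word):
        prev_c, c = word_class[prev_word], word_class[word]
        if class_history[prev_c] == 0:
            return 0.0
        p_class = class_bigrams[(prev_c, c)] / class_history[prev_c]
        p_word_given_class = word_counts[word] / class_counts[c]
        return p_class * p_word_given_class

    return prob
```

Any two words in the same class share the class-transition estimate. This sharing is what helps when data are scarce, and it can also hurt when there is enough data to estimate word-level bigrams directly.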
Analysis of Stemming Algorithm for Text Clustering
Directory of Open Access Journals (Sweden)
N.Sandhya
2011-09-01
Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In the bag-of-words representation of documents, the words that appear in documents often have many morphological variants, and in most cases morphological variants of words have similar semantic interpretations and can be considered equivalent for the purpose of clustering applications. For this reason, a number of stemming algorithms, or stemmers, have been developed, which attempt to reduce a word to its stem or root form. Thus, the key terms of a document are represented by stems rather than by the original words. In this work we have studied the impact of a stemming algorithm, along with four popular similarity measures (Euclidean, cosine, Pearson correlation and extended Jaccard), in conjunction with different types of vector representation (boolean, term frequency, and term frequency-inverse document frequency), on cluster quality. For clustering documents we used the partitional clustering technique K-means. Performance is measured against a human-imposed classification of the Classic data set. We conducted a number of experiments and used the entropy measure to ensure the statistical significance of the results. Cosine, Pearson correlation and extended Jaccard similarities emerge as the best measures for capturing human categorization behavior, while the Euclidean measure performs poorly. After applying the stemming algorithm, the Euclidean measure shows little improvement.
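The four measures compared above are easy to state for two term vectors x and y, for example raw term-frequency or TF-IDF weights. The pure-Python sketch below uses the usual textbook definitions. Extended Jaccard is taken to be the Tanimoto form x·y / (‖x‖² + ‖y‖² - x·y), and Pearson correlation is computed as the cosine of the mean-centred vectors. These are assumptions about the exact variants the study used.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def euclidean_distance(x, y):
    return math.dist(x, y)

def cosine_similarity(x, y):
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

def pearson_correlation(x, y):
    # Cosine of the mean-centred vectors; undefined if either vector is constant.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])

def extended_jaccard(x, y):
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)
```

Consider x = [1, 2, 0] and its doubled counterpart y = [2, 4, 0]. Doubling every term count leaves cosine similarity at 1.0. It lowers extended Jaccard to 10/15 and gives a Euclidean distance of √5. Euclidean distance and extended Jaccard are thus sensitive to document length, whereas cosine is not.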
Ho, Hsuan-Fu; Hung, Chia-Chi
2008-01-01
Purpose: The purpose of this paper is to examine how a graduate institute at National Chiayi University (NCYU), by using a model that integrates analytic hierarchy process, cluster analysis and correspondence analysis, can develop effective marketing strategies. Design/methodology/approach: This is primarily a quantitative study aimed at…
Fiber modeling and clustering based on neuroanatomical features.
Wang, Qian; Yap, Pew-Thian; Wu, Guorong; Shen, Dinggang
2011-01-01
DTI tractography allows unprecedented understanding of brain neural connectivity in vivo by capturing water diffusion patterns in brain white-matter microstructures. However, tractography algorithms often output hundreds of thousands of fibers, rendering the computation needed for subsequent data analysis intractable. A remedy is to group the fibers into bundles using fiber clustering techniques. Most existing fiber clustering methods, however, rely on fiber geometrical information only by viewing fibers as curves in the 3D Euclidean space. The important neuroanatomical aspect of the fibers is mostly ignored. In this paper, neuroanatomical information is encapsulated in a feature vector called the associativity vector, which functions as the "fingerprint" for each fiber and depicts the connectivity of the fiber with respect to individual anatomies. Using the associativity vectors of fibers, we model the fibers as observations sampled from multivariate Gaussian mixtures in the feature space. An expectation-maximization clustering approach is then employed to group the fibers into 16 major bundles. Experimental results indicate that the proposed method groups the fibers into anatomically meaningful bundles, which are highly consistent across subjects.
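The clustering step described here (fibers as samples from a Gaussian mixture in feature space, grouped by expectation-maximization) can be sketched as generic EM in Python. This is not the authors' implementation: it assumes each associativity vector is already available as a plain list of numbers, and it uses diagonal covariances, a deterministic farthest-point initialization and a variance floor `min_var` to avoid degenerate components; all of these are illustrative choices.

```python
import math


def em_gmm_diag(X, k, n_iter=100, min_var=1e-6):
    """EM for a Gaussian mixture with diagonal covariances.

    X is a list of equal-length feature vectors. Returns (weights, means,
    variances, labels), where labels[i] is the most probable component of X[i].
    """
    n, d = len(X), len(X[0])
    # Deterministic farthest-point initialisation of the means (an assumption).
    means = [list(X[0])]
    while len(means) < k:
        far = max(X, key=lambda x: min(math.dist(x, m) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for numerical stability.
        resp = []
        for x in X:
            logp = []
            for j in range(k):
                lp = math.log(weights[j])
                for t in range(d):
                    v = variances[j][t]
                    lp -= 0.5 * (math.log(2 * math.pi * v) + (x[t] - means[j][t]) ** 2 / v)
                logp.append(lp)
            top = max(logp)
            w = [math.exp(val - top) for val in logp]
            total = sum(w)
            resp.append([wj / total for wj in w])
        # M-step: re-estimate mixing weights, means and floored variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, X)) / nj for t in range(d)]
            variances[j] = [
                max(sum(r[j] * (x[t] - means[j][t]) ** 2 for r, x in zip(resp, X)) / nj, min_var)
                for t in range(d)
            ]

    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels
```

For the fiber data one would call it with k = 16; on a toy set of two well-separated groups of three points it returns labels [0, 0, 0, 1, 1, 1].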
Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann
2017-07-01
Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.
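The Smart Cluster Method adapts its search space to local density and direction in ways the abstract does not specify closely enough to reproduce. For comparison with the established methods it mentions, here is a sketch of the classic window-based declustering idea: each event removes smaller, later events that fall within magnitude-dependent space and time windows. The window coefficients are a commonly quoted fit to the Gardner-Knopoff (1974) windows and should be treated as an assumption; the aftershock-only rule (foreshocks are kept) is a deliberate simplification.

```python
import math


def gk_window(mag):
    """Space (km) and time (days) windows for an event of magnitude `mag`.

    Coefficients follow a commonly quoted fit to the Gardner-Knopoff windows;
    treat them as an assumption and check them against your reference.
    """
    dist_km = 10 ** (0.1238 * mag + 0.983)
    if mag >= 6.5:
        days = 10 ** (0.032 * mag + 2.7389)
    else:
        days = 10 ** (0.5409 * mag - 0.547)
    return dist_km, days


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two epicentres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def decluster(events):
    """Flag mainshocks with a simplified, aftershock-only window method.

    events: list of (time_days, lat, lon, magnitude).
    Returns a list of booleans: True = kept as mainshock, False = removed.
    """
    removed = [False] * len(events)
    # Largest events first, so their windows take precedence.
    for i in sorted(range(len(events)), key=lambda idx: -events[idx][3]):
        if removed[i]:
            continue
        t0, lat0, lon0, mag0 = events[i]
        dist_km, days = gk_window(mag0)
        for j, (t, lat, lon, mag) in enumerate(events):
            if j == i or removed[j] or mag > mag0:
                continue  # never let a smaller event remove a larger one
            if 0 <= t - t0 <= days and haversine_km(lat0, lon0, lat, lon) <= dist_km:
                removed[j] = True
    return [not r for r in removed]
```

Under these coefficients a magnitude 6.0 event gets windows of roughly 53 km and 500 days.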
Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann
2017-03-01
Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.
Higgs Pair Production: Choosing Benchmarks With Cluster Analysis
Dall'Osso, Martino; Gottardo, Carlo A; Oliveira, Alexandra; Tosi, Mia; Goertz, Florian
2015-01-01
New physics theories often depend on a large number of free parameters. The precise values of those parameters in some cases drastically affect the resulting phenomenology of fundamental physics processes, while in others finite variations can leave it basically invariant at the level of detail experimentally accessible. When designing a strategy for the analysis of experimental data in the search for a signal predicted by a new physics model, it appears advantageous to categorize the parameter space describing the model according to the corresponding kinematical features of the final state. A multi-dimensional test statistic can be used to gauge the degree of similarity in the kinematics of different models; a clustering algorithm using that metric may then allow the division of the space into homogeneous regions, each of which can be successfully represented by a benchmark point. Searches targeting those benchmark points are then guaranteed to be sensitive to a large area of the parameter space. In this doc...
Analysis of forest fires spatial clustering using local fractal measure
Kanevski, Mikhail; Rochat, Mikael; Timonin, Vadim
2013-04-01
The research deals with an application of a local fractal measure (local sandbox counting, or mass counting) to the characterization of patterns of spatial clustering. The main applications concern simulated data (random patterns within the validity domain in forest regions) and real data (forest fires in Ticino, Switzerland). The global patterns of spatial clustering of forest fires were extensively studied using different topological (nearest neighbours, Voronoi polygons), statistical (Ripley's K-function, Morisita diagram) and fractal/multifractal measures (box counting, sandbox counting, lacunarity) (Kanevski, 2008). Generalizations of these measures to functional ones can reveal the structure of the phenomena, e.g. burned areas. All these measures are valuable and complementary tools for studying spatial clustering. Moreover, applying the concept of the validity domain (the complex domain in which the phenomenon is studied) helps in understanding and interpreting the results. In the present paper a sandbox counting method was applied locally, i.e. each ignition point was considered as a centre for counting events within an increasing search radius. Then the local relationships between the radius and the number of ignition points within that radius were examined. Finally, the results were mapped using an interpolation algorithm for visualization and analytical purposes. Both 2D (X,Y) and 3D (X,Y,Z) cases were studied and compared. The local "fractal" study gives an interesting spatially distributed picture of clustering. The real data case study was compared with a reference homogeneous pattern, complete spatial randomness. The difference between the two patterns clearly indicates the regions with important spatial clustering. An extension to a local functional measure was applied taking into account the surface of burned area, i.e. by analysing only data with fires above some threshold of burned area. Such analysis is similar to marked point processes and
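The local sandbox counting step (count ignition points within increasing radii around one point, then examine the radius-count relationship) can be sketched as a log-log least-squares fit. Assumptions: Euclidean distances on projected coordinates, the centre counted as one of the points if it belongs to the data set, and radii supplied by the user; the function name is illustrative.

```python
import math


def local_sandbox_dimension(points, center, radii):
    """Local sandbox ("mass") dimension around one centre point.

    Counts N(r), the points within distance r of `center`, and returns the
    least-squares slope of log N(r) against log r. Works for 2D or 3D points.
    """
    xs, ys = [], []
    for r in radii:
        n = sum(1 for p in points if math.dist(p, center) <= r)
        if n == 0:
            continue  # log(0) is undefined, so empty radii are skipped
        xs.append(math.log(r))
        ys.append(math.log(n))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

For points on a regular 2D grid the slope is close to 2, and for points on a line it is close to 1. Compared with complete spatial randomness in 2D, slopes noticeably below 2 at a location point to clustering around it; mapping the per-point slopes then gives the spatially distributed picture described above.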
Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis
Directory of Open Access Journals (Sweden)
S. Muthurajkumar
2014-05-01
In this paper, we propose a hybrid clustering-based classification algorithm, based on a mean approach, to effectively classify and mine ordered sequences (paths) from weblog data in order to perform social network analysis. In the proposed system for social pattern analysis, sequences of human activities are typically analyzed by switching behaviors, which are likely to produce overlapping clusters. A robust Modified Boosting algorithm is proposed for the hybrid clustering-based classification of the data. This work helps to connect the aggregated features from the network data with traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm, and that it provides better classification accuracy when tested with a Weblog dataset. In addition, the algorithm improves predictive performance, especially for multiclass datasets, which can increase accuracy.
Age Estimates of Universe: from Globular Clusters to Cosmological Models and Probes
Fatima, Hira; Rahman, Syed Faisal Ur
2016-01-01
We performed photometric analysis of the globular clusters M2 and M92 in the g and r bands of the Sloan photometric system. We transformed these g and r magnitudes into the B and V bands of the Johnson-Cousins photometric system and built the color-magnitude diagram (CMD). We estimated the age and metallicity of both clusters by fitting Padova isochrones of different ages and metallicities to the CMD. We studied the Einstein-de Sitter model, the benchmark model, and the cosmological parameters from the WMAP and Planck surveys. Finally, we compared the estimated ages of the globular clusters with the ages from the cosmological models and from the cosmological parameter values of the WMAP and Planck surveys.
Alpha-cluster model of atomic nuclei
Energy Technology Data Exchange (ETDEWEB)
Sosin, Zbigniew; Kallunkathariyil, Jinesh [Jagiellonian University, M. Smoluchowski Institute of Physics, Krakow (Poland); Blocki, Jan [NCBJ, Theoretical Physics Division (BP2), Swierk (Poland); Lukasik, Jerzy; Pawlowski, Piotr [IFJ PAN, Krakow (Poland)
2016-05-15
The description of a nuclear system in its ground state and at low excitations based on the equation of state (EoS) around normal density is presented. In the expansion of the EoS around the saturation point, additional spin polarization terms are taken into account. These terms, together with the standard symmetry term, are responsible for the appearance of α-like clusters in the ground-state configurations of N=Z even-even nuclei. At the nuclear surface these clusters can be identified as alpha particles. A correction for surface effects is introduced for atomic nuclei. Taking into account an additional interaction between clusters, the binding energies and sizes of the considered nuclei are very accurately described. The limits of the EoS parameters are established from the properties of the α, ³He and t particles.
On scaling properties of cluster distributions in Ising models
Ruge, C.; Wagner, F.
1992-01-01
Scaling relations of cluster distributions for the Wolff algorithm are derived. We found them to be well satisfied for the Ising model in d=3 dimensions. Using scaling and a parametrization of the cluster distribution, we determine the critical exponent β/ν=0.516(6) with moderate effort in computing time.
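For readers unfamiliar with the Wolff algorithm whose cluster distributions are analyzed here, the following sketch grows and flips a single cluster on a periodic simple cubic Ising lattice (d = 3). It assumes ferromagnetic coupling J, inverse temperature beta and no external field; each aligned neighbour joins the cluster with probability 1 - exp(-2*beta*J). The function names are illustrative.

```python
import math
import random


def neighbours(i, L):
    """Six nearest neighbours of site i on an L x L x L lattice with periodic boundaries."""
    x, y, z = i % L, (i // L) % L, i // (L * L)
    for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        yield (x + dx) % L + L * ((y + dy) % L + L * ((z + dz) % L))


def wolff_step(spins, L, beta, rng, J=1.0):
    """Grow and flip one Wolff cluster in place; return the cluster size.

    spins: flat list of +1/-1 values indexed as x + L*(y + L*z).
    """
    p_add = 1.0 - math.exp(-2.0 * beta * J)  # bond-activation probability
    seed = rng.randrange(len(spins))
    s0 = spins[seed]
    cluster, stack = {seed}, [seed]
    while stack:
        i = stack.pop()
        for j in neighbours(i, L):
            # Each bond from a cluster site to an aligned outside site is tried once.
            if j not in cluster and spins[j] == s0 and rng.random() < p_add:
                cluster.add(j)
                stack.append(j)
    for i in cluster:
        spins[i] = -s0
    return len(cluster)
```

Repeatedly calling `wolff_step` near the critical coupling (β_c ≈ 0.2217 for the simple cubic lattice) and histogramming the returned sizes gives the kind of cluster-size distribution whose scaling is studied above.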
Ab initio calculations and modelling of atomic cluster structure
DEFF Research Database (Denmark)
Solov'yov, Ilia; Lyalin, Andrey G.; Greiner, Walter
2004-01-01
framework for modelling the fusion process of noble gas clusters is presented. We report the striking correspondence of the peaks in the experimentally measured abundance mass spectra with the peaks in the size-dependence of the second derivative of the binding energy per atom calculated for the chain...... of the noble gas clusters up to 150 atoms....
Fitting Latent Cluster Models for Networks with latentnet
Directory of Open Access Journals (Sweden)
Pavel N. Krivitsky
2007-12-01
latentnet is a package to fit and evaluate statistical latent position and cluster models for networks. Hoff, Raftery, and Handcock (2002) suggested an approach to modeling networks based on positing the existence of a latent space of characteristics of the actors. Relationships form as a function of distances between these characteristics as well as functions of observed dyadic-level covariates. In latentnet, social distances are represented in a Euclidean space. It also includes a variant of the extension of the latent position model that allows for clustering of the positions, developed in Handcock, Raftery, and Tantrum (2007). The package implements Bayesian inference for the models based on a Markov chain Monte Carlo algorithm. It can also compute maximum likelihood estimates for the latent position model and a two-stage maximum likelihood method for the latent position cluster model. For latent position cluster models, the package provides a Bayesian way of assessing how many groups there are, and thus whether or not there is any clustering (if the preferred number of groups is 1, there is little evidence for clustering). It also estimates which cluster each actor belongs to. These estimates are probabilistic and provide the probability of each actor belonging to each cluster. It computes four types of point estimates for the coefficients and positions: the maximum likelihood estimate, the posterior mean, the posterior mode and the estimator that minimizes the Kullback-Leibler divergence from the posterior. You can assess the goodness-of-fit of the model via posterior predictive checks. It has a function to simulate networks from a latent position or latent position cluster model.
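The core of the latent position model of Hoff, Raftery, and Handcock (2002) is that the log-odds of a tie decrease with the Euclidean distance between the actors' latent positions. A minimal Python sketch of that likelihood, assuming a directed network, a single intercept `beta0` and no dyadic covariates; it does not use latentnet itself and omits both the MCMC fitting and the clustering of positions.

```python
import math
import random


def edge_prob(zi, zj, beta0):
    """P(y_ij = 1) = logistic(beta0 - ||z_i - z_j||) in a Euclidean latent space."""
    eta = beta0 - math.dist(zi, zj)
    return 1.0 / (1.0 + math.exp(-eta))


def log_likelihood(Y, Z, beta0):
    """Bernoulli log-likelihood of a directed adjacency matrix Y given positions Z."""
    ll = 0.0
    for i, zi in enumerate(Z):
        for j, zj in enumerate(Z):
            if i != j:
                p = edge_prob(zi, zj, beta0)
                ll += math.log(p) if Y[i][j] else math.log1p(-p)
    return ll


def simulate(Z, beta0, rng):
    """Draw a directed network (no self-loops) from the latent position model."""
    n = len(Z)
    return [[int(i != j and rng.random() < edge_prob(Z[i], Z[j], beta0))
             for j in range(n)] for i in range(n)]
```

Positions that place linked actors close together receive a higher log-likelihood, which is what the estimation procedures exploit.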
Modelling Catalyst Surfaces Using DFT Cluster Calculations
Directory of Open Access Journals (Sweden)
Oliver Kröcher
2009-09-01
We review our recent theoretical DFT cluster studies of a variety of industrially relevant catalysts such as TiO2, γ-Al2O3, V2O5-WO3-TiO2 and Ni/Al2O3. Aspects of the metal oxide surface structure and the stability and structure of metal clusters on the support are discussed as well as the reactivity of surfaces, including their behaviour upon poisoning. It is exemplarily demonstrated how such theoretical considerations can be combined with DRIFT and XPS results from experimental studies.
Modelling catalyst surfaces using DFT cluster calculations.
Czekaj, Izabela; Wambach, Jörg; Kröcher, Oliver
2009-11-20
We review our recent theoretical DFT cluster studies of a variety of industrially relevant catalysts such as TiO2, γ-Al2O3, V2O5-WO3-TiO2 and Ni/Al2O3. Aspects of the metal oxide surface structure and the stability and structure of metal clusters on the support are discussed as well as the reactivity of surfaces, including their behaviour upon poisoning. It is exemplarily demonstrated how such theoretical considerations can be combined with DRIFT and XPS results from experimental studies.
Hierarchical Cluster Analysis – Various Approaches to Data Preparation
Directory of Open Access Journals (Sweden)
Z. Pacáková
2013-09-01
The article deals with two different approaches to data preparation to avoid multicollinearity. The aim of the article is to find similarities among the e-communication levels of EU states using hierarchical cluster analysis. First, the original set of fourteen indicators was reduced on the basis of correlation analysis: where two indicators were highly correlated, the one with higher variability was retained for further analysis. Second, the data were transformed using principal component analysis, since the principal components are mutually uncorrelated; five principal components explaining about 92% of the variance were selected for further analysis. Hierarchical cluster analysis was performed on both the reduced data set and the principal component scores. In both cases three clusters were chosen following the pseudo t-squared and pseudo F statistics, but the final clusters were not identical. An important characteristic for comparing the two results was the proportion of variance accounted for by the clusters, which was about ten percentage points higher for the principal component scores (57.8% compared to 47%). Therefore, when principal component scores are used as input variables for cluster analysis and the explained proportion is high enough (about 92% in our analysis), the loss of information is lower than with data reduction based on correlation analysis.
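The "proportion of variance accounted for by the clusters" used above to compare the two solutions is commonly computed as R² = 1 - SS_within / SS_total. A small Python sketch, assuming the input rows are the (reduced or PC-score) variables already on the scale used for clustering; the function name is illustrative.

```python
def variance_accounted_for(X, labels):
    """R^2 of a partition: 1 - (within-cluster SS) / (total SS).

    X: list of equal-length numeric vectors (e.g. indicators or PC scores);
    labels: cluster label for each row of X.
    """
    d = len(X[0])
    grand = [sum(x[t] for x in X) / len(X) for t in range(d)]
    ss_total = sum((x[t] - grand[t]) ** 2 for x in X for t in range(d))
    groups = {}
    for x, g in zip(X, labels):
        groups.setdefault(g, []).append(x)
    ss_within = 0.0
    for members in groups.values():
        centre = [sum(m[t] for m in members) / len(members) for t in range(d)]
        ss_within += sum((m[t] - centre[t]) ** 2 for m in members for t in range(d))
    return 1.0 - ss_within / ss_total
```

For one-dimensional data [0, 1, 10, 11] split into {0, 1} and {10, 11}, R² = 1 - 1/101 ≈ 0.990. Because the value is computed in the input space, it depends on which variables and scaling were fed to the clustering.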
Structures and components in galaxy clusters: observations and models
Bykov, A M; Ferrari, C; Forman, W R; Kaastra, J S; Klein, U; Markevitch, M; de Plaa, J
2015-01-01
Clusters of galaxies are the largest gravitationally bound structures in the Universe and are dominated by dark matter. We review the observational appearance and physical models of plasma structures in clusters of galaxies. Bubbles of relativistic plasma which are inflated by supermassive black holes of AGNs, cooling and heating of the gas, large scale plasma shocks, cold fronts, non-thermal halos and relics are observed in clusters. These constituents reflect both the formation history and the dynamical properties of clusters of galaxies. We discuss X-ray spectroscopy as a tool to study the metal enrichment in clusters and fine spectroscopy of Fe X-ray lines as a powerful diagnostics of both the turbulent plasma motions and the energetics of the non-thermal electron populations. The knowledge of the complex dynamical and feedback processes is necessary to understand the energy and matter balance as well as to constrain the role of the non-thermal components of clusters.
Mock Observations of Blue Stragglers in Globular Cluster Models
Sills, Alison; Chatterjee, Sourav; Rasio, Frederic A
2013-01-01
We created artificial color-magnitude diagrams of Monte Carlo dynamical models of globular clusters, and then used observational methods to determine the number of blue stragglers in those clusters. We compared these blue stragglers to various cluster properties, mimicking work that has been done for blue stragglers in Milky Way globular clusters to determine the dominant formation mechanism(s) of this unusual stellar population. We find that a mass-based prescription for selecting blue stragglers will choose approximately twice as many blue stragglers as a selection criterion that was developed for observations of real clusters. However, the two numbers of blue stragglers are well-correlated, so either selection criterion can be used to characterize the blue straggler population of a cluster. We confirm previous results that the simplified prescription for the evolution of a collision or merger product in the BSE code overestimates the lifetime of collision products. Because our observationally-motivated s...
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.
Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
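Three of the stand-alone quality metrics discussed (modularity, coverage and conductance) follow directly from edge counts. A sketch for an undirected, unweighted graph without self-loops or duplicate edges; conductance is computed per cluster as cut(S) / min(vol(S), vol(V \ S)), and the function name is illustrative.

```python
from collections import defaultdict


def cluster_metrics(edges, labels):
    """Return (modularity, coverage, {cluster: conductance}) for a partition.

    edges: list of undirected (u, v) pairs; labels: dict node -> cluster id.
    """
    m = len(edges)
    intra = defaultdict(int)  # edges with both ends in the cluster
    cut = defaultdict(int)    # edges leaving the cluster
    vol = defaultdict(int)    # sum of degrees of the cluster's nodes
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        vol[cu] += 1
        vol[cv] += 1
        if cu == cv:
            intra[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1
    total_vol = 2 * m
    modularity = sum(intra[c] / m - (vol[c] / total_vol) ** 2 for c in vol)
    coverage = sum(intra.values()) / m
    conductance = {
        c: cut[c] / min(vol[c], total_vol - vol[c])
        for c in vol
        if min(vol[c], total_vol - vol[c]) > 0
    }
    return modularity, coverage, conductance
```

For two triangles joined by a single edge and split into their triangles, modularity is 5/14 ≈ 0.357, coverage is 6/7 and each cluster's conductance is 1/7.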
Cluster-based reduced-order modelling of a mixing layer
Kaiser, Eurika; Cordier, Laurent; Spohn, Andreas; Segond, Marc; Abel, Markus; Daviller, Guillaume; Niven, Robert K
2013-01-01
We propose a novel cluster-based reduced-order modelling (CROM) strategy of unsteady flows. CROM builds on the pioneering works of Gunzburger's group in cluster analysis (Burkardt et al. 2006) and Eckhardt's group in transition matrix models (Schneider et al. 2007) and constitutes a potential alternative to POD models. This strategy processes a time-resolved sequence of flow snapshots in two steps. First, the snapshot data is clustered into a small number of representative states, called centroids, in the state space. These centroids partition the state space in complementary non-overlapping regions (centroidal Voronoi cells). Departing from the standard algorithm, the probabilities of the clusters are determined, and the states are sorted by transition matrix consideration. Secondly, the transitions between the states are dynamically modelled via a Markov process. Physical mechanisms are then distilled by a refined analysis of the Markov process, e.g. with the finite-time Lyapunov exponent and entropic methods...
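The two CROM steps (cluster the snapshots into centroids, then model transitions between clusters as a Markov process) can be sketched as follows. Assumptions: snapshots are given as flattened vectors, plain k-means with a deterministic farthest-point initialization stands in for the clustering step, and the transition matrix is row-stochastic (P[a][b] is the probability of moving from cluster a to cluster b in one snapshot step); the original work may use a different convention.

```python
import math


def kmeans(X, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        far = max(X, key=lambda x: min(math.dist(x, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(n_iter):
        # Assign each snapshot to its nearest centroid (its Voronoi cell).
        labels = [min(range(k), key=lambda j: math.dist(x, centroids[j])) for x in X]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_probabilities(labels, k):
    """Fraction of snapshots assigned to each centroid."""
    return [labels.count(j) / len(labels) for j in range(k)]


def transition_matrix(labels, k):
    """Row-stochastic one-step matrix: P[a][b] = P(next = b | current = a)."""
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]
```

For a toy snapshot sequence that alternates between two states, the estimated matrix is [[0.0, 1.0], [1.0, 0.0]] and both clusters have probability 0.5.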
Multilevel Analysis Methods for Partially Nested Cluster Randomized Trials
Sanders, Elizabeth A.
2011-01-01
This paper explores multilevel modeling approaches for 2-group randomized experiments in which a treatment condition involving clusters of individuals is compared to a control condition involving only ungrouped individuals, otherwise known as partially nested cluster randomized designs (PNCRTs). Strategies for comparing groups from a PNCRT in the…
Beasley, Michael A.; Brodie, Jean P.; Strader, Jay; Forbes, Duncan A.; Proctor, Robert N.; Barmby, Pauline; Huchra, John P.
2004-01-01
We derive ages, metallicities and [alpha/Fe] ratios from the integrated spectra of 23 globular clusters in M31, by employing multivariate fits to two stellar population models. In parallel we analyze spectra of 21 Galactic globular clusters in order to facilitate a differential analysis. We find that the M31 globular clusters separate into three distinct components in age and metallicity. We identify an old, metal-poor group (7 clusters), an old, metal-rich group (10 clusters) and an intermediate age (3-6 Gyr), intermediate-metallicity ([Z/H]~-1) group (6 clusters). This third group is not identified in the Galactic globular cluster sample. The majority of globular clusters in both samples appear to be enhanced in alpha-elements, the degree of enhancement being model-dependent. The intermediate age GCs appear to be the most enhanced, with [alpha/Fe]~0.4. These clusters are clearly depressed in CN with respect to the models and the bulk of the M31 and Milky Way sample. Compared to the bulge of M31, M32 and NGC...
Entropic Approach to Multiscale Clustering Analysis
Directory of Open Access Journals (Sweden)
Antonio Insolia
2012-05-01
Recently, a novel method has been introduced to estimate the statistical significance of clustering in the direction distribution of objects. The method involves a multiscale procedure, based on the Kullback–Leibler divergence and the Gumbel statistics of extreme values, providing high discrimination power, even in the presence of strong background isotropic contamination. It is shown that the method is: (i) semi-analytical, drastically reducing computation time; (ii) very sensitive to small, medium and large scale clustering; (iii) not biased against the null hypothesis. Applications to the physics of ultra-high energy cosmic rays, as a cosmological probe, are presented and discussed.
ICA Model Order Estimation Using Clustering Method
Directory of Open Access Journals (Sweden)
P. Sovka
2007-12-01
In this paper a novel approach for independent component analysis (ICA) model order estimation of movement electroencephalogram (EEG) signals is described. The application is targeted at brain-computer interface (BCI) EEG preprocessing. Previous work has shown that it is possible to decompose EEG into movement-related and non-movement-related independent components (ICs). Selecting only the movement-related ICs might increase the BCI EEG classification score. The real number of independent sources in the brain is an important parameter of the preprocessing step. Previously, we used principal component analysis (PCA) to estimate the number of independent sources. However, PCA estimates only the number of uncorrelated, not independent, components, ignoring the higher-order signal statistics. In this work, we use another approach: selection of highly correlated ICs from several ICA runs. The ICA model order estimation is done at significance level α = 0.05, and the model order is more or less dependent on the ICA algorithm and its parameters.
Bárcena, Javier F.; García-Alba, Javier; García, Andrés; Álvarez, César
2016-11-01
A methodology to determine the spatial and temporal evolution of stratification in estuaries driven by astronomical tides and river discharges was developed and is presented here. Using a 3D hydrodynamic model, the variation of estuarine currents, water levels and densities was investigated under different realistic forcing conditions. These conditions were classified from a long-term period (>30 years) of river flows and tidal water levels by a K-means clustering approach suggested by Bárcena et al. (2015). The methodology allows computing the location of mixed, partially mixed/stratified and stratified areas in tidal river estuaries along a continuum by means of the layer Richardson number and the frequency of every model scenario. In order to illustrate the power of the method, it was applied to a case study, the Suances Estuary. In the application case, the Suances Estuary was vertically mixed at its innermost part due to riverine influence. At the outer part, it was also vertically mixed due to the turbulence caused by tidal action. At the intermediate section, it was partially mixed in the main channel or stratified in intertidal areas due to the combined action of forcing, depth gradients between the main channel and intertidal areas, and salinity variations in the water column.
Analysing star cluster populations with stochastic models: the HST/WFC3 sample of clusters in M83
Fouesneau, Morgan; Chandar, Rupali; Whitmore, Bradley C
2012-01-01
The majority of clusters in the Universe have masses well below 10^5 Msun. Hence their integrated fluxes and colors can be affected by the random presence of a few bright stars introduced by stochastic sampling of the stellar mass function. Specific methods are being developed to extend the analysis of cluster SEDs into the low-mass regime. In this paper, we apply such a method to observations of star clusters in the nearby spiral galaxy M83. We reassess the ages and masses of a sample of 1242 objects for which UBVI and Hα fluxes were obtained from HST/WFC3 images. Synthetic clusters with known properties are used to characterize the limitations of the method. The ensemble of color predictions of the discrete cluster models is in good agreement with the distribution of observed colors. We emphasize the important role of the Hα data in the assessment of the fraction of young objects, particularly in breaking the age-extinction degeneracy that hampers an analysis based on UBVI only. We find the mass distri...
Obstructive Sleep Apnea: A Cluster Analysis at Time of Diagnosis
Grillet, Yves; Richard, Philippe; Stach, Bruno; Vivodtzev, Isabelle; Timsit, Jean-Francois; Lévy, Patrick; Tamisier, Renaud; Pépin, Jean-Louis
2016-01-01
Background: The classification of obstructive sleep apnea is based on sleep study criteria that may not adequately capture disease heterogeneity. Improved phenotyping may improve prognosis prediction and help select therapeutic strategies. Objectives: This study used cluster analysis to investigate the clinical clusters of obstructive sleep apnea. Methods: An ascending hierarchical cluster analysis was performed on baseline symptoms, physical examination, risk factor exposure and co-morbidities from 18,263 participants in the OSFP (French national registry of sleep apnea). The probability for criteria to be associated with a given cluster was assessed using odds ratios, determined by univariate logistic regression. Results: Six clusters were identified, in which patients varied considerably in age, sex, symptoms, obesity, co-morbidities and environmental risk factors. The main significant differences between clusters were minimally symptomatic versus sleepy obstructive sleep apnea patients, lean versus obese, and among obese patients different combinations of co-morbidities and environmental risk factors. Conclusions: Our cluster analysis identified six distinct clusters of obstructive sleep apnea. Our findings underscore the high degree of heterogeneity that exists within obstructive sleep apnea patients regarding clinical presentation, risk factors and consequences. This may help in both research and clinical practice for validating new prevention programs, in diagnosis and in decisions regarding therapeutic strategies. PMID: 27314230
Detection of Functional Change Using Cluster Trend Analysis in Glaucoma.
Gardiner, Stuart K; Mansberger, Steven L; Demirel, Shaban
2017-05-01
Global analyses using mean deviation (MD) assess visual field progression but can miss localized changes. Pointwise analyses are more sensitive to localized progression but are more variable, so they require confirmation. This study assessed whether cluster trend analysis, which averages information across subsets of locations, could improve progression detection. A total of 133 test-retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis ("MD worsening faster than x dB/y with P [...] cluster analyses ("n locations [or clusters] worsening faster than x dB/y with P [...] cluster analysis criterion, and 4.1 years (95% CI, 4.0-4.5) for the best pointwise criterion. However, for pointwise analysis, only 38% of these changes were confirmed, compared with 61% for clusters and 76% for MD. The time until 25% of eyes showed subsequently confirmed deterioration was 6.3 years (95% CI, 6.0-7.2) for global, 6.3 years (95% CI, 6.0-7.0) for pointwise, and 6.0 years (95% CI, 5.3-6.6) for cluster analyses. Although its specificity is still suboptimal, cluster trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses.
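The core computation of a cluster trend analysis can be sketched as follows, assuming the test locations have already been grouped into clusters: average the sensitivities within each cluster at every visit, then fit an ordinary least-squares rate of change over time. The function names and the cluster assignment are hypothetical, and the sketch omits the P values and the permutation-based specificity calibration used in the study.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def cluster_rates(times, sensitivities, clusters):
    """Rate of change (dB/year) of the mean sensitivity within each cluster.

    times: visit times in years
    sensitivities: one list per visit, holding one dB value per test location
    clusters: dict mapping a cluster name to a list of location indices
    """
    rates = {}
    for name, locs in clusters.items():
        means = [sum(visit[i] for i in locs) / len(locs) for visit in sensitivities]
        rates[name] = ols_slope(times, means)
    return rates
```

Averaging within a cluster reduces the test-retest noise of single locations while remaining sensitive to localized loss that a global MD slope would dilute.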
[On National Demonstration Areas: a cluster analysis].
Mao, F; Jiang, Y Y; Dong, W L; Ji, N; Dong, J Q
2017-04-10
Objective: To identify the 'lagging' provinces and the relatively weak areas of work in the construction of National Demonstration Areas, so as to promote communication and future planning among different regions. Methods: Cluster analysis was used to compare the development of National Demonstration Areas in different provinces, including the coverage of National Demonstration Areas and the scores for non-communicable disease (NCD) prevention and control work based on a standardized indicator system. Results: Based on the construction of National Demonstration Areas, the 29 provinces and the Xinjiang Production and Construction Corps (all except Tibet and Qinghai) were classified into 6 categories: Shanghai; Beijing, Zhejiang, Chongqing; Tianjin, Shandong, Guangdong and Xinjiang Production and Construction Corps; Hebei, Fujian, Hubei, Jiangsu, Liaoning, Xinjiang, Hunan and Guangxi; Shanxi, Jilin, Henan, Hainan, Sichuan, Anhui and Jiangxi; Inner Mongolia, Shaanxi, Ningxia, Guizhou, Yunnan, Gansu and Heilongjiang. Based on the scores gathered in this study, 24 items representing achievements in NCD prevention and control were classified into 4 categories: manpower, special day on NCD, information materials development, policy/strategy support, financial support, mass media, enabling environment, community fitness campaign, health promotion for children and teenagers, institutional structure and patient self-management; healthy diet, NCD risk factor surveillance, tobacco control and community diagnosis; intervention for high-risk groups, identification of high-risk groups, reporting system for cardiovascular and cerebrovascular events, popularization of basic public health services, workplace intervention programs, construction of demonstration units and mortality surveillance; oral hygiene and tumor registration. Contents including oral hygiene, tumor registration, intervention on high-risk groups, identification of
Model-based clustering in networks with Stochastic Community Finding
McDaid, Aaron F; Friel, Nial; Hurley, Neil J
2012-01-01
In the model-based clustering of networks, blockmodelling may be used to identify roles in the network. We identify a special case of the Stochastic Block Model (SBM) in which we constrain the cluster-cluster interactions such that the density inside the clusters of nodes is expected to be greater than the density between clusters. This corresponds to the intuition behind community-finding methods, where nodes tend to be clustered together if they link to each other. We call this model Stochastic Community Finding (SCF) and present an efficient MCMC algorithm that can cluster the nodes, given the network. The algorithm is evaluated on synthetic data and is applied to social networks of interactions at a karate club and at a monastery, demonstrating how the SCF finds the 'ground truth' clustering where sometimes the SBM does not. The SCF is only one possible form of constraint or specialization that may be applied to the SBM. In a more supervised context, it may be appropriate to use other specializations to guide...
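The community-finding constraint can be checked empirically for a given partition of an undirected graph. The sketch below is a simplification and not the SCF sampler: it pools all within-cluster node pairs and all between-cluster node pairs and asks whether the observed edge density is higher inside clusters. The helper names are hypothetical.

```python
from collections import Counter

def block_densities(n, edges, labels):
    """Pooled edge density inside clusters and between clusters of an undirected graph."""
    sizes = Counter(labels)
    inside_edges = sum(1 for u, v in edges if labels[u] == labels[v])
    between_edges = len(edges) - inside_edges
    inside_pairs = sum(s * (s - 1) // 2 for s in sizes.values())
    between_pairs = n * (n - 1) // 2 - inside_pairs
    d_in = inside_edges / inside_pairs if inside_pairs else 0.0
    d_out = between_edges / between_pairs if between_pairs else 0.0
    return d_in, d_out

def satisfies_community_constraint(n, edges, labels):
    """True if nodes link more densely within their clusters than across them."""
    d_in, d_out = block_densities(n, edges, labels)
    return d_in > d_out
```

An unconstrained SBM can also fit "role" structures, such as bipartite patterns, in which d_in is lower than d_out; the SCF restriction rules those out.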
MASSCLEAN - MASSive CLuster Evolution and ANalysis Package - Description and Tests
Popescu, Bogdan
2008-01-01
We present MASSCLEAN, a new, sophisticated and robust stellar cluster image and photometry simulation package. This package is able to create color-magnitude diagrams and standard FITS images in any of the traditional optical and near-infrared bands based on cluster characteristics input by the user, including but not limited to distance, age, mass, radius and extinction. At the limit of very distant, unresolved clusters, we have checked the integrated colors created in MASSCLEAN against those from other single stellar population models with consistent results. We have also tested models which provide a reasonable estimate of the field star contamination in images and color-magnitude diagrams. We demonstrate the package by simulating images and color-magnitude diagrams of well known massive Milky Way clusters and compare their appearance to real data. Because the algorithm populates the cluster with a discrete number of tenable stars, it can be used as part of a Monte Carlo Method to derive the probabilistic ...
Directory of Open Access Journals (Sweden)
Xiao-Juan Jiang
BACKGROUND: The vertebrate protocadherins are a subfamily of cell adhesion molecules that are predominantly expressed in the nervous system and are believed to play an important role in establishing the complex neural network during animal development. Genes encoding these molecules are organized into a cluster in the genome. Comparative analysis of the protocadherin subcluster organization and gene arrangements in different vertebrates has provided interesting insights into the history of vertebrate genome evolution. Among tetrapods, protocadherin clusters have been fully characterized only in mammals. In this study, we report the identification and comparative analysis of the protocadherin cluster in a reptile, the green anole lizard (Anolis carolinensis). METHODOLOGY/PRINCIPAL FINDINGS: We show that the anole protocadherin cluster spans over a megabase and encodes a total of 71 genes. The number of genes in the anole protocadherin cluster is significantly higher than in the coelacanth (49 genes) and mammalian (54-59 genes) clusters. The anole protocadherin genes are organized into four subclusters: delta, alpha, beta and gamma. This subcluster organization is identical to that of the coelacanth protocadherin cluster but differs from the mammalian clusters, which lack the delta subcluster. The gene number expansion in the anole protocadherin cluster is largely due to extensive gene duplication in the gammab subgroup. Similar to coelacanth and elephant shark protocadherin genes, the anole protocadherin genes have experienced a low frequency of gene conversion. CONCLUSIONS/SIGNIFICANCE: Our results suggest that, as with the protocadherin clusters in other vertebrates, the evolution of the anole protocadherin cluster is driven mainly by lineage-specific gene duplications and degeneration. Our analysis also shows that loss of the protocadherin delta subcluster in the mammalian lineage occurred after the divergence of mammals and reptiles.
Users matter: multi-agent systems model of high performance computing cluster users.
Energy Technology Data Exchange (ETDEWEB)
North, M. J.; Hood, C. S.; Decision and Information Sciences; IIT
2005-01-01
High performance computing clusters have been a critical resource for computational science for over a decade and have more recently become integral to large-scale industrial analysis. Despite their well-specified components, the aggregate behavior of clusters is poorly understood. The difficulties arise from complicated interactions between cluster components during operation. These interactions have been studied by many researchers, some of whom have identified the need for holistic multi-scale modeling that simultaneously includes network level, operating system level, process level, and user level behaviors. Each of these levels presents its own modeling challenges, but the user level is the most complex due to the adaptability of human beings. In this vein, there are several major user modeling goals, namely descriptive modeling, predictive modeling and automated weakness discovery. This study shows how multi-agent techniques were used to simulate a large-scale computing cluster at each of these levels.
Visual verification and analysis of cluster detection for molecular dynamics.
Grottel, Sebastian; Reina, Guido; Vrabec, Jadran; Ertl, Thomas
2007-01-01
A current research topic in molecular thermodynamics is the condensation of vapor to liquid and the investigation of this process at the molecular level. Condensation is found in many physical phenomena, e.g. the formation of atmospheric clouds or the processes inside steam turbines, where a detailed knowledge of the dynamics of condensation processes will help to optimize energy efficiency and avoid problems with droplets of macroscopic size. The key properties of these processes are the nucleation rate and the critical cluster size. Calculating these properties requires a meaningful definition of molecular clusters, which is currently not fully resolved. In this paper, a framework capable of interactively visualizing molecular datasets of such nucleation simulations is presented, with an emphasis on the detected molecular clusters. To check the quality of the cluster detection results, our framework introduces the concept of flow groups to highlight potential cluster evolution over time that is not detected by the employed algorithm. To confirm the findings of the visual analysis, we coupled the rendering view with a schematic view of the clusters' evolution. This allows users to rapidly assess the quality of the molecular cluster detection algorithm and to identify locations in the simulation data, in space as well as in time, where the cluster detection fails. Thus, thermodynamics researchers can eliminate weaknesses in their cluster detection algorithms. Several examples of the effective and efficient use of our tool are presented.
A Flocking Based algorithm for Document Clustering Analysis
Energy Technology Data Exchange (ETDEWEB)
Cui, Xiaohui [ORNL]; Gao, Jinzhu [ORNL]; Potok, Thomas E [ORNL]
2006-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking-based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partitional clustering algorithms such as K-means, the Flocking-based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data by embedding the high-dimensional data items on a two-dimensional grid for easy retrieval and visualization of the clustering result. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually produce a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance than the K-means and Ant clustering algorithms for real document clustering.
DEFF Research Database (Denmark)
Dashab, Golam Reza; Kadri, Naveen Kumar; Mahdi Shariati, Mohammad
2012-01-01
Background: Despite many success stories of genome wide association studies (GWAS), challenges exist in QTL detection, especially in datasets with many levels of relatedness. In this study we compared four methods of GWA on a dataset simulated for the 15th QTL-MAS workshop. The four methods were 1) Mixed model analysis (MMA), 2) Random haplotype model (RHM), 3) Genealogy-based mixed model (GENMIX), and 4) Bayesian variable selection (BVS). The data consisted of phenotypes of 2000 animals from 20 sire families genotyped with 9990 SNPs on five chromosomes. Results: Out of the eight...
Reliability analysis of cluster-based ad-hoc networks
Energy Technology Data Exchange (ETDEWEB)
Cook, Jason L. [Quality Engineering and System Assurance, Armament Research Development Engineering Center, Picatinny Arsenal, NJ (United States); Ramirez-Marquez, Jose Emmanuel [School of Systems and Enterprises, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ 07030 (United States)], E-mail: Jose.Ramirez-Marquez@stevens.edu
2008-10-15
The mobile ad-hoc wireless network (MAWN) is a new and emerging network scheme that is being employed in a variety of applications. The MAWN differs from traditional networks because it is self-forming and dynamic. The MAWN is free of infrastructure and, as such, only the mobile nodes comprise the network. Pairs of nodes communicate either directly or through other nodes. To do so, each node acts, in turn, as a source, destination, and relay of messages. The virtue of a MAWN is the flexibility this provides; however, the same feature also creates the challenge for reliability analyses. The variability and volatility of the MAWN configuration make typical reliability methods (e.g. reliability block diagrams) inappropriate because no single structure or configuration represents all manifestations of a MAWN. For this reason, new methods are being developed to analyze the reliability of this new networking technology. Recently published methods adapt to this feature by treating the configuration probabilistically or by including embedded mobility models. This paper combines both approaches and extends these works by modifying the problem formulation to address the reliability analysis of a cluster-based MAWN. The cluster-based MAWN is deployed in applications with constraints on networking resources such as bandwidth and energy. This paper presents the problem's formulation, a discussion of applicable reliability metrics for the MAWN, and an illustration of a Monte Carlo simulation method through the analysis of several example networks.
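The idea of treating the configuration probabilistically can be illustrated with a small Monte Carlo estimate of two-terminal reliability. This sketch is not the paper's cluster-based formulation: it assumes nodes placed uniformly at random in a unit square, a unit-disk link model (two nodes are linked when their distance is at most a fixed radio range), perfectly reliable links and no mobility. Node 0 is the source and the last node the destination; all names are hypothetical.

```python
import math
import random
from collections import deque

def connected(positions, radius, src, dst):
    """Breadth-first search over the unit-disk graph (edge if distance <= radius)."""
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in range(len(positions)):
            if v not in seen and math.dist(positions[u], positions[v]) <= radius:
                seen.add(v)
                queue.append(v)
    return False

def two_terminal_reliability(n_nodes, radius, trials=2000, seed=0):
    """Fraction of random node placements in which source and destination connect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = [(rng.random(), rng.random()) for _ in range(n_nodes)]
        hits += connected(pos, radius, 0, n_nodes - 1)
    return hits / trials
```

Each trial samples one possible network configuration, so the estimate averages reliability over configurations instead of relying on a single fixed topology, which is the reason block-diagram methods do not fit.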
Time series clustering analysis of health-promoting behavior
Yang, Chi-Ta; Hung, Yu-Shiang; Deng, Guang-Feng
2013-10-01
Health promotion must be emphasized to achieve the World Health Organization goal of health for all. Since the global population is aging rapidly, the ComCare elder health-promoting service was developed by the Taiwan Institute for Information Industry in 2011. Based on the Pender health promotion model, the ComCare service offers five categories of health-promoting functions to address the everyday needs of seniors: nutrition management, social support, exercise management, health responsibility and stress management. To assess the overall ComCare service and to improve understanding of the health-promoting behavior of elders, this study analyzed health-promoting behavioral data automatically collected by the ComCare monitoring system. In the 30638 session records collected for 249 elders from January 2012 to March 2013, behavior patterns were identified by a fuzzy c-means time series clustering algorithm combined with autocorrelation-based representation schemes. The analysis showed that the time series data for elder health-promoting behavior can be classified into four different clusters. Each type reveals different health-promoting needs, frequencies, function numbers and behaviors. The results can assist policymakers, health-care providers, and experts in medicine, public health, nursing and psychology, and have been provided to the Taiwan National Health Insurance Administration to assess elder health-promoting behavior.
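The combination of an autocorrelation-based representation with fuzzy c-means can be sketched as follows. This is a generic illustration under stated assumptions, not the study's implementation: each series is summarized by its sample autocorrelations at lags 1 to max_lag (series are assumed non-constant), and standard fuzzy c-means with fuzzifier m = 2 and Euclidean distance is run on those vectors. The function names are hypothetical.

```python
import random

def autocorrelation(series, max_lag):
    """Sample autocorrelation at lags 1..max_lag, used as a fixed-length feature vector."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    return [sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / var
            for k in range(1, max_lag + 1)]

def fuzzy_c_means(data, c, m=2.0, iters=100, seed=0):
    """Return (centers, memberships); memberships[i][j] is the degree of point i in cluster j."""
    rng = random.Random(seed)
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in data:
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])
    dim = len(data[0])
    for _ in range(iters):
        # Centers are means of the data weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(data))]
            centers.append([sum(w[i] * data[i][d] for i in range(len(data))) / sum(w)
                            for d in range(dim)])
        # Memberships follow from relative distances to every center.
        for i, x in enumerate(data):
            dists = [max(sum((a - b) ** 2 for a, b in zip(x, ctr)) ** 0.5, 1e-12)
                     for ctr in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((dists[j] / dk) ** (2.0 / (m - 1.0)) for dk in dists)
    return centers, u
```

Using autocorrelations instead of raw values lets series of different levels but similar temporal rhythm fall into the same cluster, and the soft memberships show how strongly each elder's usage pattern belongs to each behavior type.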
Differences in Pedaling Technique in Cycling: A Cluster Analysis.
Lanferdini, Fábio J; Bini, Rodrigo R; Figueiredo, Pedro; Diefenthaeler, Fernando; Mota, Carlos B; Arndt, Anton; Vaz, Marco A
2016-10-01
To employ cluster analysis to assess if cyclists would opt for different strategies in terms of neuromuscular patterns when pedaling at the power output of their second ventilatory threshold (POVT2) compared with cycling at their maximal power output (POMAX). Twenty athletes performed an incremental cycling test to determine their power output (POMAX and POVT2; first session), and pedal forces, muscle activation, muscle-tendon unit length, and vastus lateralis architecture (fascicle length, pennation angle, and muscle thickness) were recorded (second session) in POMAX and POVT2. Athletes were assigned to 2 clusters based on the behavior of outcome variables at POVT2 and POMAX using cluster analysis. Clusters 1 (n = 14) and 2 (n = 6) showed similar power output and oxygen uptake. Cluster 1 presented larger increases in pedal force and knee power than cluster 2, without differences for the index of effectiveness. Cluster 1 presented less variation in knee angle, muscle-tendon unit length, pennation angle, and tendon length than cluster 2. However, clusters 1 and 2 showed similar muscle thickness, fascicle length, and muscle activation. When cycling at POVT2 vs POMAX, cyclists could opt for keeping a constant knee power and pedal-force production, associated with an increase in tendon excursion and a constant fascicle length. Increases in power output lead to greater variations in knee angle, muscle-tendon unit length, tendon length, and pennation angle of vastus lateralis for a similar knee-extensor activation and smaller pedal-force changes in cyclists from cluster 2 than in cluster 1.
Morgan, Katy E; Forbes, Andrew B; Keogh, Ruth H; Jairath, Vipul; Kahan, Brennan C
2017-01-30
In cluster randomised cross-over (CRXO) trials, clusters receive multiple treatments in a randomised sequence over time. In such trials, there is the usual correlation between patients in the same cluster. In addition, within a cluster, patients in the same period may be more similar to each other than to patients in other periods. We demonstrate that it is necessary to account for these correlations in the analysis to obtain correct Type I error rates. We then use simulation to compare different methods of analysing a binary outcome from a two-period CRXO design. Our simulations demonstrated that hierarchical models without random effects for period-within-cluster, which do not account for any extra within-period correlation, performed poorly, with greatly inflated Type I errors in many scenarios. In scenarios where extra within-period correlation was present, a hierarchical model with random effects for cluster and period-within-cluster had correct Type I errors only when there were large numbers of clusters; with small numbers of clusters, the error rate was inflated. We also found that generalised estimating equations did not give correct error rates in any of the scenarios considered. An unweighted cluster-level summary regression performed best overall, maintaining an error rate close to 5% in all scenarios, although it lost power when extra within-period correlation was present, especially for small numbers of clusters. Results from our simulation study show that it is important to model both levels of clustering in CRXO trials, and that any extra within-period correlation should be accounted for. Copyright © 2016 John Wiley & Sons, Ltd.
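One simple form of unweighted cluster-level analysis for a two-period design is sketched below. It is an illustration and not necessarily the exact regression model compared in the paper: each cluster contributes one outcome proportion per period, and every cluster counts equally regardless of its size. The within-cluster period difference removes the cluster effect, and contrasting the AB and BA sequences removes the period effect, as in the classical two-period cross-over analysis. The function name and the input layout are hypothetical.

```python
import math
import statistics

def crossover_cluster_summary(sequence_ab, sequence_ba):
    """Unweighted cluster-level analysis of a two-period AB/BA cluster cross-over trial.

    Each argument is a list of (p1, p2) pairs: the outcome proportion in period 1 and
    in period 2 for one cluster. Clusters in sequence_ab receive treatment A first.
    Returns (estimated A-minus-B effect, t statistic, degrees of freedom).
    """
    d_ab = [p1 - p2 for p1, p2 in sequence_ab]  # (A - B) + period effect
    d_ba = [p1 - p2 for p1, p2 in sequence_ba]  # (B - A) + period effect
    effect = (statistics.mean(d_ab) - statistics.mean(d_ba)) / 2
    n1, n2 = len(d_ab), len(d_ba)
    pooled_var = ((n1 - 1) * statistics.variance(d_ab) +
                  (n2 - 1) * statistics.variance(d_ba)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2)) / 2
    return effect, effect / se, n1 + n2 - 2
```

Because the cluster itself is the unit of analysis, any correlation between patients within a cluster or within a cluster-period is absorbed into the cluster-level summaries, which is consistent with the robust Type I error rates reported above. The price is that cluster sizes are ignored, which can cost power.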
X-Ray Morphological Analysis of the Planck ESZ Clusters
Lovisari, Lorenzo; Forman, William R.; Jones, Christine; Ettori, Stefano; Andrade-Santos, Felipe; Arnaud, Monique; Démoclès, Jessica; Pratt, Gabriel W.; Randall, Scott; Kraft, Ralph
2017-09-01
X-ray observations show that galaxy clusters have a very large range of morphologies. The most disturbed systems, which are well suited to studying how clusters form and grow and to testing physical models, may complicate cosmological studies because their mass determination is more challenging. Thus, we need to understand the properties of the clusters in our samples to reduce possible biases. This is complicated by the fact that different experiments may detect different cluster populations. For example, Sunyaev–Zeldovich (SZ) selected cluster samples have been found to include a greater fraction of disturbed systems than X-ray selected samples. In this paper we determine eight morphological parameters for the Planck Early Sunyaev–Zeldovich (ESZ) objects observed with XMM-Newton. We found that two parameters, concentration and centroid shift, are the best for distinguishing between relaxed and disturbed systems. For each parameter we provide the values that allow the most relaxed or most disturbed objects to be selected from a sample. We found that the cluster dynamical state does not depend on mass. By comparing our results with those obtained for REXCESS clusters, we also confirm that the ESZ clusters indeed tend to be more disturbed, as found by previous studies.
Customer Segmentation Modeling Based on Factor Analysis and K-MEANS Clustering
Institute of Scientific and Technical Information of China (English)
彭凯; 秦永彬; 许道云
2011-01-01
To uncover existing customers' latent demand for data services, customer segmentation has become a fundamental task for telecommunications operators pursuing differentiated marketing. Using a clustering algorithm, this paper presents a segmentation model for customers of a telecommunications operator's short message service (SMS). First, factor analysis is used to remove redundant fields from the many complex variables, improving the quality and efficiency of model construction; the customers are then segmented with the unsupervised K-MEANS clustering algorithm. Validation showed that the resulting SMS customer segments have clearly distinct characteristics. In 2009, a telecommunications enterprise in western China achieved significant benefits by applying the model to differentiated marketing of data services.
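The K-MEANS step can be sketched with Lloyd's algorithm, which alternates between assigning each customer to the nearest center and moving each center to the mean of its customers. This is a minimal sketch, assuming the factor-analysis step has already reduced each customer to a short numeric vector of factor scores; that step is not shown, and the function name is hypothetical.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct data points as seeds
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
                  for p in points]
        # Update step: each center moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

Running K-MEANS on a few decorrelated factor scores instead of many redundant raw fields keeps the distance computation from being dominated by groups of near-duplicate variables.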
Old star clusters: Bench tests of low mass stellar models
Directory of Open Access Journals (Sweden)
Salaris M.
2013-03-01
Old star clusters in the Milky Way and external galaxies have traditionally been (and still are) used to constrain the age of the universe and the timescales of galaxy formation. A parallel avenue of old star cluster research considers these objects as bench tests of low-mass stellar models. This short review highlights some recent tests of stellar evolution models that make use of photometric and spectroscopic observations of resolved old star clusters. In some cases these tests have pointed to additional physical processes that are efficient in low-mass stars but are not routinely included in model computations. Moreover, recent results from the Kepler mission on the old open cluster NGC6791 are adding tight new constraints to the models.
A Collaboration Service Model for a Global Port Cluster
National Research Council Canada - National Science Library
Toh, Keith K.T; Welsh, Karyn; Hassall, Kim
2010-01-01
... between business entities within the cluster. The maturity of technologies providing portals, web and middleware services provides an opportunity to push the boundaries of contemporary service reference models and service catalogues to what...
Some Abnormal Properties of Water in the Cluster Model
Directory of Open Access Journals (Sweden)
G.A. Melnikov
2013-12-01
Within the framework of the cluster model of the structure of liquids, the anomalous temperature dependences of the speed of sound and of the thermal conductivity of water along the liquid-vapor equilibrium curve are explained.
Cluster-size dependent randomization traffic flow model
Institute of Scientific and Technical Information of China (English)
Gao Kun; Wang Bing-Hong; Fu Chuan-Ji; Lu Yu-Feng
2007-01-01
To reproduce metastable states, several slow-to-start rules have been investigated as modifications to the Nagel-Schreckenberg (NS) model. These models can reproduce some realistic phenomena that are absent in the original NS model. In these models, however, the cluster size is still not used as a parameter. In real traffic, the slow-to-start behavior of a standing vehicle often depends on the degree of congestion, which can be measured by the cluster size. Following this idea, we propose a cluster-size dependent slow-to-start model based on the speed-dependent slow-to-start rule (VDR) model. Simulations give the expected results. Compared with the VDR model, our new model has better traffic efficiency and shows richer complex behavior.
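A cluster-size dependent slow-to-start rule can be grafted onto the NS update as sketched below. The exact rule of the paper is not given in the abstract, so the probability q = min(q_max, q0 + q1 * cluster_size) and all parameter values and names are hypothetical. As in velocity-dependent randomization, a car that was stopped uses this slow-to-start probability instead of the ordinary braking probability p. Here a car's cluster size counts the car plus the bumper-to-bumper cars queued directly behind it.

```python
import random

def step(road_len, cars, vmax=5, p=0.2, q0=0.1, q1=0.05, q_max=0.8, rng=random):
    """One parallel update of a Nagel-Schreckenberg-type ring road.

    cars: list of [position, velocity] pairs sorted by position (positions distinct).
    A car that was stopped and is now free to move stays put with probability
    q = min(q_max, q0 + q1 * cluster_size), a hypothetical cluster-size rule.
    """
    n = len(cars)
    # Empty cells between each car and the car ahead of it (periodic road).
    gaps = [(cars[(i + 1) % n][0] - cars[i][0] - 1) % road_len for i in range(n)]
    # Cluster size: car i plus the bumper-to-bumper cars queued directly behind it.
    sizes = []
    for i in range(n):
        size, j = 1, (i - 1) % n
        while size < n and gaps[j] == 0:
            size += 1
            j = (j - 1) % n
        sizes.append(size)
    new_cars = []
    for i, (pos, v) in enumerate(cars):
        was_stopped = v == 0
        v = min(v + 1, vmax)   # 1. accelerate
        v = min(v, gaps[i])    # 2. brake to avoid the car ahead
        if was_stopped and v > 0:
            if rng.random() < min(q_max, q0 + q1 * sizes[i]):
                v = 0          # 3a. slow-to-start, more likely in larger clusters
        elif v > 0 and rng.random() < p:
            v -= 1             # 3b. ordinary random braking
        new_cars.append([(pos + v) % road_len, v])  # 4. move
    new_cars.sort()
    return new_cars
```

Since every car moves at most as far as its gap, no two cars can land on the same cell, and re-sorting by position preserves the cyclic order on the ring.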
Coordination game model of co-opetition relationship on cluster supply chains
Institute of Scientific and Technical Information of China (English)
Zhou Min; Deng Feiqi; Wu Sai
2008-01-01
Research on cluster supply chains is a new direction and a hotspot of industrial cluster theory. Under a coordination game, enterprises may become stuck in an inefficient equilibrium, which is an important problem that must be considered for cluster supply chains. A symmetric coordination game model is constructed to describe the competitive and cooperative relationship between manufacturers of the same quality on cluster supply chains. Non-cooperative game theory and evolutionary game theory are each used to analyze the model, and the influence of the model parameters under each method is compared. It is concluded that the evolutionary game analysis is more realistic and practical. Finally, three approaches are considered for escaping path-dependent lock-in to the inefficient state during this evolutionary coordination process, providing the development of cluster supply chains with an effective forecasting and Pareto-optimizing method.
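Lock-in under evolutionary dynamics can be illustrated with textbook replicator dynamics for a symmetric 2x2 coordination game. This is a generic sketch, not the paper's model: the payoff letters and the function names are hypothetical. With a > c and d > b, both pure strategy profiles are equilibria, and the population share x playing S1 converges to whichever equilibrium's basin it starts in. The boundary between the basins is the unstable mixed equilibrium x* = (d - b) / ((a - c) + (d - b)).

```python
def replicator_path(x0, a, b, c, d, dt=0.01, steps=5000):
    """Euler-integrated replicator dynamics for a symmetric 2x2 coordination game.

    Row-player payoffs: (S1,S1)=a, (S1,S2)=b, (S2,S1)=c, (S2,S2)=d,
    with a > c and d > b so that both pure profiles are Nash equilibria.
    x is the population share playing S1; returns x after the given steps.
    """
    x = x0
    for _ in range(steps):
        f1 = a * x + b * (1 - x)  # expected payoff of S1
        f2 = c * x + d * (1 - x)  # expected payoff of S2
        x += dt * x * (1 - x) * (f1 - f2)
    return x

def basin_threshold(a, b, c, d):
    """Unstable mixed equilibrium separating the two basins of attraction."""
    return (d - b) / ((a - c) + (d - b))
```

For example, with a, b, c, d = 5, 0, 3, 2 the threshold is 0.5: a population starting at x = 0.4 locks into the inefficient (S2, S2) outcome even though (S1, S1) pays more. Escaping therefore requires pushing the share of S1 players past the threshold.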
Patterns of Brucellosis Infection Symptoms in Azerbaijan: A Latent Class Cluster Analysis
Directory of Open Access Journals (Sweden)
Rita Ismayilova
2014-01-01
Brucellosis infection is a multisystem disease with a broad spectrum of symptoms. We investigated the existence of clusters of infected patients according to their clinical presentation. Using national surveillance data from the Electronic-Integrated Disease Surveillance System, we applied a latent class cluster (LCC) analysis to symptoms to determine clusters of brucellosis cases. A total of 454 cases reported between July 2011 and July 2013 were analyzed. LCC identified a two-cluster model, and the Vuong-Lo-Mendell-Rubin likelihood ratio supported the cluster model. Brucellosis cases in the second cluster (19%) reported higher percentages of poly-lymphadenopathy, hepatomegaly, arthritis, myositis and neuritis, and of changes in liver function tests, compared with cases in the first cluster. Patients in the second cluster had a severe brucellosis disease course and were associated with a longer delay in seeking medical attention. Moreover, most of them were from Beylagan, a region focused on sheep and goat livestock production in south-central Azerbaijan. Patients in cluster 2 accounted for one-quarter of brucellosis cases and had a more severe clinical presentation. Delay in seeking medical care may explain the severe illness. Future work needs to determine the factors that influence care seeking among brucellosis cases and to identify Brucella species, particularly among cases from Beylagan.
A liquid drop model for embedded atom method cluster energies
Finley, C. W.; Abel, P. B.; Ferrante, J.
1996-01-01
Minimum energy configurations for homonuclear clusters containing from two to twenty-two atoms of six metals, Ag, Au, Cu, Ni, Pd and Pt, have been calculated using the Embedded Atom Method (EAM). The average energy per atom as a function of cluster size has been fit to a liquid drop model, giving estimates of the surface and curvature energies. The liquid drop model gives a good representation of the relationship between average energy and cluster size. As a test, the resulting surface energies are compared with EAM surface energy calculations for various low-index crystal faces, with reasonable agreement.
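A liquid drop fit of this kind can be sketched as a linear least-squares problem. This is a minimal sketch that assumes the common parametrization E/N = a_v + a_s N^(-1/3) + a_c N^(-2/3), with a bulk (volume) term, a surface term and a curvature term; the paper's exact functional form and units may differ, and the function name is hypothetical. The 3x3 normal equations are solved by Gaussian elimination so the example needs no external libraries.

```python
def fit_liquid_drop(sizes, energies_per_atom):
    """Least-squares fit of E/N = a_v + a_s * N**(-1/3) + a_c * N**(-2/3).

    Returns (a_v, a_s, a_c): bulk, surface and curvature coefficients.
    """
    rows = [[1.0, n ** (-1 / 3), n ** (-2 / 3)] for n in sizes]
    # Normal equations (X^T X) beta = X^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, energies_per_atom)) for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented 3x4 matrix.
    m = [ata[i] + [aty[i]] for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    # Back substitution.
    beta = [0.0] * 3
    for i in (2, 1, 0):
        beta[i] = (m[i][3] - sum(m[i][j] * beta[j] for j in range(i + 1, 3))) / m[i][i]
    return tuple(beta)
```

The model is linear in its coefficients even though it is nonlinear in N, so ordinary least squares suffices. With EAM energies for N = 2 to 22, the fitted a_s would play the role of the surface-energy estimate that the abstract compares against low-index crystal faces.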
Fractal dimension of critical clusters in the Φ^4_4 model
Jansen, K.; Lang, C. B.
1991-06-01
We study the d=4 O(4) symmetric nonlinear sigma model at the pseudocritical points for 8^4 to 28^4 lattices. The Fortuin-Kasteleyn-Coniglio-Klein clusters are shown to have fractal dimension d_f ≃ 3, in accordance with the conjectured scaling relation involving the odd critical exponent δ. For the one-cluster algorithm introduced recently by Wolff, the dynamical critical exponent z comes out compatible with zero in this model.
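The Wolff one-cluster algorithm grows a single Fortuin-Kasteleyn cluster from a random seed site and flips it as a whole. For brevity, this sketch swaps in the 2D Ising model (coupling J = 1, periodic boundaries) in place of the 4D O(4) sigma model studied in the paper. In the O(N) version, spins are instead reflected across the plane perpendicular to a random unit vector, and the bond probability uses the projected spin components. Aligned neighbors join the cluster with probability 1 - exp(-2β). The function name is hypothetical.

```python
import math
import random

def wolff_update(spins, L, beta, rng):
    """Grow and flip one Wolff cluster on an L x L Ising lattice; return its size."""
    p_add = 1.0 - math.exp(-2.0 * beta)
    seed = (rng.randrange(L), rng.randrange(L))
    s0 = spins[seed[0]][seed[1]]
    cluster = {seed}
    stack = [seed]
    while stack:
        i, j = stack.pop()
        neighbours = (((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L))
        for ni, nj in neighbours:
            # Activate the bond to an aligned, not-yet-included neighbour with probability p_add.
            if (ni, nj) not in cluster and spins[ni][nj] == s0 and rng.random() < p_add:
                cluster.add((ni, nj))
                stack.append((ni, nj))
    for i, j in cluster:
        spins[i][j] = -s0
    return len(cluster)
```

Because each update flips an entire FK cluster, recording the returned sizes near criticality gives the cluster statistics from which a fractal dimension can be estimated. Large correlated regions change in a single move, which is why critical slowing down is strongly reduced.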
CLUSTERING ANALYSIS OF DEBRIS-FLOW STREAMS
Institute of Scientific and Technical Information of China (English)
Yuan-Fan TSAI; Huai-Kuang TSAI; Cheng-Yan KAO
2004-01-01
The Chi-Chi earthquake in 1999 caused disastrous landslides, which triggered numerous debris flows and killed hundreds of people. A critical rainfall intensity line for each debris-flow stream is studied to prevent such a disaster. However, setting rainfall lines from incomplete data is difficult, so this study considered eight critical factors to group streams, such that streams within a cluster have similar rainfall lines. A genetic algorithm is applied to group 377 debris-flow streams selected from the center of an area affected by the Chi-Chi earthquake. These streams are grouped into seven clusters with different characteristics. The results reveal that the proposed method effectively groups debris-flow streams.
Clustering dynamic textures with the hierarchical em algorithm for modeling video.
Mumtaz, Adeel; Coviello, Emanuele; Lanckriet, Gert R G; Chan, Antoni B
2013-07-01
Dynamic texture (DT) is a probabilistic generative model, defined over space and time, that represents a video as the output of a linear dynamical system (LDS). The DT model has been applied to a wide variety of computer vision problems, such as motion segmentation, motion classification, and video registration. In this paper, we derive a new algorithm for clustering DT models that is based on the hierarchical EM algorithm. The proposed clustering algorithm is capable of both clustering DTs and learning novel DT cluster centers that are representative of the cluster members in a manner that is consistent with the underlying generative probabilistic model of the DT. We also derive an efficient recursive algorithm for sensitivity analysis of the discrete-time Kalman smoothing filter, which is used as the basis for computing expectations in the E-step of the HEM algorithm. Finally, we demonstrate the efficacy of the clustering algorithm on several applications in motion analysis, including hierarchical motion clustering, semantic motion annotation, and learning bag-of-systems (BoS) codebooks for dynamic texture recognition.
Detection of early glaucomatous progression with octopus cluster trend analysis.
Naghizadeh, Farzaneh; Holló, Gábor
2014-01-01
To compare the ability of Corrected Cluster Trend Analysis (CCTA) and Cluster Trend Analysis (CTA) with that of event analysis of Octopus visual field series to detect early glaucomatous progression. One eye each of 15 healthy, 19 ocular hypertensive, 20 preperimetric, and 51 perimetric glaucoma (PG) patients was investigated with the Octopus normal G2 test at 6-month intervals for 1.5 to 3 years. Progression was defined as significant worsening in any of the 10 Octopus clusters for CCTA, and by the event analysis criteria for event analysis. With event analysis, 9 PG eyes showed localized progression and 1 showed diffuse mean defect (MD) worsening. With CCTA, progression was indicated in 1 normal, 1 ocular hypertensive, and 1 preperimetric glaucoma eye due to vitreous floaters, and in 28 PG eyes, including all 9 eyes with localized progression on event analysis. The locations of CCTA progression matched those found with event analysis in all 9 cases. In 17 of the remaining 19 eyes, progressing clusters matched locations that were suspicious but not definitive for progression with event analysis. In the eye with diffuse MD worsening, CTA found significant progression for 7 clusters. For the global MD progression rate, eyes that worsened with CCTA only did not differ from stable eyes but had significantly smaller progression rates than eyes that progressed with event analysis (P=0.0002). In PG, Octopus CCTA and CTA are clinically useful for identifying early progression and areas suspicious for early progression. However, in some eyes without glaucomatous visual field damage, vitreous floaters may cause progression artifacts.
Cluster Analysis of Gene Expression Data
Domany, E
2002-01-01
The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample, such as tissue extracted from a particular tumor. The results of such an experiment contain several hundred thousand numbers, which come in the form of a table of several thousand rows (one for each gene) and 50 to 100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what gene expression is, and how it is measured by DNA chips. Next I explain what is meant by "clustering" and how we analyze the massive amounts of data from such experiments, and present results obtained from a...
Clustering of European winter storms: A multi-model perspective
Renggli, Dominik; Buettner, Annemarie; Scherb, Anke; Straub, Daniel; Zimmerli, Peter
2016-04-01
The storm series over Europe in 1990 (Daria, Vivian, Wiebke, Herta) and 1999 (Anatol, Lothar, Martin) are very well known. Such clusters of severe events strongly affect the seasonally accumulated damage statistics. The (re)insurance industry has quantified clustering by using distribution assumptions deduced from the historical storm activity of the last 30 to 40 years. The use of storm series simulated by climate models has only started recently. Climate model runs can potentially represent hundreds to thousands of years, allowing a more detailed quantification of clustering than the history of the last few decades. However, it is unknown how sensitive the representation of clustering is to systematic biases. Using a multi-model ensemble allows this uncertainty to be quantified. This work uses CMIP5 decadal ensemble hindcasts to study the clustering of European winter storms from a multi-model perspective. An objective identification algorithm extracts winter storms (September to April) from the gridded 6-hourly wind data. Since the skill of European storm predictions is very limited on the decadal scale, the different hindcast runs are interpreted as independent realizations. As a consequence, the available hindcast ensemble represents several thousand simulated storm seasons. The seasonal clustering of winter storms is quantified using the dispersion coefficient. The benchmark for the decadal prediction models is the 20th Century Reanalysis. The decadal prediction models are able to reproduce typical features of the clustering characteristics observed in the reanalysis data. Clustering occurs in all analyzed models over the North Atlantic and European region, in particular over Great Britain and Scandinavia as well as over Iberia (i.e. the exit regions of the North Atlantic storm track). Clustering is generally weaker in the models than in the reanalysis, although the differences between models are substantial. In contrast to existing studies, clustering is driven by weak
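The abstract quantifies seasonal clustering with the dispersion coefficient but does not define it. A minimal sketch, assuming the definition commonly used in the storm-clustering literature, psi = Var(N) / E(N) - 1, where N is the number of storms per season: a Poisson (purely random) process gives psi close to 0, and psi > 0 indicates clustering (overdispersion). Whether the study uses the sample or population variance is not stated; the sketch assumes the sample variance, and the function name is hypothetical.

```python
import statistics


def dispersion_coefficient(counts):
    """psi = Var(N) / E(N) - 1 for per-season storm counts N (assumed definition)."""
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance, n - 1 denominator
    return var / mean - 1.0
```

For example, perfectly regular seasons such as [2, 2, 2, 2] give psi = -1 (underdispersed), while alternating quiet and stormy seasons such as [0, 4, 0, 4] give a positive psi, the signature of clustering.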
Comparative analysis of genomic signal processing for microarray data clustering.
Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe
2011-12-01
Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next-generation healthcare systems, in particular in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods, linear predictive coding, wavelet decomposition, and fractal dimension, are studied to provide a comparative evaluation of their clustering performance on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared with the other digital signal processing methods and well-known statistical methods.
Using cluster analysis to organize and explore regional GPS velocities
Simpson, Robert W.; Thatcher, Wayne; Savage, James C.
2012-01-01
Cluster analysis offers a simple visual exploratory tool for the initial investigation of regional Global Positioning System (GPS) velocity observations, which are providing increasingly precise mappings of actively deforming continental lithosphere. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, can be subjective and is often guided by the distribution of known faults. To illustrate our method, we apply cluster analysis to GPS velocities from the San Francisco Bay Region, California, to search for spatially coherent patterns of deformation, including evidence of block-like behavior. The clustering process identifies four robust groupings of velocities that we identify with four crustal blocks. Although the analysis uses no prior geologic information other than the GPS velocities, the cluster/block boundaries track three major faults, both locked and creeping.
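The abstract does not name the clustering algorithm used. As a stand-in, a minimal k-means sketch that groups stations purely by their horizontal velocity vectors (east, north), so no geologic information enters the grouping. The deterministic farthest-first initialization, the function name, and the example velocities are illustrative assumptions, not the authors' procedure or data.

```python
import math


def kmeans(vectors, k, iterations=50):
    """Lloyd's k-means on velocity tuples; returns (labels, centers)."""
    # Farthest-first initialisation keeps the sketch deterministic.
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(math.dist(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each station joins the nearest velocity centroid.
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        # Update step: each centroid becomes the mean velocity of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old centre
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Each resulting group of stations moving at similar velocities is a candidate crustal block; mapping the stations by label would show whether the boundaries follow known faults.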
Effective Transparency: A Test of Atomistic Laser-Cluster Models
Pandit, Rishi; Teague, Thomas; Hartwick, Zachary; Bigaouette, Nicolas; Ramunno, Lora; Ackad, Edward
2016-01-01
The effective transparency of rare-gas clusters, post-interaction with an extreme ultraviolet (XUV) pump pulse, is studied by using an atomistic hybrid quantum-classical molecular dynamics model. We find there is an intensity range in which an XUV probe pulse has no lasting effect on the average charge state of a cluster after it has been saturated by an XUV pump pulse: the cluster is effectively transparent to the probe pulse. The range of this phenomenon increases with the size of the cluster, which makes it an excellent candidate for an experimental test of the effective transparency effect. We present predictions for the clusters at the peak of the laser pulse, as well as the expected experimental time-of-flight signal, along with trends that can be compared with experiment. Significant deviations from these predictions would provide evidence for enhanced photoionization mechanism(s).
Clusters as a Model of Economic Development of Serbia
Directory of Open Access Journals (Sweden)
Marko Laketa
2013-12-01
Insufficient competitiveness of small and medium enterprises in Serbia can be significantly improved by a system of business associations through clusters, business incubators and technology parks. Such connections contribute to the growth and development not only of the cluster members but also at the regional and national levels, because without them no significant breakthrough on the international market is possible. The association of small and medium enterprises into clusters and other forms of interconnection in Serbia is far below the required and potential level. Awareness of the importance of clusters for local economic development, through their contribution to the advancement of small and medium-sized enterprises, is not yet sufficiently mature. Supporting association into clusters and the use of their benefits, following the model of highly developed countries, is the basis of a successful economic policy, and Serbia has all the necessary prerequisites for it.
Emergence of Clustering in an Acquaintance Model without Homophily
Bhat, Uttam; Redner, S
2014-01-01
We introduce an agent-based acquaintance model in which social links are created by processes in which there is no explicit homophily. In spite of this constraint, highly-clustered social networks can arise. The crucial feature of our model is that of variable transitive interactions. That is, when an agent introduces two unconnected friends, the rate at which a connection actually occurs between them is controllable. As this transitive interaction rate is varied, the social network undergoes a dramatic clustering transition and the network consists of a collection of well-defined communities close to the transition. As a function of time, the network can undergo an incomplete gelation transition, in which the gel, or giant cluster, does not constitute the entire network, even at infinite time. Some of the clustering properties of our model also arise, albeit less dramatically, in Facebook networks.
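"Highly clustered" networks are usually characterized by a clustering coefficient. A minimal sketch of the global clustering coefficient (transitivity): the fraction of connected triples (a node plus two of its neighbours) that are closed into triangles. An agent introducing two unconnected friends, as in the model, is an attempt to close one such open triple. The function name and edge-list input are illustrative, and the paper may use a different variant (for example the averaged local coefficient).

```python
from itertools import combinations


def transitivity(edges):
    """Closed connected triples / all connected triples for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    closed = triples = 0
    for nbrs in adj.values():
        # Every pair of neighbours of a node forms a connected triple centred on it.
        for a, b in combinations(nbrs, 2):
            triples += 1
            if b in adj[a]:
                closed += 1  # the two neighbours are themselves friends
    return closed / triples if triples else 0.0
```

A triangle gives 1.0, a simple path gives 0.0, and a triangle with one pendant node gives 3 closed out of 5 connected triples, i.e. 0.6.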
Multi-mode clustering model for hierarchical wireless sensor networks
Hu, Xiangdong; Li, Yongfu; Xu, Huifen
2017-03-01
The topology management (i.e., cluster maintenance) of wireless sensor networks (WSNs) remains a challenge due to their numerous nodes, diverse application scenarios, limited resources, and complex dynamics. To address this issue, this study proposes a multi-mode clustering model (M2CM) to maintain the clusters of hierarchical WSNs. In particular, unlike the traditional time-triggered model, which operates periodically over the whole network, the M2CM is based on local, event-triggered operations. In addition, an adaptive local maintenance algorithm is designed to repair broken clusters in the WSN according to spatial-temporal changes in demand. Numerical experiments are performed on the NS2 network simulation platform. The results validate the effectiveness of the proposed model with respect to network maintenance cost, node energy consumption, transmitted data, and network lifetime.
Fodeh, Samah J; Lazenby, Mark; Bai, Mei; Ercolano, Elizabeth; Murphy, Terrence; McCorkle, Ruth
2013-10-01
Symptoms and subsequent functional impairment have been associated with the biological processes of disease, including the interaction between disease and treatment in a measurement model of symptoms. However, hitherto cluster analysis has primarily focused on symptoms. This study among patients within 100 days of diagnosis with advanced cancer explored whether self-reported physical symptoms and functional impairments formed clusters at the time of diagnosis. We applied cluster analysis to self-reported symptoms and activities of daily living of 111 patients newly diagnosed with advanced gastrointestinal (GI), gynecological, head and neck, and lung cancers. Based on content expert evaluations, the best techniques and variables were identified, yielding the best solution. The best cluster solution used a K-means algorithm and cosine similarity and yielded five clusters of physical as well as emotional symptoms and functional impairments. Cancer site formed the predominant organizing principle of composition for each cluster. The top five symptoms and functional impairments in each cluster were Cluster 1 (GI): outlook, insomnia, appearance, concentration, and eating/feeding; Cluster 2 (GI): appetite, bowel, insomnia, eating/feeding, and appearance; Cluster 3 (gynecological): nausea, insomnia, eating/feeding, concentration, and pain; Cluster 4 (head and neck): dressing, eating/feeding, bathing, toileting, and walking; and Cluster 5 (lung): cough, walking, eating/feeding, breathing, and insomnia. Functional impairments in patients newly diagnosed with late-stage cancers behave as symptoms during the diagnostic phase. Health care providers need to expand their assessments to include both symptoms and functional impairments. Early recognition of functional changes may accelerate diagnosis at an earlier cancer stage. Copyright © 2013 U.S. Cancer Pain Relief Committee. Published by Elsevier Inc. All rights reserved.
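The best solution combined a K-means algorithm with cosine similarity. One common way to do this is spherical k-means: each patient's symptom and function profile is scaled to unit length, patients are assigned to the centroid with the highest cosine similarity, and centroids are re-normalized after each update. A minimal sketch; seeding with the first k profiles, the function names, and the toy scores in the example are assumptions, not the study's data or software.

```python
import math


def normalize(v):
    """Scale a profile to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(a, b):
    """Cosine similarity of two unit vectors is just their dot product."""
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(profiles, k, iterations=50):
    data = [normalize(p) for p in profiles]
    centroids = [data[i] for i in range(k)]  # assumption: first k profiles as seeds
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda c: cosine(x, centroids[c])) for x in data]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels
```

Cosine similarity compares the pattern of symptoms and impairments rather than their overall magnitude, so a mildly and a severely affected patient with the same profile shape fall into the same cluster.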
Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C. S.
1985-01-01
The cluster correlation function $\xi_c(r)$ is compared with the particle correlation function $\xi(r)$ in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, $\xi_c$ and $\xi$ are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of $\xi_c$ increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), $\xi_c$ is steeper than $\xi$, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of $\xi_c$ found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales.
Energy Technology Data Exchange (ETDEWEB)
Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C.S.
1985-08-01
A Distributed Flocking Approach for Information Stream Clustering Analysis
Energy Technology Data Exchange (ETDEWEB)
Cui, Xiaohui [ORNL]; Potok, Thomas E [ORNL]
2006-01-01
Intelligence analysts are currently overwhelmed by the amount of information streams generated every day. There is a lack of comprehensive tools that can analyze these information streams in real time. Document clustering analysis plays an important role in improving the accuracy of information retrieval. However, most clustering technologies can only be applied to static document collections, because they normally require a large amount of computational resources and a long time to produce accurate results. It is very difficult to cluster dynamically changing text information streams on an individual computer. Our earlier research produced a dynamic reactive flock clustering algorithm that can continually refine the clustering result and quickly react to changes in document contents. This characteristic makes the algorithm suitable for clustering dynamically changing document information, such as text information streams. Because the algorithm is decentralized, a distributed approach is a natural way to increase its clustering speed. In this paper, we present a distributed multi-agent flocking approach for text information stream clustering and discuss the decentralized architectures and communication schemes for load balancing and status information synchronization in this approach.
Clinical implications of chronic heart failure phenotypes defined by cluster analysis.
Ahmad, Tariq; Pencina, Michael J; Schulte, Phillip J; O'Brien, Emily; Whellan, David J; Piña, Ileana L; Kitzman, Dalane W; Lee, Kerry L; O'Connor, Christopher M; Felker, G Michael
2014-10-28
Classification of chronic heart failure (HF) is based on criteria that may not adequately capture disease heterogeneity. Improved phenotyping may help inform research and therapeutic strategies. This study used cluster analysis to explore clinical phenotypes in chronic HF patients. A cluster analysis was performed on 45 baseline clinical variables from 1,619 participants in the HF-ACTION (Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training) study, which evaluated exercise training versus usual care in chronic systolic HF. An association between identified clusters and clinical outcomes was assessed using Cox proportional hazards modeling. Differential associations between clinical outcomes and exercise testing were examined using interaction testing. Four clusters were identified (ranging from 248 to 773 patients in each), in which patients varied considerably across measures of age, sex, race, symptoms, comorbidities, HF etiology, socioeconomic status, quality of life, cardiopulmonary exercise testing parameters, and biomarker levels. Differential associations were observed for hospitalization and mortality risks between and within clusters. Compared with cluster 1, risk of all-cause mortality and/or all-cause hospitalization ranged from 0.65 (95% confidence interval [95% CI]: 0.54 to 0.78) for cluster 4 to 1.02 (95% CI: 0.87 to 1.19) for cluster 3. However, for all-cause mortality, cluster 3 had a disproportionately lower risk of 0.61 (95% CI: 0.44 to 0.86). Evidence suggested differential effects of exercise treatment on changes in peak oxygen consumption and clinical outcomes between clusters (p for interaction Cluster analysis of clinical variables identified 4 distinct phenotypes of chronic HF. Our findings underscore the high degree of disease heterogeneity that exists within chronic HF patients and the need for improved phenotyping of the syndrome. (Exercise Training Program to Improve Clinical Outcomes in Individuals With
Sequential Combination Methods for Data Clustering Analysis
Institute of Scientific and Technical Information of China (English)
钱 涛; Ching Y. Suen; 唐远炎
2002-01-01
This paper proposes the use of more than one clustering method to improve clustering performance. Clustering is an optimization procedure based on a specific clustering criterion. Clustering combination can be regarded as a technique that constructs and processes multiple clustering criteria. Since the global and local clustering criteria are complementary rather than competitive, combining these two types of clustering criteria may enhance the clustering performance. In our past work, a multi-objective programming based simultaneous clustering combination algorithm has been proposed, which incorporates multiple criteria into an objective function by a weighting method, and solves this problem with constrained nonlinear optimization programming. But this algorithm has high computational complexity. Here a sequential combination approach is investigated, which first uses the global criterion based clustering to produce an initial result, then uses the local criterion based information to improve the initial result with a probabilistic relaxation algorithm or linear additive model. Compared with the simultaneous combination method, sequential combination has low computational complexity. Results on some simulated data and standard test data are reported. It appears that clustering performance improvement can be achieved at low cost through sequential combination.
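The two-stage idea (a global criterion first, then local refinement) can be sketched as follows. This is a minimal illustration, assuming k-means as the global-criterion step and a k-nearest-neighbour majority vote as the local step; the vote is a simpler stand-in swapped in for the probabilistic relaxation algorithm or linear additive model named in the abstract, and all function names and defaults are hypothetical.

```python
import math
from collections import Counter


def global_step(points, k, iterations=50):
    """Global criterion: k-means, minimising total within-cluster squared error."""
    centers = list(points[:k])  # assumption: first k points as seeds
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def local_step(points, labels, n_neighbors=3, sweeps=5):
    """Local criterion (stand-in): relabel each point by its neighbours' majority label."""
    labels = list(labels)
    for _ in range(sweeps):
        changed = False
        for i, p in enumerate(points):
            nbrs = sorted((j for j in range(len(points)) if j != i),
                          key=lambda j: math.dist(p, points[j]))[:n_neighbors]
            vote = Counter(labels[j] for j in nbrs).most_common(1)[0][0]
            if vote != labels[i]:
                labels[i] = vote
                changed = True
        if not changed:
            break  # no label changed in a full sweep
    return labels


def sequential_combination(points, k):
    """Run the cheap global pass once, then refine it with local information."""
    return local_step(points, global_step(points, k))
```

Because the local step only revisits the labels produced by the global pass, its cost stays far below that of optimizing a weighted multi-criterion objective directly, which is the trade-off the abstract describes.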
Cluster analysis of WIBS single particle bioaerosol data
Directory of Open Access Journals (Sweden)
N. H. Robinson
2012-09-01
Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial datasets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Waveband Integrated Bioaerosol Sensors (WIBS). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS datasets recorded at a forest site in Colorado, USA, as part of the BEACHON-RoMBAS project. Cluster analysis results between both datasets are consistent. Clusters are tentatively interpreted, by comparing concentration time series and cluster-average measurement values with the published literature (of which there is a paucity), to represent non-fluorescent accumulation mode aerosol, bacterial agglomerates, and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long-term online primary biological aerosol particle (PBAP) measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series, which are more easily interpretable, without the need for any a priori assumptions about the expected aerosol types. It can reduce the level of subjectivity compared with more standard analysis approaches, which typically rely on simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as the measurement precision, dynamic range, and number of available metrics of fluorescence-based aerosol instrumentation improve.
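Hierarchical agglomerative clustering starts with every particle as its own cluster and repeatedly merges the closest pair. A minimal sketch, assuming average linkage on Euclidean distance; the abstract does not say which linkage the study used, and the function name and feature values in the example (diameter, asymmetry, one fluorescence channel) are synthetic placeholders. Real WIBS features would normally be standardized first so that no single channel dominates the distances.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters):
    """Average-linkage agglomerative clustering; returns a cluster label per point."""
    clusters = [[i] for i in range(len(points))]  # start from singletons

    def avg_dist(a, b):
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest mean pairwise distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels
```

This naive version recomputes all pairwise linkages at each merge, which is fine for illustration but far too slow for long-term single-particle datasets; optimized library routines exist for that scale.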
Cluster analysis of WIBS single particle bioaerosol data
Robinson, N. H.; Allan, J. D.; Huffman, J. A.; Kaye, P. H.; Foot, V. E.; Gallagher, M.
2012-09-01
Cluster analysis of clinical data identifies fibromyalgia subgroups.
Directory of Open Access Journals (Sweden)
Elisa Docampo
INTRODUCTION: Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity, we classified clinical data into simplified dimensions that were used to define FM subgroups. MATERIAL AND METHODS: 48 variables were evaluated in 1,446 Spanish FM cases fulfilling the 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. RESULTS: Variables clustered into three independent dimensions: "symptomatology", "comorbidities" and "clinical scales". Only the first two dimensions were considered for the construction of FM subgroups. The resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. CONCLUSIONS: We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment.
Directory of Open Access Journals (Sweden)
Babankumar S. Bansod
2011-02-01
Starting from descriptive data on crop yield and various other properties, the aim of this study is to reveal trends in soil behaviour, such as crop yield. This study has been carried out by developing a web application that uses a well-known technique, cluster analysis. The cluster analysis revealed linkages between soil classes within the same field as well as between different fields, which can be partly attributed to crop rotation and the determination of variable soil input rates. A hybrid clustering algorithm has been developed that takes into account the traits of two clustering techniques: (i) hierarchical clustering and (ii) k-means clustering. This hybrid clustering algorithm is applied to sensor-gathered soil data, resulting in well-delineated management zones based on various soil properties, such as ECa, crop yield, etc. One of the purposes of the study was to identify the main factors affecting crop yield, and the results obtained were validated with existing techniques. To accomplish this purpose, geo-referenced soil information has been examined. Also, based on this data, a statistical method has been used to classify and characterize soil behaviour. This is done using a prediction model developed to predict the unknown behaviour of clusters based on the known behaviour of other clusters. In predictive modeling, data has been collected for the relevant predictors, a statistical model has been formulated, predictions were made, and the model can be validated (or revised) as additional data becomes available. The model used in the web application has been formed taking into account a neural-network-based minimum Hamming distance criterion.
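The abstract says the hybrid algorithm combines traits of hierarchical and k-means clustering but not how. One common pattern, assumed here, is to let an agglomerative pass choose the initial centroids and then refine them with k-means iterations. A minimal sketch using centroid linkage; the function names and the (ECa, yield) values in the example are hypothetical. In practice the features should be standardized, since ECa and yield are on very different scales.

```python
import math


def centroid(pts):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(x) / len(pts) for x in zip(*pts))


def hierarchical_seeds(points, k):
    """Stage 1: centroid-linkage agglomeration down to k groups; return their centroids."""
    groups = [[p] for p in points]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = math.dist(centroid(groups[i]), centroid(groups[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i].extend(groups.pop(j))  # j > i, so groups[i] is unaffected by the pop
    return [centroid(g) for g in groups]


def hybrid_cluster(points, k, iterations=50):
    """Stage 2: k-means (Lloyd iterations) started from the hierarchical centroids."""
    centers = hierarchical_seeds(points, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Seeding from the hierarchical pass removes k-means' sensitivity to random initialization, while the k-means stage can reassign points that an early greedy merge placed poorly; each final label would correspond to one management zone.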
The Quintuplet Cluster II. Analysis of the WN stars
Liermann, A; Oskinova, L M; Todt, H; Butler, K; 10.1051/0004-6361/200912612
2010-01-01
Based on $K$-band integral-field spectroscopy, we analyze four Wolf-Rayet stars of the nitrogen sequence (WN) found in the inner part of the Quintuplet cluster. All WN stars (WR102d, WR102i, WR102hb, and WR102ea) are of spectral subtype WN9h. One further star, LHO110, which had previously been classified as Of/WN?, is also included in the analysis and turns out most likely to be a WN9h star as well. The Potsdam Wolf-Rayet (PoWR) models for expanding atmospheres are used to derive the fundamental stellar and wind parameters. The stars turn out to be very luminous, $\log(L/L_\odot) > 6.0$, with relatively low stellar temperatures, $T_* \approx$ 25--35\,kK. Their stellar winds contain a significant fraction of hydrogen, up to $X_\mathrm{H} \sim 0.45$ (by mass). We discuss the position of the Galactic center WN stars in the Hertzsprung-Russell diagram and find that they form a distinct group. In this respect, the Quintuplet WN stars are similar to late-type WN stars found in the Arches cluster and elsewhere in the Ga...
The cluster beam route to model catalysts and beyond.
Ellis, Peter R; Brown, Christopher M; Bishop, Peter T; Yin, Jinlong; Cooke, Kevin; Terry, William D; Liu, Jian; Yin, Feng; Palmer, Richard E
2016-07-01
The generation of beams of atomic clusters in the gas phase and their subsequent deposition (in vacuum) onto suitable catalyst supports, possibly after an intermediate mass filtering step, represents a new and attractive approach for the preparation of model catalyst particles. Compared with the colloidal route to the production of pre-formed catalytic nanoparticles, the nanocluster beam approach offers several advantages: the clusters produced in the beam have no ligands, their size can be selected to arbitrarily high precision by the mass filter, and metal particles containing challenging combinations of metals can be readily produced. However, until now the cluster approach has been held back by the extremely low rates of metal particle production, of the order of 1 microgram per hour. This is more than sufficient for surface science studies but several orders of magnitude below what is desirable even for research-level reaction studies under realistic conditions. In this paper we describe solutions to this scaling problem, specifically, the development of two new generations of cluster beam sources, which suggest that cluster beam yields of grams per hour may ultimately be feasible. Moreover, we illustrate the effectiveness of model catalysts prepared by cluster beam deposition onto agitated powders in the selective hydrogenation of 1-pentyne (a gas phase reaction) and 3-hexyn-1-ol (a liquid phase reaction). Our results for elemental Pd and binary PdSn and PdTi cluster catalysts demonstrate favourable combinations of yield and selectivity compared with reference materials synthesised by conventional methods.
Spatial Data Mining using Cluster Analysis
Directory of Open Access Journals (Sweden)
Ch.N.Santhosh Kumar
2012-09-01
Data mining, also referred to as Knowledge Discovery in Databases (KDD), is a process of nontrivial extraction of implicit, previously unknown and potentially useful information, such as knowledge rules, descriptions, regularities, and major trends, from large databases. Data mining has evolved into a multidisciplinary field, including database technology, machine learning, artificial intelligence, neural networks, information retrieval, and so on. In principle, data mining should be applicable to the different kinds of data and databases used in many different applications, including relational databases, transactional databases, data warehouses, object-oriented databases, and special application-oriented databases such as spatial databases, temporal databases, multimedia databases, and time-series databases. Spatial data mining, also called spatial mining, is data mining applied to spatial data or spatial databases. Spatial data are data that have a spatial or location component, and the information they carry is more complex than that of classical data. A spatial database stores spatial data represented by spatial data types and the spatial relationships among the data. Spatial data mining encompasses various tasks, including spatial classification, spatial association rule mining, spatial clustering, characteristic rules, discriminant rules, and trend detection. This paper presents how spatial data mining is achieved using clustering.
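The abstract does not name a specific spatial clustering algorithm. As an illustration of the task, a minimal sketch of DBSCAN, a widely used density-based spatial clustering algorithm swapped in here as an example rather than taken from the paper: points with at least `min_pts` neighbours within distance `eps` (counting themselves) seed clusters, which grow through neighbouring dense points; sparse points are labelled noise. The function name and parameter values are illustrative.

```python
import math


def dbscan(points, eps, min_pts):
    """Return a label per point: cluster ids 0, 1, ... or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force range query; includes the point itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point, so keep expanding
        cluster += 1
    return labels
```

Unlike k-means, DBSCAN does not need the number of clusters in advance, finds arbitrarily shaped groups, and flags isolated locations as noise, properties that suit spatial data. A real spatial database would replace the brute-force range query with a spatial index.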
Cluster Development of Zhengzhou Urban Agriculture Based on Diamond Model
Institute of Scientific and Technical Information of China (English)
2012-01-01
Based on the basic theory of the Diamond Model, this paper analyzes the competitive power of Zhengzhou urban agriculture in terms of production factors, demand conditions, related and supporting industries, business strategy and structure, and horizontal competition. In light of these conditions, it argues that cluster development is an effective approach to raising the competitive power of Zhengzhou urban agriculture. Finally, it presents the following countermeasures and suggestions: optimize the spatial distribution for cluster development of urban agriculture; cultivate leading enterprises and optimize the organizational form of urban agriculture; and energetically develop low-carbon agriculture to create a favorable ecological environment for cluster development of urban agriculture.
Gas phase metal cluster model systems for heterogeneous catalysis.
Lang, Sandra M; Bernhardt, Thorsten M
2012-07-14
Since the advent of intense cluster sources, the physical and chemical properties of isolated metal clusters have been an active field of research. In particular, gas phase metal clusters represent ideal model systems for gaining molecular-level insight into the energetics and kinetics of metal-mediated catalytic reactions. Here we summarize experimental reactivity studies as well as investigations of thermal catalytic reaction cycles on small gas phase metal clusters, mostly in relation to the surprising catalytic activity of nanoscale gold particles. Particular emphasis is placed on the importance of conceptual insights gained through the study of gas phase model systems. Based on these concepts, future perspectives are formulated in terms of the variation and optimization of catalytic materials, e.g. by utilization of bimetals and metal oxides. Furthermore, the future potential of bio-inspired catalytic material systems is highlighted and technical developments are discussed.
Institute of Scientific and Technical Information of China (English)
李军辉
2014-01-01
Unlike existing research, which takes a description of the static external features of industrial clusters as their evolution mechanism, this paper starts from the theoretical premise that industrial cluster evolution stems from increasing industrial roundaboutness. Using the infra-marginal analysis method, it constructs a general, endogenous-specialization dynamic equilibrium model and, by setting different parameter values, simulates the output changes of intermediate and final products. A three-stage analysis of the mathematical model shows that the two predominant factors shaping the form of industrial cluster evolution are "the degree of specialization economies in intermediate-product production" and "the degree of complementarity economies". When the former outweighs the latter, industrial cluster evolution is a stable, hierarchical process; conversely, when the latter outweighs the former, it is a reversible, unstable, step-wise process. The conclusion can serve as a theoretical reference for choosing industrial development models and paths in China's different regions.
Filippetti, Vanessa Aran; Allegri, Ricardo F
2011-04-01
Verbal fluency (VF) tasks are extensively used to measure strategic retrieval and executive functioning. Results for total word production, clustering and switching strategies, and performance over time are provided for Spanish-speaking children. A total of 120 children aged 8 to 11 were divided into two age groups and evaluated. Children produced more words in the semantic task than in the phonological task, clustering and switching strategies correlated with the total score, and task performance decreased over time. These scores were higher in the older group. Moreover, an association was found between verbal fluency tasks, the strategies employed, and cognitive executive functions. This indicates that clustering and switching strategies provide indicators of strategic retrieval and executive processes. Together, the results suggest that these fluency scores are valuable for measuring underlying cognitive processes and retrieval strategies and could therefore be useful for assessing executive function deficits in children.
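Clustering and switching scores are derived from the sequence of words a child produces. The abstract does not give its scoring rules, so the sketch below is a simplified, hypothetical version in the spirit of common scoring conventions: each word is assumed to carry exactly one subcategory label, a run of k consecutive same-subcategory words counts as a cluster of size k - 1 (so single words count as size 0), and every change of subcategory counts as a switch.

```python
from itertools import groupby

def clustering_switching(subcategories):
    """Simplified fluency scoring from one subcategory label per produced word.

    Returns (mean_cluster_size, n_switches). Runs of identical consecutive
    labels are clusters; a run of length k has size k - 1.
    """
    runs = [len(list(group)) for _, group in groupby(subcategories)]
    n_switches = len(runs) - 1 if runs else 0  # transitions between runs
    sizes = [r - 1 for r in runs]
    mean_size = sum(sizes) / len(sizes) if sizes else 0.0
    return mean_size, n_switches
```

For a hypothetical animal-naming response labelled `["farm", "farm", "farm", "pet", "pet", "wild", "farm"]`, the runs have lengths 3, 2, 1, 1, giving a mean cluster size of 0.75 and 3 switches. Real scoring protocols handle words that fit several subcategories, repetitions and errors, which this sketch ignores.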
Modeling the Color Magnitude Relation for Galaxy Clusters
Jimenez, Noelia; Castelli, Analia Smith; Bassino, Lilia P
2011-01-01
We investigate the origin of the colour-magnitude relation (CMR) observed in cluster galaxies using a combination of a cosmological N-body simulation of a cluster of galaxies and a semi-analytic model of galaxy formation. The departure of galaxies at the bright end of the CMR from the trend defined by less luminous galaxies could be explained by the influence of minor mergers.
Multivariate analysis of the globular clusters in M87
Das, Sukanta; Davoust, Emmanuel
2015-01-01
An objective classification of 147 globular clusters in the inner region of the giant elliptical galaxy M87 is carried out with the help of two methods of multivariate analysis. First, independent component analysis is used to determine a set of independent variables that are linear combinations of various observed parameters (mostly Lick indices) of the globular clusters. Next, K-means cluster analysis is applied to the independent components to find the optimum number of homogeneous groups having an underlying structure. The properties of the four groups of globular clusters thus uncovered are used to explain the formation mechanism of the host galaxy. It is suggested that M87 formed in two successive phases: first, a monolithic collapse, which gave rise to an inner group of metal-rich clusters with little systematic rotation and an outer group of metal-poor clusters in eccentric orbits; in a second phase, the galaxy accreted low-mass satellites in a dissipationless fashion, from the gas of which the two othe...
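The second step of this pipeline, K-means on the independent components, can be sketched as plain Lloyd's algorithm. This is a minimal sketch assuming the feature vectors (for example, the independent components of each globular cluster) have already been computed; the ICA step and the search for the optimum number of groups are not shown, and seeding with the first k points is a simplification (practical code uses random restarts or k-means++).

```python
import math

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Initial centroids are simply the first k points
    (a simplification; practical code uses random or k-means++ seeding)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean)
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids
```

With the hypothetical two-dimensional points `[(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 0.5), (9.0, 8.0)]` and `k=2`, the labels come out as `[0, 1, 0, 1, 0, 1]`, and each centroid is the mean of its three members.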
Barker, Daniel; D'Este, Catherine; Campbell, Michael J; McElduff, Patrick
2017-03-09
Stepped wedge cluster randomised trials frequently involve a relatively small number of clusters. The most common frameworks used to analyse data from these types of trials are generalised estimating equations and generalised linear mixed models. Much research on these methods has concerned their application to cluster randomised trial data and, in particular, the number of clusters required to make reasonable inferences about the intervention effect. However, for stepped wedge trials, which many researchers have claimed to have a statistical power advantage over the parallel cluster randomised trial, the minimum number of clusters required has not been investigated. We conducted a simulation study in which we considered the most commonly used methods suggested in the literature to analyse cross-sectional stepped wedge cluster randomised trial data. We compared the per cent bias, the type I error rate and the power of these methods in a stepped wedge trial setting with a binary outcome, where few clusters are available and the appropriate adjustment is made for a time trend, which by design may confound the intervention effect. We found that the generalised linear mixed modelling approach is the most consistent when few clusters are available. We also found that none of the common analysis methods for stepped wedge trials were both unbiased and maintained a 5% type I error rate when there were only three clusters. Of the commonly used analysis approaches, we recommend the generalised linear mixed model for small stepped wedge trials with binary outcomes. We also suggest that in a stepped wedge design with three steps, at least two clusters be randomised at each step, to ensure that the intervention effect estimator maintains the nominal 5% significance level and is also reasonably unbiased.
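The closing recommendation (three steps, at least two clusters per step) can be made concrete as a design matrix. The sketch below is not the authors' simulation code; it assumes a complete design with one all-control baseline period, one further period per step, and no switching back to control. Randomisation then amounts to shuffling which named cluster takes which row; `randomise` and its `seed` argument are hypothetical helpers for illustration.

```python
import random

def stepped_wedge_schedule(n_steps, clusters_per_step):
    """Rows = clusters, columns = periods; 1 = intervention, 0 = control.

    Assumes a complete design with one all-control baseline period followed by
    one period per step; the clusters of step s cross over at period s and stay exposed.
    """
    n_periods = n_steps + 1
    return [
        [1 if period >= step else 0 for period in range(n_periods)]
        for step in range(1, n_steps + 1)
        for _ in range(clusters_per_step)
    ]

def randomise(cluster_ids, n_steps, clusters_per_step, seed=None):
    """Randomly assign named clusters to the rows of the schedule."""
    ids = list(cluster_ids)
    if len(ids) != n_steps * clusters_per_step:
        raise ValueError("need exactly n_steps * clusters_per_step clusters")
    random.Random(seed).shuffle(ids)
    return dict(zip(ids, stepped_wedge_schedule(n_steps, clusters_per_step)))
```

`stepped_wedge_schedule(3, 2)` gives six clusters over four periods: two rows `[0, 1, 1, 1]`, two rows `[0, 0, 1, 1]` and two rows `[0, 0, 0, 1]`. Every cluster contributes control and intervention periods, and calendar time is correlated with exposure, which is why the time-trend adjustment discussed above matters.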
Identifying clinical course patterns in SMS data using cluster analysis
DEFF Research Database (Denmark)
Kent, Peter; Kongsted, Alice
2012-01-01
ABSTRACT: BACKGROUND: Recently, there has been interest in using the short message service (SMS, or text messaging) to gather frequent information on the clinical course of individual patients. One possible role for identifying clinical course patterns is to assist in exploring clinically important ... by spline analysis. However, cluster analysis of SMS data in its original untransformed form may be simpler and offer other advantages. Therefore, the aim of this study was to determine whether cluster analysis could be used for identifying clinical course patterns distinct from the pattern of the whole ... of cluster analysis. More research is needed, especially head-to-head studies, to identify which technique is best to use under what circumstances.
Cluster analysis of undergraduate drinkers based on alcohol expectancy scores.
Leeman, Robert F; Kulesza, Magdalena; Stewart, Diana W; Copeland, Amy L
2012-03-01
Expectancies of alcohol's effects have been associated with problem drinking in undergraduates. If subgroups can be classified based on expectancies, this may facilitate identifying those at highest risk for problem drinking. Undergraduates (N = 612) from two state universities completed a web-based survey. Responses to the Comprehensive Effects of Alcohol scale were analyzed using k-means cluster analysis separately within each university sample. Hartigan's heuristic was used to determine that five was the optimal number of clusters in each sample. Clusters were distinguishable based on their overall magnitude of expectancy endorsement and by a tendency to endorse stronger positive than negative expectancies. Subsequent analyses were conducted to compare clusters on alcohol involvement and trait disinhibition. A cluster characterized by endorsement of positive and negative expectancies ("strong expectancy") was associated with a particularly problematic risk profile, specifically concerning difficulties with self-control (i.e., trait disinhibition and impaired control over alcohol use). A cluster with higher positive and lower negative expectancies reported frequent heavy drinking but appeared to be at lower risk than the strong expectancy cluster in a number of respects. Negative expectancy endorsement appeared to represent added risk above and beyond positive expectancies. Results suggest that both the magnitude and combination of expectancies endorsed by subgroups of undergraduate drinkers may relate to their risk level in terms of alcohol involvement and personality traits. These findings may have implications for interventions with young adult drinkers.
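Hartigan's heuristic compares the within-cluster sum of squares W(k) for successive numbers of clusters. The abstract does not spell out the exact form used; the sketch below assumes the commonly cited version, H(k) = (W(k) / W(k+1) - 1) · (n - k - 1), with the conventional rule of thumb of adding clusters while H(k) exceeds 10. The W values would come from separate k-means runs; the numbers in the example are hypothetical, not the study's data.

```python
def hartigan_index(wss, n):
    """H(k) = (W_k / W_{k+1} - 1) * (n - k - 1) for k = 1 .. len(wss) - 1.

    `wss[i]` is the total within-cluster sum of squares for k = i + 1 clusters
    (e.g. from k-means runs); `n` is the number of observations.
    """
    return {k: (wss[k - 1] / wss[k] - 1) * (n - k - 1) for k in range(1, len(wss))}

def choose_k(wss, n, threshold=10.0):
    """Smallest k with H(k) <= threshold, i.e. keep adding clusters while H > threshold."""
    for k, h in sorted(hartigan_index(wss, n).items()):
        if h <= threshold:
            return k
    return len(wss)  # H never fell to the threshold within the range tried
```

With hypothetical values `wss = [1000, 400, 250, 200, 190, 185]` for k = 1..6 and `n = 100`, H is 147, 58.2, 24, 5.0, ... so `choose_k` returns 4: going from 4 to 5 clusters no longer reduces W enough to pass the threshold.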
DEFF Research Database (Denmark)
Pedersen, Anders Bro; Aabrandt, Andreas; Østergaard, Jacob
2014-01-01
scales, which calls for a statistically correct yet flexible model. This paper describes a method for modelling EVs, based on non-categorized data, that takes into account the plug-in locations of the vehicles. By using clustering analysis to extrapolate and classify the primary locations where...
Study of nuclear clustering using the modern shell model approach
Volya, Alexander; Tchuvil'Sky, Yury
2014-03-01
Nuclear clustering, alpha decays, and multi-particle correlations are important components of nuclear dynamics. In this work we use the modern configuration-interaction approach with the most advanced realistic shell-model Hamiltonians to study these questions. We utilize the algebraic many-nucleon structures and the corresponding fractional parentage coefficients to build the translationally invariant wave functions of the alpha-cluster channels. We explore the alpha spectroscopic factors, study the distribution of clustering strength, and discuss the structure of an effective 4-body operator describing the in-medium alpha dynamics in the multi-shell valence configuration space. The sensitivity of alpha clustering to the components of an effective Hamiltonian, including its collective and many-body components as well as isospin-symmetry-breaking terms, is of interest. We offer effective techniques for evaluating cluster spectroscopic factors that satisfy the orthogonality conditions of the respective cluster channels. We present a study of clustering phenomena, single-particle dynamics, and electromagnetic transitions for a number of nuclei in the p-sd shells and compare our results with the available experimental data. This work is supported by the U.S. Department of Energy under contract number DE-SC0009883.
Molecular dynamics modelling of EGCG clusters on ceramide bilayers
Energy Technology Data Exchange (ETDEWEB)
Yeo, Jingjie; Cheng, Yuan; Li, Weifeng; Zhang, Yong-Wei [Institute of High Performance Computing, A*STAR, 138632 (Singapore)
2015-12-31
A novel method of atomistic modelling and characterization of both pure ceramide and mixed lipid bilayers is being developed, using only the General Amber Force Field. Lipid bilayers modelled as pure ceramides adopt hexagonal packing after equilibration, and the area per lipid and bilayer thickness are consistent with previously reported theoretical results. Mixed lipid bilayers are modelled as a combination of ceramides, cholesterol, and free fatty acids; this model is shown to be stable after equilibration. Epigallocatechin-3-gallate (EGCG), a green tea extract, is introduced as a spherical cluster on the surface of the mixed lipid bilayer. It is demonstrated that the cluster binds to the bilayer as a cluster without diffusing into the surrounding water.
Parameterization of geophysical inversion model using particle clustering
Yang, Dikun
2015-01-01
This paper presents a new method of constructing physical models in a geophysical inverse problem when there are only a few possible physical property values in the model, these values are reasonably well known, and the geometry of the target is sought. The model consists of a fixed background and many small "particles" that act as building blocks and float around in the background to resemble the target by clustering. This approach contrasts with conventional geometric inversions, which require the target to be a regularly shaped body, since here the geometry of the target can be arbitrary and need not be known beforehand. Because of the limited resolution of the data, the particles do not necessarily cluster when recovering compact targets. A model norm, called the distribution norm, is introduced to quantify the spread of the particles and is incorporated into the objective function to encourage further clustering of the particles. As a proof of concept, 1D magnetotelluric inversion is used as an example. My experiments reveal that the ...
Steindl, Theodora M; Crump, Carolyn E; Hayden, Frederick G; Langer, Thierry
2005-10-06
The development and application of a sophisticated virtual screening and selection protocol to identify potential, novel inhibitors of the human rhinovirus coat protein employing various computer-assisted strategies are described. A large commercially available database of compounds was screened using a highly selective, structure-based pharmacophore model generated with the program Catalyst. A docking study and a principal component analysis were carried out within the software package Cerius and served to validate and further refine the obtained results. These combined efforts led to the selection of six candidate structures, for which in vitro anti-rhinoviral activity could be shown in a biological assay.
Cluster analysis of radionuclide concentrations in beach sand
de Meijer, R.J.; James, I.; Jennings, P.J.; Keoyers, J.E.
This paper presents a method in which natural radionuclide concentrations of beach sand minerals are traced along a stretch of coast by cluster analysis. This analysis yields two groups of mineral deposit with different origins. The method deviates from standard methods of following dispersal of
A dynamical $\alpha$-cluster model of $^{16}$O
Halcrow, C J; Manton, N S
2016-01-01
We calculate the low-lying spectrum of the $^{16}$O nucleus using an $\alpha$-cluster model which includes the important tetrahedral and square configurations. Our approach is motivated by the dynamics of $\alpha$-particle scattering in the Skyrme model. We are able to replicate the large energy splitting that is observed between states of identical spin but opposite parities, as well as introduce states that were previously not found in other cluster models, such as a $0^-$ state. We also provide a novel interpretation of the first excited state of $^{16}$O and make predictions for the energies of $6^-$ states that have yet to be observed experimentally.
Cluster analysis in retail segmentation for credit scoring
Directory of Open Access Journals (Sweden)
Sanja Scitovski
2014-12-01
The aim of this paper is to segment retail clients using adaptive Mahalanobis clustering, so that each segment is suitable for developing a separate credit scoring model and a better risk assessment of retail clients can be achieved. A real data set on retail clients from a Croatian bank was used. The data point set is grouped using the adaptive Mahalanobis partitioning algorithm (see, e.g., [20]), an incremental algorithm that recognizes ellipsoidal clusters whose main axes lie in the directions of the eigenvectors of the corresponding covariance matrix of the data set. On the basis of the given data set, the well-known DIRECT algorithm for global optimization can be used to search successively for an optimal partition with k = 2, 3, ... clusters. A partition with the most appropriate number of clusters is then determined using various validity indexes. Based on the description of each cluster, banks could decide to develop a separate credit scoring model for each cluster as well as a business strategy customized to each cluster.
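What makes the clusters ellipsoidal is the Mahalanobis distance, which measures distance in units of each cluster's covariance. The sketch below does not reproduce the adaptive, incremental partitioning algorithm or the DIRECT search; it only shows the distance and a nearest-cluster assignment for 2-D data, using the closed-form inverse of a symmetric 2x2 covariance matrix. The cluster means and covariances in the example are hypothetical.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)^T cov^-1 (x - mean) for 2-D data.

    `cov` is assumed symmetric; for such a 2x2 matrix, positive definiteness
    is equivalent to a > 0 and det > 0.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of a 2x2 matrix
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return dx * (ia * dx + ib * dy) + dy * (ic * dx + id_ * dy)

def nearest_cluster(x, clusters):
    """Index of the (mean, cov) cluster with the smallest Mahalanobis distance to x."""
    return min(range(len(clusters)), key=lambda i: mahalanobis_sq(x, *clusters[i]))
```

Take a cluster stretched along the x-axis, `((0, 0), ((4, 0), (0, 1)))`, and a round cluster, `((0, 3), ((1, 0), (0, 1)))`. The point `(3, 2)` is closer to the round cluster in Euclidean terms (about 3.16 versus 3.61), but `nearest_cluster` assigns it to the stretched cluster (squared Mahalanobis distance 6.25 versus 10) because it lies along that cluster's long axis.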
Sollima, A.; Dalessandro, E.; Beccari, G.; Pallanca, C.
2017-02-01
We present the results of the analysis of deep photometric data for a sample of three Galactic globular clusters (NGC 5466, NGC 6218 and NGC 6981) with the aim of estimating their degree of mass segregation and testing the predictions of analytic dynamical models. The adopted data set, composed of both Hubble Space Telescope and ground-based data, reaches the low-mass end of the mass functions of these clusters from the centre out to their tidal radii, allowing us to derive the radial distribution of stars with different masses. All the analysed clusters show evidence of mass segregation, with the most massive stars being more concentrated than the low-mass ones. The structures of NGC 5466 and NGC 6981 are well reproduced by multimass dynamical models adopting a lowered Maxwellian distribution function and the prescription for mass segregation given by Gunn & Griffin. In contrast, NGC 6218 appears to be more mass segregated than the models predict. By applying the same technique to mock observations derived from snapshots selected from suitable N-body simulations, we show that the deviation from the behaviour predicted by these models depends on the particular stage of dynamical evolution, regardless of initial conditions.
Technology Clusters Exploration for Patent Portfolio through Patent Abstract Analysis
Directory of Open Access Journals (Sweden)
Gabjo Kim
2016-12-01
This study explores technology clusters through patent analysis. The aim of exploring technology clusters is to grasp competitors’ levels of sustainable research and development (R&D) and to establish a sustainable strategy for entering an industry. To achieve this, we first grouped patent documents with similar technologies by applying affinity propagation (AP) clustering, which is effective for grouping large amounts of data. Next, to define the technology clusters, we adopted the term frequency-inverse document frequency (TF-IDF) weight, which ranks terms in order of importance. We collected the patent data of Korean electric car companies from the United States Patent and Trademark Office (USPTO) to verify our proposed methodology. As a result, our proposed methodology presents more detailed information on the Korean electric car industry than previous studies.
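The TF-IDF step used to label clusters can be sketched as follows. This assumes the documents are already tokenized (for instance, one pseudo-document per patent cluster, made by concatenating its abstracts) and uses one common weighting variant, TF = count / document length and IDF = log(N / df); the affinity propagation step is not shown, and the tokens in the example are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (docs are lists of tokens).

    Uses raw-frequency TF (count / document length) and IDF = log(N / df);
    other weighting variants exist.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights

def top_terms(weights, n=3):
    """Terms sorted by descending weight (ties broken alphabetically)."""
    return [t for t, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
```

For the toy documents `[["battery", "charging", "battery", "vehicle"], ["motor", "vehicle", "inverter"], ["battery", "cell", "vehicle"]]`, the top two terms of the first document are `["charging", "battery"]`, while `"vehicle"` gets weight 0 because it appears in every document and so does not distinguish any cluster.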
An Empirical Analysis of Rough Set Categorical Clustering Techniques
Uddin, Jamal; Ghazali, Rozaida; Deris, Mustafa Mat
2017-01-01
Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For categorical data clustering, rough set based approaches such as Maximum Dependency Attribute (MDA) and Maximum Significance Attribute (MSA) have outperformed predecessor approaches such as Bi-Clustering (BC), Total Roughness (TR) and Min-Min Roughness (MMR). This paper presents the limitations and issues of the MDA and MSA techniques on a special type of data set on which both techniques fail to select, or have difficulty selecting, their best clustering attribute. This analysis motivates the need for a better and more general rough set theory approach that can cope with the issues of MDA and MSA. Hence, an alternative technique named Maximum Indiscernible Attribute (MIA) for clustering categorical data using rough set indiscernibility relations is proposed. The novelty of the proposed approach is that, unlike other rough set theory techniques, it uses the domain knowledge of the data set. It is based on the concept of the indiscernibility relation combined with a number of clusters. To show the significance of the proposed approach, the effect of the number of clusters on rough accuracy, purity and entropy is described in the form of propositions. Moreover, ten different data sets from previously utilized research cases and the UCI repository are used for experiments. The results, presented in tabular and graphical form, show that the proposed MIA technique performs better in selecting the clustering attribute in terms of purity, entropy, iterations, time, accuracy and rough accuracy. PMID:28068344
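The MIA technique itself is not reproduced here. The sketch below only shows the standard rough-set building blocks that such techniques rely on: indiscernibility classes (objects with identical values on the chosen attributes), the lower and upper approximations of a target set, and rough accuracy, defined as |lower| / |upper|. The small categorical table is hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(table, attrs):
    """Partition object ids into classes with identical values on `attrs`."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def approximations(table, attrs, target):
    """Lower and upper approximations of the object set `target` w.r.t. `attrs`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attrs):
        if cls <= target:   # class lies entirely inside the target
            lower |= cls
        if cls & target:    # class overlaps the target
            upper |= cls
    return lower, upper

def rough_accuracy(table, attrs, target):
    """|lower| / |upper|; 1.0 means the target is exactly definable by `attrs`."""
    lower, upper = approximations(table, attrs, target)
    return len(lower) / len(upper) if upper else 1.0
```

For a toy table with objects 1 to 5 described by colour and size (1 and 2 red/small, 3 and 5 blue/large, 4 blue/small) and the target set {1, 3, 4}, the classes on both attributes are {1, 2}, {3, 5} and {4}. The lower approximation is therefore {4}, the upper approximation is all five objects, and rough accuracy is 0.2. Using colour alone gives an accuracy of 0.0, so size carries useful information here.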
Cluster variation studies of the anisotropic exchange interaction model
King, T. C.; Chen, H. H.
The cluster variation method is applied to study critical properties of the Potts-like ferromagnetic anisotropic exchange interaction model. Phase transition temperatures, order parameter discontinuities and latent heats of the model on the triangular and the fcc lattices are determined by the triangle approximation; and those on the square and the sc lattices are determined by the square approximation.
Constraining Galaxy Formation Models with Dwarf Ellipticals in Clusters
Conselice, C J
2005-01-01
Recent observations demonstrate that dwarf elliptical (dE) galaxies in clusters, despite their faintness, are likely a critical galaxy type for understanding the processes behind galaxy formation. Dwarf ellipticals are the most common galaxy type, and are particularly abundant in rich galaxy clusters. The dwarf to giant ratio is in fact highest in rich clusters of galaxies, suggesting that cluster dEs do not form in groups that later merge to form clusters. Dwarf ellipticals are potentially the only galaxy type whose formation is sensitive to global, rather than local, environment. The dominant idea for explaining the formation of these systems, through Cold Dark Matter models, is that dEs form early and within their present environments. Recent results suggest that some dwarfs appear in clusters after the bulk of massive galaxies form, a scenario not predicted in standard hierarchical structure formation models. Many dEs have younger and more metal rich stellar populations than dwarfs in lower density enviro...
Evolutionary Synthesis Modelling of Young Star Clusters in Merging Galaxies
Anders, Peter; Fritze-v. Alvensleben, Uta; de Grijs, Richard
2003-01-01
The observational properties of globular cluster systems (GCSs) are vital tools for investigating the violent star formation histories of their host galaxies. This violence is thought to have been triggered by galaxy interactions or mergers. The most basic properties of a GCS are its luminosity function (the number of clusters per luminosity bin) and its color distributions. A large number of observed GCSs show bimodal color distributions, which can be translated into a bimodality in metallicity and/or age. An additional uncertainty comes into play when one considers extinction. These effects can be disentangled either by obtaining spectroscopic data for the clusters or by imaging observations in at least four passbands. This then allows us to discriminate between various formation scenarios of GCSs, e.g. the merger scenario by Ashman & Zepf and the multi-phase collapse model by Forbes et al. Young and metal-rich star cluster populations are seen to form in interacting and merging galaxies. We analyse multi...
A Collaboration Service Model for a Global Port Cluster
Directory of Open Access Journals (Sweden)
Keith K.T. Toh
2010-03-01
The importance of port clusters to a global city may be viewed from a number of perspectives. The development of port clusters and economies of agglomeration, and their contribution to a regional economy, is underpinned by information and physical infrastructure that facilitates collaboration between business entities within the cluster. The maturity of technologies providing portals, web and middleware services offers an opportunity to push the boundaries of contemporary service reference models and service catalogues towards what the authors propose to call "collaboration services". In servicing port clusters, portal engineers of the future must consider collaboration services that benefit a region. In particular, service orchestration through a "public user portal" must achieve better utilisation of publicly owned infrastructure, so that organisations can share knowledge and collaborate through information systems.
Visualization methods for statistical analysis of microarray clusters
Directory of Open Access Journals (Sweden)
Li Kai
2005-05-01
Background: The most common method of identifying groups of functionally related genes in microarray data is to apply a clustering algorithm. However, it is impossible to determine which clustering algorithm is most appropriate to apply, and it is difficult to verify the results of any algorithm due to the lack of a gold standard. Appropriate data visualization tools can aid this analysis process, but existing visualization methods do not specifically address this issue. Results: We present several visualization techniques that incorporate meaningful, noise-robust statistics for analyzing the results of clustering algorithms on microarray data. These include a rank-based visualization method that is more robust to noise, a difference display method to aid assessment of cluster quality and detection of outliers, and a projection of high-dimensional data into a three-dimensional space in order to examine relationships between clusters. Our methods are interactive and dynamically linked together for comprehensive analysis. Further, our approach applies to both protein and gene expression microarrays, and our architecture scales from desktop/laptop screens to large-scale display devices. This methodology is implemented in GeneVAnD (Genomic Visual ANalysis of Datasets) and is available at http://function.princeton.edu/GeneVAnD. Conclusion: Incorporating relevant statistical information into data visualizations is key for the analysis of large biological datasets, particularly because of high levels of noise and the lack of a gold standard for comparisons. We developed several new visualization techniques and demonstrated their effectiveness for evaluating cluster quality and relationships between clusters.
Cognitive analysis of multiple sclerosis utilizing fuzzy cluster means
Directory of Open Access Journals (Sweden)
Imianvan Anthony Agboizebeta
2012-01-01
Multiple sclerosis, often called MS, is a disease that affects the central nervous system (the brain and spinal cord). Myelin insulates nerve cells, improves the conduction of impulses along the nerves and is important for maintaining the health of the nerves. In multiple sclerosis, inflammation causes the myelin to disappear. Genetic factors, environmental issues and viral infection may also play a role in developing the disease. MS is characterized by life-threatening symptoms such as loss of balance, hearing problems and depression. The focal point of this paper is the application of Fuzzy Cluster Means (FCM, also known as fuzzy c-means) analysis to the diagnosis of different forms of multiple sclerosis. Applying cluster analysis involves a sequence of methodological and analytical decision steps that enhance the quality and meaning of the clusters produced. The system eliminates uncertainties associated with the analysis of multiple sclerosis test data.
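The abstract gives no implementation details, so here is a generic sketch of standard fuzzy c-means, not the paper's diagnostic system. Each point gets a membership degree in every cluster (rows sum to 1), and the algorithm alternates two updates: u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)) for the memberships, and membership-weighted means (weights u_ij^m) for the centers. The fuzzifier m = 2 is a common default. Seeding with the first c points is a simplification, so those points should come from different groups. The 1-D data in the example are hypothetical.

```python
import math

def fuzzy_c_means(points, c, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy c-means; returns (memberships, centers).

    memberships[i][j] is the degree (0..1) to which point i belongs to cluster j;
    each row sums to 1. Initial centers are the first c points (a simplification).
    """
    centers = [list(p) for p in points[:c]]
    exp = 2.0 / (m - 1.0)
    dims = len(points[0])
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = []
        for p in points:
            d = [math.dist(p, ctr) for ctr in centers]
            if 0.0 in d:  # point sits exactly on a center: full membership there
                j0 = d.index(0.0)
                u.append([1.0 if j == j0 else 0.0 for j in range(c)])
            else:
                u.append([1.0 / sum((d[j] / d[k]) ** exp for k in range(c)) for j in range(c)])
        # Center update: mean of all points weighted by u_ij^m
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append([sum(wi * p[dim] for wi, p in zip(w, points)) / total
                                for dim in range(dims)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break  # centers have stopped moving
    return u, centers
```

On the points `[(0.0,), (10.0,), (0.5,), (9.5,), (1.0,), (9.0,)]` with `c=2`, the centers settle near 0.5 and 9.5. A point such as 0.0 keeps a small but non-zero membership in the far cluster, which is the "soft" assignment that distinguishes FCM from k-means.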
New Clustering Method in High-Dimensional Space Based on Hypergraph-Models
Institute of Scientific and Technical Information of China (English)
CHEN Jian-bin; WANG Shu-jing; SONG Han-tao
2006-01-01
To overcome the limitation of traditional clustering algorithms, which fail to produce meaningful clusters in high-dimensional, sparse, binary-valued data sets, a new method based on a hypergraph model is proposed. The hypergraph model maps the relationships present in the original high-dimensional data into a hypergraph. A hyperedge represents the similarity of attribute-value distribution between two points. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. The quality of the clustering result can be evaluated by applying the intra-cluster singularity value. Analysis and experimental results demonstrate that this approach is applicable and effective in a wide range of settings.
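The partitioning objective, minimizing the total weight of hyperedges cut, can be made concrete as follows. This is a minimal sketch, not the paper's partitioner: a hyperedge counts as cut when its vertices land in more than one part, and `best_bisection` searches all equal-size two-way splits exhaustively, which only works for tiny inputs. Practical systems use heuristic multilevel partitioners instead. The hyperedges and weights in the example are hypothetical.

```python
from itertools import combinations

def cut_weight(hyperedges, part_of):
    """Total weight of hyperedges whose vertices fall in more than one part.

    `hyperedges` is a list of (vertex_tuple, weight); `part_of` maps vertex -> part id.
    """
    return sum(w for verts, w in hyperedges if len({part_of[v] for v in verts}) > 1)

def best_bisection(vertices, hyperedges):
    """Exhaustive search for an equal-size two-way split with minimum cut weight."""
    vertices = list(vertices)
    best = None
    for left in combinations(vertices, len(vertices) // 2):
        part_of = {v: (0 if v in left else 1) for v in vertices}
        w = cut_weight(hyperedges, part_of)
        if best is None or w < best[0]:
            best = (w, set(left))
    return best
```

With hyperedges `[((0, 1, 2), 3.0), ((2, 3), 1.0), ((3, 4, 5), 2.0)]`, the best balanced split keeps {0, 1, 2} together and cuts only the light edge (2, 3), so `best_bisection(range(6), edges)` returns `(1.0, {0, 1, 2})`.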
Krasilenko, Vladimir G.; Lazarev, Alexander A.; Nikitovich, Diana V.
2014-08-01
In the paper, we show that spatial nonlinear equivalence functions based on continuous-logic equivalence (nonequivalence) operations have better discriminatory properties for comparing images. Further, using equivalence models of multiport neural networks and associative memory (including matrix-matrix and matrix-tensor models with adaptive-weighted correlation, and multiport neural-net auto-associative and hetero-associative memory (MP NN AAM and HAM)) and the proposed architectures based on them, we show how these models and architectures can be modified for space-invariant associative recognition and clustering of images (high-performance parallel clustering processing). We consider possible implementations of 2D image classifiers and of devices for partitioning image fragments into clusters, together with their architectures. The main base unit of such architectures is a matrix-matrix or matrix-tensor equivalentor, which can be implemented on the basis of two traditional correlators. We show that classifiers based on the equivalence paradigm and optoelectronic architectures with space-time integration and parallel-serial processing of 2D images have advantages such as increased memory capacity (more than ten times the number of neurons) and high performance in different modes. We present results of modeling associative recognition and renewal of images of significant dimension (128x128, 610x340). We show that these models are capable of recognizing images with a significant percentage (20-30%) of damaged pixels. The experimental results show that such models can be successfully used for auto- and hetero-associative pattern recognition. We show simulation results of using these modifications for clustering and learning models and algorithms for cluster analysis of specific images and for dividing them into categories of the array. We show an example of a cluster division of image fragments, letters and graphics for clusters with simultaneous formation of the output-weighted spatial
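The core idea of comparing images through a continuous-logic equivalence can be sketched in a few lines. For intensities normalized to [0, 1], the basic equivalence min(a, b) + min(1 - a, 1 - b) equals 1 - |a - b|, and averaging it over pixels gives a similarity score. The sketch below uses only this basic linear form with a nearest-template rule; the paper's nonlinear equivalence functions and its matrix-tensor optoelectronic architecture are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def equivalence(a, b):
    """Continuous-logic equivalence of intensities in [0, 1]: min(a, b) + min(1 - a, 1 - b), which equals 1 - |a - b|."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.minimum(a, b) + np.minimum(1.0 - a, 1.0 - b)

def classify_by_equivalence(image, templates):
    """Return the index of the template with the highest mean pixel-wise equivalence, plus all scores."""
    scores = [float(equivalence(image, t).mean()) for t in templates]
    return int(np.argmax(scores)), scores
```

With a binary 8x8 checkerboard in which 16 of the 64 pixels are inverted, the score against the clean checkerboard is 0.75 and against its negative 0.25, so the damaged image is still assigned to the right template.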
Cluster Analysis of Metal Concentrations in River Kubanni Zaria, Nigeria
Directory of Open Access Journals (Sweden)
A.W. Butu
2013-08-01
Cluster analysis was used to assess the degree of association of metal concentrations in the River Kubanni, Zaria, Nigeria. The main source of data for the analysis was sediment from four distinct locations along the longitudinal profile of the Kubanni River, which was analyzed using Instrumental Neutron Activation Analysis (INAA). The Nigerian Research Reactor-1 (NIRR-1), a Miniature Neutron Source Reactor (MNSR), was used to analyze the samples. The results of the laboratory analysis were subjected to cluster analysis. The analysis shows a stable clustering system in which the metal concentrations at the four locations were grouped into two main groups with one outlier. The concentrations of elements sampled in the dry months clustered in group I and those collected in the rainy months in group II. This strongly supports a temporal variation in the concentrations of metal contaminants between the wet and dry seasons in the River Kubanni, and also confirms that the elements collected in the wet season come from the same source, and that those collected in the dry season share a common source as well.
Probing cosmology with weak lensing selected clusters II: Dark energy and f(R) gravity models
Shirasaki, Masato; Yoshida, Naoki
2015-01-01
Ongoing and future wide-field galaxy surveys can be used to locate a number of galaxy clusters with cosmic shear measurements alone. We study constraints on cosmological models using statistics of weak lensing selected galaxy clusters. We extend our previous theoretical framework to model the statistical properties of clusters in variants of cosmological models as well as in the standard ΛCDM model. Weak lensing selection of clusters does not rely on conventional assumptions such as the luminosity-mass relation and/or hydrostatic equilibrium, but a number of observational effects compromise robust identification. We use a large set of realistic mock weak lensing catalogs as well as analytic models to perform a Fisher analysis and forecast constraints on two competing cosmological models, the wCDM model and the f(R) model proposed by Hu & Sawicki, with our lensing statistics. We show that weak lensing selected clusters are an excellent probe of cosmology when combined with cosmic shear power...
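The Fisher-matrix forecast mentioned above can be illustrated with a toy example. The sketch assumes, purely for illustration, that the observable is a vector of cluster counts in bins with independent Poisson errors, and that `model(theta)` is a hypothetical function returning the expected counts for parameters `theta`. The actual analysis also combines cosmic shear power spectra and covariances from mock catalogs, none of which is modeled here.

```python
import numpy as np

def poisson_fisher(model, theta, rel_step=1e-4):
    """Fisher matrix F_ij = sum_b (dN_b/dtheta_i)(dN_b/dtheta_j) / N_b for Poisson-distributed bin counts N_b(theta)."""
    theta = np.asarray(theta, dtype=float)
    N = np.asarray(model(theta), dtype=float)
    derivs = []
    for i in range(len(theta)):
        h = rel_step * max(1.0, abs(theta[i]))
        up, down = theta.copy(), theta.copy()
        up[i] += h
        down[i] -= h
        # central finite difference of the expected counts with respect to parameter i
        derivs.append((np.asarray(model(up)) - np.asarray(model(down))) / (2.0 * h))
    D = np.array(derivs)                  # shape (n_params, n_bins)
    return (D / N) @ D.T

def marginalized_errors(F):
    """1-sigma forecast errors after marginalizing over all other parameters."""
    return np.sqrt(np.diag(np.linalg.inv(F)))
```

For a one-parameter model N_b = A n_b the Fisher information is sum(n_b) / A, so with n = (10, 20, 30) and A = 2 the sketch returns F = 30 and a forecast error of 1/sqrt(30).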
Point Cluster Analysis Using a 3D Voronoi Diagram with Applications in Point Cloud Segmentation
Directory of Open Access Journals (Sweden)
Shen Ying
2015-08-01
Three-dimensional (3D) point analysis and visualization is one of the most effective methods of point cluster detection and segmentation in geospatial datasets. However, serious scattering and clotting characteristics interfere with the visual detection of 3D point clusters. To overcome this problem, this study proposes using 3D Voronoi diagrams to analyze and visualize 3D points instead of the original data items. The proposed algorithm computes the clusters of 3D points by applying a set of 3D Voronoi cells to describe and quantify the points. The decomposition of the point clouds of 3D models is guided by the 3D Voronoi cell parameters. The parameter values are mapped from the Voronoi cells to the 3D points to show spatial patterns and relationships; thus, a 3D point cluster pattern can be highlighted and easily recognized. To capture different cluster patterns, continuous progressive clustering and segmentation are tested. The 3D spatial relationships are shown to facilitate cluster detection. Furthermore, the segmentations generated for real 3D data cases demonstrate the feasibility of the approach in detecting different spatial clusters for continuous point cloud segmentation.
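One simple Voronoi cell parameter that can be mapped back onto points is the cell volume: points inside a dense cluster have small cells, isolated points have large ones. The sketch below computes per-point 3D Voronoi cell volumes with SciPy; using volume as the parameter, and the function name, are our assumptions, since the abstract does not list which cell parameters the authors use. Points on the convex hull of the data have unbounded cells and are reported as infinite.

```python
import numpy as np
from scipy.spatial import ConvexHull, Voronoi

def voronoi_cell_volumes(points):
    """Volume of each point's 3D Voronoi cell; unbounded cells (points on the data hull) get np.inf."""
    vor = Voronoi(points)
    volumes = np.full(len(points), np.inf)
    for i, region_index in enumerate(vor.point_region):
        region = vor.regions[region_index]
        if len(region) == 0 or -1 in region:   # -1 marks a Voronoi vertex at infinity
            continue
        # Voronoi cells are convex, so the convex hull of the cell's vertices is the cell itself
        volumes[i] = ConvexHull(vor.vertices[region]).volume
    return volumes
```

Coloring each point by, for instance, the logarithm of its cell volume is one way to make a dense 3D cluster stand out from a sparse background.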
Application of microarray analysis on computer cluster and cloud platforms.
Bernau, C; Boulesteix, A-L; Knaus, J
2013-01-01
Analysis of recent high-dimensional biological data tends to be computationally intensive, as many common approaches such as resampling or permutation tests require the basic statistical analysis to be repeated many times. A crucial advantage of these methods is that they can easily be parallelized because the resampling or permutation iterations are computationally independent, which has led many statistics departments to establish their own computer clusters. An alternative is to rent computing resources in the cloud, e.g. at Amazon Web Services. In this article we analyze whether a selection of statistical projects, recently implemented at our department, can be efficiently realized on these cloud resources. Moreover, we illustrate an opportunity to combine computer cluster and cloud resources. In order to compare the efficiency of computer cluster and cloud implementations and their respective parallelizations, we use microarray analysis procedures and compare their runtimes on the different platforms. Amazon Web Services provides various instance types which meet the particular needs of the different statistical projects we analyzed in this paper. Moreover, the network capacity is sufficient and the parallelization is comparable in efficiency to standard computer cluster implementations. Our results suggest that many statistical projects can be efficiently realized on cloud resources. It is important to mention, however, that workflows can change substantially as a result of a shift from computer cluster to cloud computing.
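Because permutation iterations are independent, they can be split into chunks that run on separate workers, each with its own random stream. The sketch below shows this for a two-sample difference-in-means test. It uses a local thread pool only to keep the example self-contained; on a computer cluster or cloud instances, each chunk would instead be dispatched as a separate job. Function names and defaults are hypothetical, and this is not the authors' microarray pipeline.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def _count_extreme(x, y, n_perm, seed):
    """Count label permutations whose |mean difference| is at least the observed one."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(perm[:len(x)].mean() - perm[len(x):].mean()) >= observed:
            count += 1
    return count

def parallel_permutation_test(x, y, n_perm=10_000, n_workers=4, seed=0):
    """Two-sample permutation p-value with the permutations split into independent chunks."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    per_worker = n_perm // n_workers
    seeds = np.random.SeedSequence(seed).spawn(n_workers)   # independent random streams per chunk
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        counts = pool.map(_count_extreme, [x] * n_workers, [y] * n_workers,
                          [per_worker] * n_workers, seeds)
        total_extreme = sum(counts)
    total = per_worker * n_workers
    return (total_extreme + 1) / (total + 1)
```

Spawning child seeds from one `SeedSequence` keeps the chunks statistically independent while making the whole run reproducible from a single seed.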
Single-cluster-update Monte Carlo method for the random anisotropy model
Rößler, U. K.
1999-06-01
A Wolff-type cluster Monte Carlo algorithm for random magnetic models is presented. The algorithm is shown to significantly reduce critical slowing down for planar random anisotropy models with weak anisotropy strength. Dynamic exponents z of the cluster algorithm are estimated for models with a ratio of anisotropy to exchange constant D/J = 1.0 on cubic lattices in three dimensions. For these models, critical exponents are derived from a finite-size scaling analysis.
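To show what a Wolff single-cluster update looks like for planar spins, here is a sketch for the plain 2D XY model on a periodic L x L lattice: pick a random reflection direction r, grow a cluster from a random seed by activating bonds with probability 1 - exp(min(0, -2 beta J (r.s_x)(r.s_y))), then reflect all cluster spins. This is an assumption-laden simplification: the random anisotropy term studied in the paper is not invariant under these reflections and is omitted entirely, and the 2D lattice and function name are hypothetical.

```python
import numpy as np

def wolff_update_xy(theta, beta, J=1.0, rng=None):
    """One Wolff single-cluster update for the 2D planar (XY) model without anisotropy.
    theta: L x L array of spin angles, modified in place. Returns the cluster size."""
    rng = np.random.default_rng() if rng is None else rng
    L = theta.shape[0]
    phi = rng.uniform(0.0, 2.0 * np.pi)          # angle of the random reflection direction r
    proj = np.cos(theta - phi)                    # r . s_x, evaluated before any spin is flipped
    seed = (int(rng.integers(L)), int(rng.integers(L)))
    in_cluster = np.zeros(theta.shape, dtype=bool)
    in_cluster[seed] = True
    stack = [seed]
    while stack:
        i, j = stack.pop()
        for ni, nj in (((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L)):
            if in_cluster[ni, nj]:
                continue
            # bond activation probability for ferromagnetic coupling J
            p_add = 1.0 - np.exp(min(0.0, -2.0 * beta * J * proj[i, j] * proj[ni, nj]))
            if rng.random() < p_add:
                in_cluster[ni, nj] = True
                stack.append((ni, nj))
    # reflect cluster spins across the line perpendicular to r: theta -> 2*phi + pi - theta
    theta[in_cluster] = np.mod(2.0 * phi + np.pi - theta[in_cluster], 2.0 * np.pi)
    return int(in_cluster.sum())
```

At beta = 0 no bonds are activated and the cluster is a single spin; near criticality the clusters become large, which is what reduces critical slowing down compared with single-spin updates.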
Efficient Cluster Algorithm for CP(N-1) Models
Beard, B B; Riederer, S; Wiese, U J
2006-01-01
Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z = 0.
Efficient cluster algorithm for CP(N-1) models
Beard, B. B.; Pepe, M.; Riederer, S.; Wiese, U.-J.
2006-11-01
Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z=0.
Topic Modeling Based Image Clustering by Events in Social Media
Directory of Open Access Journals (Sweden)
Bin Xu
2016-01-01
Social event detection in large photo collections is very challenging, and multimodal clustering is an effective methodology for dealing with the problem. Geographic information is important in event detection. This paper proposes a topic-model-based approach to estimate missing geographic information for photos. The approach uses a supervised multimodal topic model to estimate the joint distribution of time, geographic, content and attached textual information. Photos lacking geographic information are then annotated with predicted geographic coordinates. Experimental results indicate that clustering performance improves with the annotated geographic information.
Cluster model of social partnership in municipal education
Directory of Open Access Journals (Sweden)
Romanova Oksana
2016-03-01
This article discusses a model of educational clusters based on social interaction between educational institutions and on public-private partnerships. Particular attention is paid to methods of creating such an educational network, which allows educational organizations not only to obtain the resources they lack for carrying out educational activities and achieving certain educational outcomes, but also to meet the needs of the customers of educational services. Different approaches to forming a partnership-based model of an educational cluster are considered.
Sollima, A; Beccari, G; Pallanca, C
2016-01-01
We present the results of the analysis of deep photometric data for a sample of three Galactic globular clusters (NGC5466, NGC6218 and NGC6981) with the aim of estimating their degree of mass segregation and testing the predictions of analytic dynamical models. The adopted dataset, composed of both Hubble Space Telescope and ground-based data, reaches the low-mass end of the mass functions of these clusters from the center out to their tidal radii, allowing us to derive the radial distributions of stars with different masses. All the analysed clusters show evidence of mass segregation, with the most massive stars more concentrated than low-mass ones. The structures of NGC5466 and NGC6981 are well reproduced by multimass dynamical models adopting a lowered-Maxwellian distribution function and the prescription for mass segregation given by Gunn & Griffin (1979). In contrast, NGC6218 appears to be more mass segregated than model predictions. By applying the same technique to mock observations derived from snapshots s...
Modeling and clustering users with evolving profiles in usage streams
Zhang, Chongsheng
2012-09-01
Today, there is an increasing need for data stream mining technology to discover important patterns on the fly. Existing data stream models and algorithms commonly assume that users' records or profiles in data streams will not be updated or revised once they arrive. In various applications such as Web usage, however, users' records/profiles can evolve over time. This kind of streaming data evolves in two forms: the streaming of tuples or transactions, as in traditional data streams, and, more importantly, the evolution of user records/profiles inside the streams. Such data streams make it difficult to model and cluster users in order to explore their behavior. In this paper, we propose three models to summarize this kind of data stream: the batch model, the Evolving Objects (EO) model and the Dynamic Data Stream (DDS) model. By creating, updating and deleting user profiles, these models summarize the behavior of each user as a profile object. Based on these models, clustering algorithms are employed to discover interesting user groups from the profile objects. We have evaluated all the proposed models on a large real-world data set, showing that the DDS model summarizes data streams with evolving tuples more efficiently and effectively, and provides a better basis for clustering users than the other two models. © 2012 IEEE.
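The create/update/delete cycle for user profile objects can be illustrated with a small in-memory store. This is a hypothetical sketch, not the paper's EO or DDS model: each profile is a count of visited items plus a last-seen timestamp, and deletion uses an assumed time-to-live window. The resulting fixed-length vectors are what a clustering algorithm would consume.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    counts: Counter = field(default_factory=Counter)   # item/category -> number of visits
    last_seen: float = 0.0

class ProfileStore:
    """Hypothetical summary of a stream whose user profiles evolve: create, update and delete profiles."""

    def __init__(self, ttl):
        self.ttl = ttl            # assumed expiry window; not taken from the paper
        self.profiles = {}

    def update(self, user, item, t):
        profile = self.profiles.setdefault(user, UserProfile(last_seen=t))   # create on first sight
        profile.counts[item] += 1                                             # update in place
        profile.last_seen = t

    def expire(self, now):
        stale = [u for u, p in self.profiles.items() if now - p.last_seen > self.ttl]
        for u in stale:
            del self.profiles[u]                                              # delete inactive profiles
        return stale

    def vectors(self, vocabulary):
        """Fixed-length count vectors, ready to feed to any clustering algorithm."""
        return {u: [p.counts[v] for v in vocabulary] for u, p in self.profiles.items()}
```

Each incoming tuple touches only one profile, so the summary stays current as records evolve instead of treating every tuple as a new, immutable object.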
Examination of European Union economic cohesion: A cluster analysis approach
Directory of Open Access Journals (Sweden)
Jiri Mazurek
2014-01-01
In recent years, the majority of EU members experienced the steepest economic decline in their modern history, but the impacts of the global financial crisis were not distributed homogeneously across the continent. The aim of the paper is to examine the cohesion of the European Union (plus Norway and Iceland) in terms of the economic development of its members from 1 January 2008 to 31 December 2012. Five economic indicators were selected for the study: GDP growth, unemployment, inflation, labour productivity and government debt. Annual data from Eurostat databases were averaged over the whole period and then used as input for a cluster analysis. The EU countries were divided into six clusters. The most populated cluster, with 14 countries, covered Central and Western Europe and reflected the relative homogeneity of this part of Europe. The countries of Southern Europe (Greece, Portugal and Spain), the most affected by the recent crisis, shared their own cluster, while the Baltic and Balkan states formed another. On the other hand, Slovakia and Poland, the only two countries that escaped recession, were classified in their own cluster of the most successful countries.
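The preprocessing step described above (averaging annual indicator values over the whole period before clustering) can be sketched as follows. The z-score standardization is our assumption rather than something the abstract states: it is a common choice when indicators such as government debt and inflation are on very different scales, so that no single indicator dominates the distance computation. The abstract also does not name the clustering algorithm, so none is shown.

```python
import numpy as np

def average_and_standardize(panel):
    """panel: array of shape (countries, years, indicators) of annual values.
    Returns one z-scored row per country, averaged over the whole period."""
    averages = np.asarray(panel, dtype=float).mean(axis=1)          # period average per country and indicator
    # assumption: standardize each indicator so all five carry comparable weight in the distances
    return (averages - averages.mean(axis=0)) / averages.std(axis=0, ddof=1)
```

The resulting matrix has one row per country and one column per indicator, which is the usual input shape for hierarchical or K-means clustering.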
Sun Protection Belief Clusters: Analysis of Amazon Mechanical Turk Data.
Santiago-Rivas, Marimer; Schnur, Julie B; Jandorf, Lina
2016-12-01
This study aimed (i) to determine whether people could be differentiated on the basis of their sun protection belief profiles and individual characteristics and (ii) to explore the use of a crowdsourcing web service for the assessment of sun protection beliefs. A sample of 500 adults completed an online survey of sun protection belief items using Amazon Mechanical Turk. A two-phase cluster analysis (i.e., hierarchical and non-hierarchical K-means) was used to determine clusters of sun protection barriers and facilitators. Results yielded three distinct clusters of sun protection barriers and three distinct clusters of sun protection facilitators. Significant associations between gender, age, sun sensitivity and cluster membership were identified. Results also showed an association between barrier and facilitator cluster membership. The results of this study provide a potential alternative approach to developing future sun protection promotion initiatives in the population. The findings add to our knowledge of individuals who support, oppose or are ambivalent toward sun protection, and inform intervention research by identifying distinct subtypes that may benefit most from (or have a higher need for) skin cancer prevention efforts.
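A common way to run a two-phase analysis like this is to use hierarchical (Ward) clustering to choose the number of clusters and starting centroids, then refine the partition with K-means. The sketch below implements that pattern in plain NumPy. The abstract does not specify the linkage or initialization details, so Ward linkage and centroid seeding are assumptions, and the naive O(n^3) agglomeration is only suitable for small samples.

```python
import numpy as np

def ward_centroids(X, k):
    """Phase 1: naive Ward agglomeration down to k clusters; returns their centroids."""
    X = np.asarray(X, dtype=float)
    members = [[i] for i in range(len(X))]
    cents = [row.copy() for row in X]
    while len(members) > k:
        best = (np.inf, -1, -1)
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                na, nb = len(members[a]), len(members[b])
                # Ward merge cost: increase in the within-cluster sum of squares
                cost = na * nb / (na + nb) * np.sum((cents[a] - cents[b]) ** 2)
                if cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(members[a]), len(members[b])
        cents[a] = (na * cents[a] + nb * cents[b]) / (na + nb)
        members[a] = members[a] + members[b]
        del members[b], cents[b]
    return np.array(cents)

def kmeans_from(X, centers, n_iter=100):
    """Phase 2: Lloyd's K-means started from the given centers."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def two_phase_clustering(X, k):
    """Ward centroids seed K-means, which then reassigns points the hierarchy merged too early."""
    return kmeans_from(X, ward_centroids(X, k))
```

Seeding K-means with hierarchical centroids removes its sensitivity to random starts, while the K-means pass lets observations move between clusters, which a purely hierarchical solution cannot do.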
Semi-Analytic Model Predictions of the Galaxy Population in Proto-clusters
Contini, E; Hatch, N; Borgani, S; Kang, X
2015-01-01
We investigate the galaxy population in simulated proto-cluster regions using a semi-analytic model of galaxy formation coupled to merger trees extracted from N-body simulations. We select the most massive clusters at redshift $z=0$ from our set of simulations and follow their main progenitors back in time. The analysis shows that proto-cluster regions are dominated by central galaxies, whose number decreases with time as many become satellites clustering around the central object. In agreement with observations, we find an increasing velocity dispersion with cosmic time, the increase being faster for satellites. The analysis shows that proto-clusters are very extended regions, $\gtrsim 20\,\mathrm{Mpc}$ at $z \gtrsim 1$. The fraction of galaxies in proto-cluster regions that are not progenitors of cluster galaxies varies with redshift, stellar mass and the area considered. It is about 20-30 per cent for galaxies with stellar mass $\sim 10^9\,\mathrm{M}_\odot$, while negligible for the most massive galaxies considere...
Bivariate Mixed Effects Analysis of Clustered Data with Large Cluster Sizes.
Zhang, Daowen; Sun, Jie Lena; Pieper, Karen
2016-10-01
Linear mixed effects models are widely used to analyze clustered response variables. Motivated by a recent study to examine and compare hospital length of stay (LOS) between patients undergoing percutaneous coronary intervention (PCI) and coronary artery bypass graft (CABG) in several international clinical trials, we propose a bivariate linear mixed effects model for the joint modeling of clustered PCI and CABG LOS, where each clinical trial is considered a cluster. Due to the large number of patients in some trials, commonly used commercial statistical software for fitting (bivariate) linear mixed models failed to run because it could not allocate enough memory to invert large-dimensional matrices during the optimization process. We consider ways to circumvent this computational problem in maximum likelihood (ML) and restricted maximum likelihood (REML) inference. In particular, we developed an expectation-maximization (EM) algorithm for REML inference and present an ML implementation using existing software. The new REML EM algorithm is easy to implement, computationally stable and efficient. With this REML EM algorithm, we were able to analyze the LOS data and obtain meaningful results.
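Why EM helps with very large clusters can be seen in the simplest case. For a univariate random-intercept model y_ij = mu + b_i + e_ij, the E-step for each cluster needs only the cluster size and the sum of its residuals, so no large covariance matrix is ever formed or inverted. The sketch below is ML EM for that univariate model; it is a deliberately simplified stand-in, not the authors' bivariate REML EM algorithm, and the function name is hypothetical.

```python
import numpy as np

def em_random_intercept(y, cluster, n_iter=500):
    """ML estimates (mu, sigma_b^2, sigma_e^2) for y_ij = mu + b_i + e_ij with
    b_i ~ N(0, sigma_b^2), e_ij ~ N(0, sigma_e^2); cluster holds labels 0..I-1."""
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster, dtype=int)
    n_i = np.bincount(cluster).astype(float)             # cluster sizes
    mu, sb2, se2 = y.mean(), y.var() / 2.0, y.var() / 2.0
    for _ in range(n_iter):
        # E-step: the posterior of b_i depends only on the cluster size and its residual sum
        s_i = np.bincount(cluster, weights=y - mu, minlength=len(n_i))
        v_i = 1.0 / (1.0 / sb2 + n_i / se2)               # posterior variance of b_i
        m_i = v_i * s_i / se2                             # posterior mean of b_i
        # M-step: closed-form maximizers of the expected complete-data log-likelihood
        mu = np.mean(y - m_i[cluster])
        resid = y - mu - m_i[cluster]
        se2 = (np.sum(resid ** 2) + np.sum(n_i * v_i)) / len(y)
        sb2 = np.mean(m_i ** 2 + v_i)
    return mu, sb2, se2
```

Memory use grows linearly with the number of observations, regardless of how many patients fall in a single trial.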
A Geometric Analysis of Subspace Clustering with Outliers
Soltanolkotabi, Mahdi
2011-01-01
This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are, nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [Elhamifar09], which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretica...
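The SSC pipeline analyzed above has two stages: express each point as a sparse combination of the other points (an l1-regularized regression), then run spectral clustering on the symmetrized coefficient magnitudes. Below is a compact NumPy sketch. The Lasso is solved with plain ISTA, the regularization weight `lam` and iteration counts are illustrative assumptions, and the paper's outlier-robust extension is not included.

```python
import numpy as np

def ssc_coefficients(X, lam=0.1, n_iter=2000):
    """Self-expressive sparse coefficients: column j solves
    min_c 0.5 * ||x_j - X_{-j} c||^2 + lam * ||c||_1 by ISTA (X has one data point per column)."""
    _, n = X.shape
    C = np.zeros((n, n))
    for j in range(n):
        others = [i for i in range(n) if i != j]
        Y, x = X[:, others], X[:, j]
        step = 1.0 / np.linalg.norm(Y, 2) ** 2                  # 1 / Lipschitz constant of the gradient
        c = np.zeros(n - 1)
        for _ in range(n_iter):
            z = c - step * (Y.T @ (Y @ c - x))                  # gradient step on the squared error
            c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding
        C[others, j] = c
    return C

def spectral_labels(W, k, n_iter=100):
    """Normalized spectral clustering of a symmetric affinity matrix W into k groups."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    V = np.linalg.eigh(L)[1][:, :k]                              # eigenvectors of the k smallest eigenvalues
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    centers = [V[0]]
    for _ in range(1, k):                                        # farthest-point initialization
        dist = np.min([((V - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(V[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):                                      # Lloyd iterations on the embedded rows
        labels = ((V[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([V[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return labels

def sparse_subspace_clustering(X, k, lam=0.1):
    C = ssc_coefficients(X, lam)
    return spectral_labels(np.abs(C) + np.abs(C).T, k)
```

For points drawn from two distinct lines through the origin in R^3, the sparse representation of each point uses only points from its own line, so the affinity graph splits into two components and spectral clustering recovers them.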
Cluster analysis of knowledge sources in standardized electrical engineering subfields
Directory of Open Access Journals (Sweden)
Blagojević Marija
2016-01-01
The paper presents a cluster analysis of the innovation of knowledge sources based on standards in the field of Electrical Engineering. Both local (SRPS) and global (ISO) knowledge sources have been analysed with the aim of innovating a Knowledge Base (KB). The results presented indicate a possible way of grouping the subfields within a cluster. They also point to a trend in, or intensity of, knowledge source innovation for the purpose of innovating the KB that accompanies innovations. The study makes it possible to predict the necessary financial resources in the forthcoming period by means of original mathematical relations. Furthermore, the cluster analysis facilitates comparison of the innovation intensity in this and other (sub)fields. Future work relates to monitoring knowledge source innovation by means of KB engineering and to improving the prediction methodology using neural networks.
Regional SAR Image Segmentation Based on Fuzzy Clustering with Gamma Mixture Model
Li, X. L.; Zhao, Q. H.; Li, Y.
2017-09-01
Most stochastic fuzzy clustering algorithms are pixel-based and cannot effectively overcome the inherent speckle noise in SAR images. To deal with this problem, this paper proposes a regional SAR image segmentation algorithm based on fuzzy clustering with a Gamma mixture model. First, generating points are initialized randomly on the image, and the image domain is divided into sub-regions using Voronoi tessellation. Each sub-region is regarded as a homogeneous area in which the pixels share the same cluster label. Then, the probability of each pixel is modeled by a Gamma mixture model whose parameters depend on the cluster to which the pixel belongs. The negative logarithm of this probability serves as the dissimilarity measure between the pixel and the cluster. The regional dissimilarity measure of a sub-region is defined as the sum of the measures of the pixels in the region. Furthermore, the Markov Random Field (MRF) model is extended from the pixel level to Voronoi sub-regions, and the regional objective function is then established within the framework of fuzzy clustering. The optimal segmentation result is obtained by solving for the model parameters and generating points. Finally, qualitative and quantitative analysis of the segmentation results on simulated and real SAR images demonstrates the effectiveness of the proposed algorithm.
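The regional dissimilarity described above can be sketched directly: tessellate the image by nearest generating point, take each pixel's negative log-likelihood under a cluster's Gamma distribution, and sum it over each region. To keep the sketch short, each cluster uses a single Gamma distribution rather than a Gamma mixture, and the fuzzy memberships, the region-level MRF term and the optimization of generating points are all omitted. The function names and the hard region-to-cluster assignment in the usage note are illustrative assumptions.

```python
import math
import numpy as np

def voronoi_regions(shape, seeds):
    """Label every pixel with the index of its nearest generating point (a discrete Voronoi tessellation)."""
    rows, cols = np.indices(shape)
    seeds = np.asarray(seeds, dtype=float)
    d2 = (rows[..., None] - seeds[:, 0]) ** 2 + (cols[..., None] - seeds[:, 1]) ** 2
    return d2.argmin(axis=-1)

def gamma_nll(x, shape_k, scale):
    """Negative log of the Gamma(shape_k, scale) density, used as pixel-to-cluster dissimilarity."""
    return math.lgamma(shape_k) + shape_k * np.log(scale) - (shape_k - 1.0) * np.log(x) + x / scale

def regional_dissimilarity(image, regions, cluster_params):
    """Matrix D[r, c]: sum over the pixels of region r of their dissimilarity to cluster c."""
    n_regions = regions.max() + 1
    return np.column_stack([
        np.bincount(regions.ravel(), weights=gamma_nll(image, k, s).ravel(), minlength=n_regions)
        for k, s in cluster_params
    ])
```

Assigning each region to `D.argmin(axis=1)` gives a crude hard segmentation; because the cost is summed over all pixels in a region, isolated speckle pixels have far less influence than in a pixel-wise rule.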
Cluster analysis of WIBS single-particle bioaerosol data
Robinson, N. H.; Allan, J. D.; Huffman, J. A.; Kaye, P. H.; Foot, V. E.; Gallagher, M.
2013-02-01
Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial data sets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Wideband Integrated Bioaerosol Sensors (WIBSs). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS data sets recorded at a forest site in Colorado, USA, as part of the BEACHON-RoMBAS project. The cluster analysis results are consistent between the two data sets. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent the following: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long-term online primary biological aerosol particle (PBAP) measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics are improved.
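Once every particle carries a cluster label, reducing the data to per-cluster concentration time series is a binning exercise: count labelled particles per time bin and divide by the air volume sampled in that bin. The sketch assumes a constant sample flow rate passed in as `flow_l_per_min` (a hypothetical parameter; the abstract gives no instrument settings) and ignores any counting-efficiency corrections.

```python
import numpy as np

def cluster_concentration_series(times_s, labels, n_clusters, bin_s, flow_l_per_min):
    """Number concentration (particles per litre) of each cluster in consecutive time bins."""
    times_s = np.asarray(times_s, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bin_index = ((times_s - times_s.min()) // bin_s).astype(int)
    counts = np.zeros((n_clusters, bin_index.max() + 1))
    np.add.at(counts, (labels, bin_index), 1)                 # particle counts per cluster and bin
    sampled_volume_l = flow_l_per_min * bin_s / 60.0          # air volume sampled in one bin (assumed constant flow)
    return counts / sampled_volume_l
```

Each row of the result is one cluster's time series, which can then be compared against meteorology or diurnal cycles when interpreting what the cluster represents.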
Cluster analysis of WIBS single-particle bioaerosol data
Directory of Open Access Journals (Sweden)
N. H. Robinson
2013-02-01
Hierarchical agglomerative cluster analysis was performed on single-particle multi-spatial data sets comprising optical diameter, asymmetry and three different fluorescence measurements, gathered using two dual Wideband Integrated Bioaerosol Sensors (WIBSs). The technique is demonstrated on measurements of various fluorescent and non-fluorescent polystyrene latex spheres (PSL) before being applied to two separate contemporaneous ambient WIBS data sets recorded at a forest site in Colorado, USA, as part of the BEACHON-RoMBAS project. The cluster analysis results are consistent between the two data sets. Clusters are tentatively interpreted by comparison of concentration time series and cluster average measurement values to the published literature (of which there is a paucity) to represent the following: non-fluorescent accumulation mode aerosol; bacterial agglomerates; and fungal spores. To our knowledge, this is the first time cluster analysis has been applied to long-term online primary biological aerosol particle (PBAP) measurements. The novel application of this clustering technique provides a means for routinely reducing WIBS data to discrete concentration time series which are more easily interpretable, without the need for any a priori assumptions concerning the expected aerosol types. It can reduce the level of subjectivity compared to the more standard analysis approaches, which are typically performed by simple inspection of various ensemble data products. It also has the advantage of potentially resolving less populous or subtly different particle types. This technique is likely to become more robust in the future as fluorescence-based aerosol instrumentation measurement precision, dynamic range and the number of available metrics are improved.
Joint Sequence Analysis: Association and Clustering
Piccarreta, Raffaella
2017-01-01
In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…
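Sequence analysis typically starts from a dissimilarity between life-course sequences, and optimal matching (an edit distance with insertion/deletion and substitution costs) is a widely used choice. The sketch below computes optimal matching for one channel and, as one simple option for multiple domains, sums the per-channel distances. The cost values, the state codes and the summing rule are assumptions for illustration; the association measures introduced in the paper are not reproduced.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching distance between two state sequences (dynamic-programming edit distance)."""
    n, m = len(a), len(b)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * indel
    for j in range(1, m + 1):
        D[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            D[i][j] = min(D[i - 1][j] + indel,        # delete a[i-1]
                          D[i][j - 1] + indel,        # insert b[j-1]
                          D[i - 1][j - 1] + cost)     # match or substitute
    return D[n][m]

def joint_om_distance(channels_a, channels_b, **costs):
    """Assumed multichannel rule: sum of per-domain distances (e.g. work, partnership, parenthood)."""
    return sum(om_distance(x, y, **costs) for x, y in zip(channels_a, channels_b))
```

For example, the work-career sequences `"EEU"` and `"EUU"` (hypothetical codes for employed/unemployed years) differ by one substitution, so with identical partnership channels the joint distance is 2.0.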
Cluster Analysis of Clinical Data Identifies Fibromyalgia Subgroups
Docampo, Elisa; Collado, Antonio; Escaramís, Geòrgia; Carbonell, Jordi; Rivera, Javier; Vidal, Javier; Alegre, José
2013-01-01
Introduction Fibromyalgia (FM) is mainly characterized by widespread pain and multiple accompanying symptoms, which hinder FM assessment and management. In order to reduce FM heterogeneity, we classified clinical data into simplified dimensions that were used to define FM subgroups. Material and Methods Forty-eight variables were evaluated in 1,446 Spanish FM cases fulfilling the 1990 ACR FM criteria. A partitioning analysis was performed to find groups of variables similar to each other. Similarities between variables were identified and the variables were grouped into dimensions. This was performed in a subset of 559 patients and cross-validated in the remaining 887 patients. For each sample and dimension, a composite index was obtained based on the weights of the variables included in the dimension. Finally, a clustering procedure was applied to the indexes, resulting in FM subgroups. Results Variables clustered into three independent dimensions: “symptomatology”, “comorbidities” and “clinical scales”. Only the first two dimensions were considered for the construction of FM subgroups. The resulting scores classified FM samples into three subgroups: low symptomatology and comorbidities (Cluster 1), high symptomatology and comorbidities (Cluster 2), and high symptomatology but low comorbidities (Cluster 3), showing differences in measures of disease severity. Conclusions We have identified three subgroups of FM samples in a large cohort of FM by clustering clinical data. Our analysis stresses the importance of family and personal history of FM comorbidities. Also, the resulting patient clusters could indicate different forms of the disease, relevant to future research, and might have an impact on clinical assessment. PMID:24098674
Transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis
Directory of Open Access Journals (Sweden)
Riccardi Giovanna
2009-03-01
Abstract Background The ESAT-6 (early secreted antigenic target, 6 kDa) family comprises small mycobacterial proteins secreted by Mycobacterium tuberculosis, particularly in the early phase of growth. There are 23 ESAT-6 family members in M. tuberculosis H37Rv. In previous work, we identified the Zur-dependent regulation of five proteins of the ESAT-6/CFP-10 family (esxG, esxH, esxQ, esxR and esxS). esxG and esxH are part of ESAT-6 cluster 3, whose expression was already known to be induced by iron starvation. Results In this research, we performed EMSA experiments and transcriptional analysis of ESAT-6 cluster 3 in Mycobacterium smegmatis (msmeg0615-msmeg0625) and M. tuberculosis. In contrast to what we had observed in M. tuberculosis, we found that in M. smegmatis ESAT-6 cluster 3 responds only to iron and not to zinc. In both organisms we identified an internal promoter, a finding which suggests the presence of two transcriptional units and, consequently, differential expression of cluster 3 genes. We compared the expression of msmeg0615 and msmeg0620 under different growth and stress conditions by means of relative quantitative PCR. The expression of the msmeg0615 and msmeg0620 genes was essentially similar; they appeared to be repressed in most of the tested conditions, with the exception of acid stress (pH 4.2), where msmeg0615 was induced about 4-fold while msmeg0620 was repressed. Analysis revealed that under acid stress the M. tuberculosis rv0282 gene was also induced 3-fold, while rv0287 induction was almost insignificant. Conclusion In contrast with what has been reported for M. tuberculosis, our results suggest that in M. smegmatis only IdeR-dependent regulation is retained, while zinc has no effect on gene expression. The role of cluster 3 in M. tuberculosis virulence is still to be defined; however, iron- and zinc-dependent expression strongly suggests that cluster 3 is highly expressed in the infective process, and that the cluster
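Fold changes such as "about 4-fold induced" are commonly derived from relative quantitative PCR with the 2^-ΔΔCt method, which normalizes the target gene's threshold cycle (Ct) to a reference gene and then to a control condition. The abstract does not state which quantification model was used, so this is an assumption; the method also presumes roughly 100% amplification efficiency for both genes. The Ct values in the example are hypothetical.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumes ~100% amplification efficiency)."""
    d_treated = ct_target_treated - ct_ref_treated     # normalize target to reference gene, treated sample
    d_control = ct_target_control - ct_ref_control     # same normalization, control sample
    return 2.0 ** -(d_treated - d_control)
```

A target that crosses threshold two cycles earlier under stress than in the control, with an unchanged reference gene (for example, Ct 20 vs 22 against a reference at 18), gives a 4-fold induction; the reverse case gives 0.25, i.e. repression.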
Viriyangkura, Yuwadee
2014-01-01
Through a secondary analysis of statewide data from Colorado, people with intellectual and related developmental disabilities (ID/DD) were classified into five clusters based on their support needs characteristics using cluster analysis techniques. Prior latent factor models of support needs in the field of ID/DD were examined to investigate the…
Analyzing highway flow patterns using cluster analysis
Weijermars, Wendy; van Berkum, Eric C.; Pfliegl, R.
2005-01-01
Historical traffic patterns can be used for the prediction of traffic flows, as input for macroscopic traffic models, for the imputation of missing or erroneous data and as a basis for traffic management scenarios. This paper investigates the determination of historical traffic patterns by means of
A grand unified model for liganded gold clusters
Xu, Wen Wu; Zhu, Beien; Zeng, Xiao Cheng; Gao, Yi
2016-12-01
A grand unified model (GUM) is developed to achieve a fundamental understanding of the rich structures of all 71 liganded gold clusters reported to date. Inspired by the quark model, in which composite particles (for example, protons and neutrons) are formed by combining three quarks (or flavours), here gold atoms are assigned three 'flavours' (namely, bottom, middle and top) to represent three possible valence states. The 'composite particles' in GUM are categorized into two groups: variants of the triangular elementary block Au3(2e) and the tetrahedral elementary block Au4(2e), all satisfying the duet rule (2e) of the valence shell, akin to the octet rule in general chemistry. The elementary blocks, when packed together, form the cores of liganded gold clusters. With the GUM, the structures of the 71 liganded gold clusters and their growth mechanism can be deciphered together. Although GUM is a predictive heuristic and may not necessarily reflect the actual electronic structure, several highly stable liganded gold clusters are predicted, thereby offering a route to GUM-guided synthesis of liganded gold clusters by design.
Wang, Guofeng; Liu, Chang; Cui, Yinhu
2012-09-01
Feature extraction plays an important role in clustering analysis. In this paper, an integrated Autoregressive (AR)/Autoregressive Conditional Heteroscedasticity (ARCH) model is proposed to characterize the vibration signal, and the model coefficients are adopted as feature vectors to realize clustering diagnosis of rolling element bearings. Its main characteristic is that the AR and ARCH terms are interrelated, so that the model can depict the excess kurtosis and volatility clustering in the vibration signal more accurately than a two-stage AR/ARCH model. To verify this, four kinds of bearing signals are modeled parametrically using both the integrated and the two-stage AR/ARCH models. Variance analysis of the model coefficients shows that the integrated AR/ARCH model yields a more concentrated distribution. Taking these coefficients as feature vectors, K-means clustering is used to classify the bearing fault status automatically. The results show that the proposed method gives more accurate results than the two-stage model and discrete wavelet decomposition.
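The abstract takes model coefficients as feature vectors and clusters them with K-means. A minimal sketch of that pipeline follows, assuming plain AR(p) coefficients fitted by ordinary least squares in place of the integrated AR/ARCH estimation (which also models the conditional variance), synthetic AR(2) signals in place of bearing vibration records, and a deterministic K-means seeded with the first k feature vectors. All function names are illustrative.

```python
import numpy as np


def ar_coefficients(x, p):
    """Least-squares fit of x[t] = a1*x[t-1] + ... + ap*x[t-p] + e[t]; returns [a1, ..., ap]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Column k holds the lag-(k+1) values x[t-k-1] for t = p, ..., n-1.
    X = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef


def kmeans(features, k, iters=100):
    """Lloyd's K-means; the first k rows seed the centroids (deterministic)."""
    centroids = features[:k].copy()
    for _ in range(iters):
        # Assign every feature vector to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; an empty cluster keeps its previous centroid.
        new = np.array([features[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def simulate_ar2(a1, a2, n, rng):
    """Synthetic AR(2) signal standing in for a vibration record."""
    x = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(2, n):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]
    return x


rng = np.random.default_rng(0)
# Two hypothetical bearing states with different AR(2) dynamics, alternating.
states = [(1.5, -0.75), (0.3, 0.2)]
signals = [simulate_ar2(*states[i % 2], 2000, rng) for i in range(10)]
features = np.array([ar_coefficients(s, 2) for s in signals])
labels, centroids = kmeans(features, k=2)
print(labels)  # signals with the same dynamics share a label
```

Because the coefficient pairs of the two synthetic states lie far apart, K-means separates them cleanly; with real vibration data the separation depends on how well the fitted model captures each fault state.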
Chambers, Lynda E.; Beaumont, Linda J.; Hudson, Irene L.
2014-08-01
There is substantial evidence of climate-related shifts in the timing of avian migration. Although spring arrival has generally advanced, variable species responses and geographical biases in data collection make it difficult to generalise patterns. We advance previous studies by using novel multivariate statistical techniques to explore complex relationships between phenological trends, climate indices and species traits. Using 145 datasets for 52 bird species, we assess trends in first arrival date (FAD), last departure date (LDD) and timing of peak abundance at multiple Australian locations. Strong seasonal patterns were found: spring phenological events were more likely to advance significantly, while both significant advances and delays occurred in other seasons. However, across all significant trends, the magnitude of delays exceeded that of advances, particularly for FAD (+22.3 and -9.6 days/decade, respectively). Geographic variations were found, with greater advances in FAD and LDD in south-eastern Australia than in the north and west. We identified four species clusters that differed with respect to species traits and climate drivers. Species within bird clusters responded in similar ways to local climate variables, particularly the number of raindays and rainfall. The strength of phenological trends was more strongly related to local climate variables than to broad-scale drivers (Southern Oscillation Index), highlighting the importance of precipitation as a driver of movement in Australian birds.
Characterization of population exposure to organochlorines: A cluster analysis application
Guimarães, Raphael Mendonça; Asmus, Sven; Burdorf, Alex
2013-01-01
This study aimed to show the results of applying cluster analysis to characterize population exposure to organochlorines through variables related to time and exposure dose. Characteristics of 354 subjects in a population exposed to organochlorine pesticide residues
Cluster analysis as a prediction tool for pregnancy outcomes.
Banjari, Ines; Kenjerić, Daniela; Šolić, Krešimir; Mandić, Milena L
2015-03-01
Considering the specific physiological changes during gestation and regarding pregnancy as a "critical window", classification of pregnant women in early pregnancy can be considered crucial. The paper demonstrates the use of a method drawn from intelligent data mining: cluster analysis. Cluster analysis is a statistical method that makes it possible to group individuals based on sets of identifying variables. The method was chosen to determine whether pregnant women can be classified in early pregnancy, and to analyze unknown correlations between different variables so that certain outcomes could be predicted. A total of 222 pregnant women from two general obstetric offices were recruited. The analysis focused on three characteristics of these pregnant women: age, pre-pregnancy body mass index (BMI) and haemoglobin value. Cluster analysis achieved a 94.1% classification accuracy rate, with three branches or groups of pregnant women showing statistically significant correlations with pregnancy outcomes. The results show that pregnant women of both older age and higher pre-pregnancy BMI have a significantly higher incidence of delivering babies of higher birth weight, but they gain significantly less weight during pregnancy. Their babies are also longer, and these women have a significantly higher probability of complications during pregnancy (gestosis) and of induced or caesarean delivery. We conclude that cluster analysis can appropriately classify pregnant women in early pregnancy to predict certain outcomes.
A Cluster Analysis of Personality Style in Adults with ADHD
Robin, Arthur L.; Tzelepis, Angela; Bedway, Marquita
2008-01-01
Objective: The purpose of this study was to use hierarchical linear cluster analysis to examine the normative personality styles of adults with ADHD. Method: A total of 311 adults with ADHD completed the Millon Index of Personality Styles, which consists of 24 scales assessing motivating aims, cognitive modes, and interpersonal behaviors. Results:…
Language Learner Motivational Types: A Cluster Analysis Study
Papi, Mostafa; Teimouri, Yasser
2014-01-01
The study aimed to identify different second language (L2) learner motivational types drawing on the framework of the L2 motivational self system. A total of 1,278 secondary school students learning English in Iran completed a questionnaire survey. Cluster analysis yielded five different groups based on the strength of different variables within…
Making Sense of Cluster Analysis: Revelations from Pakistani Science Classes
Pell, Tony; Hargreaves, Linda
2011-01-01
Cluster analysis has been applied to quantitative data in educational research over several decades and has been a feature of Maurice Galton's research in primary and secondary classrooms. It has offered potentially useful insights for teaching, yet its implications for practice are rarely implemented. It has also been subject to negative…
Frailty phenotypes in the elderly based on cluster analysis
DEFF Research Database (Denmark)
Dato, Serena; Montesanto, Alberto; Lagani, Vincenzo
2012-01-01
genetic background on the frailty status is still questioned. We investigated the applicability of a cluster analysis approach based on specific geriatric parameters, previously set up and validated in a southern Italian population, to two large longitudinal Danish samples. In both cohorts, we identified...
Fault detection of flywheel system based on clustering and principal component analysis
Directory of Open Access Journals (Sweden)
Wang Rixin
2015-12-01
Considering the nonlinear, multifunctional properties of a double-flywheel system with closed-loop control, a two-step method combining clustering and principal component analysis is proposed to detect the two faults in the multifunctional flywheels. In the first step of the proposed algorithm, clustering is used as feature recognition to identify the commands of the “integrated power and attitude control” system, such as attitude control, energy storage or energy discharge. These commands require the flywheel system to work in different operation modes. Therefore, the relationships among parameters in the different operations define the cluster structure of the training data. Ordering points to identify the clustering structure (OPTICS) can automatically identify these clusters from the reachability plot, and the K-means algorithm can then divide the training data into the corresponding operations according to the reachability plot. In the final step of the proposed model, the relationship among parameters in each operation is defined by principal component analysis (PCA). Compared with the PCA model, the proposed approach is capable of identifying new clusters and learning the new behavior of incoming data. The simulation results show that it can effectively detect the faults in the multifunctional flywheel system.
Fault detection of flywheel system based on clustering and principal component analysis
Institute of Scientific and Technical Information of China (English)
Wang Rixin; Gong Xuebing; Xu Minqiang; Li Yuqing
2015-01-01
Considering the nonlinear, multifunctional properties of a double-flywheel system with closed-loop control, a two-step method combining clustering and principal component analysis is proposed to detect the two faults in the multifunctional flywheels. In the first step of the proposed algorithm, clustering is used as feature recognition to identify the commands of the “integrated power and attitude control” system, such as attitude control, energy storage or energy discharge. These commands require the flywheel system to work in different operation modes. Therefore, the relationships among parameters in the different operations define the cluster structure of the training data. Ordering points to identify the clustering structure (OPTICS) can automatically identify these clusters from the reachability plot, and the K-means algorithm can then divide the training data into the corresponding operations according to the reachability plot. In the final step of the proposed model, the relationship among parameters in each operation is defined by principal component analysis (PCA). Compared with the PCA model, the proposed approach is capable of identifying new clusters and learning the new behavior of incoming data. The simulation results show that it can effectively detect the faults in the multifunctional flywheel system.
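The final step of the method models each operation mode with PCA. The sketch below assumes the mode labels are already available (it does not reproduce the OPTICS/K-means step), scores samples with the squared prediction error (SPE, or Q statistic), and uses an empirical 99th-percentile control limit per mode. The control limit, the nearest-mean rule for choosing a mode, the synthetic data and the names `ModePCA`, `fit_mode_models` and `is_fault` are assumptions for illustration, not details from the paper.

```python
import numpy as np


class ModePCA:
    """PCA model of one operating mode, scored by the squared prediction error (SPE)."""

    def __init__(self, data, n_components, quantile=0.99):
        self.mean = data.mean(axis=0)
        self.std = data.std(axis=0, ddof=1)
        z = (data - self.mean) / self.std
        # Principal directions from the SVD of the standardised training data.
        _, _, vt = np.linalg.svd(z, full_matrices=False)
        self.loadings = vt[:n_components].T  # shape (n_vars, n_components)
        # Empirical control limit: a high quantile of the training SPE values.
        self.limit = np.quantile(self.spe(data), quantile)

    def spe(self, x):
        z = (np.atleast_2d(x) - self.mean) / self.std
        residual = z - z @ self.loadings @ self.loadings.T
        return (residual ** 2).sum(axis=1)


def fit_mode_models(data, labels, n_components):
    """One PCA model per operating-mode label (labels come from the clustering step)."""
    return {m: ModePCA(data[labels == m], n_components) for m in np.unique(labels)}


def is_fault(models, x):
    """Score x with the mode whose mean is nearest and compare its SPE to that mode's limit."""
    x = np.asarray(x, dtype=float)
    mode = min(models, key=lambda m: np.linalg.norm(x - models[m].mean))
    return bool(models[mode].spe(x)[0] > models[mode].limit)


rng = np.random.default_rng(1)
t0, t1 = rng.standard_normal(500), rng.standard_normal(500)
# Two synthetic operating modes with different correlation structures.
mode0 = np.column_stack([t0, 2 * t0, -t0]) + 0.05 * rng.standard_normal((500, 3))
mode1 = 10 + np.column_stack([t1, -t1, 0.5 * t1]) + 0.05 * rng.standard_normal((500, 3))
data = np.vstack([mode0, mode1])
labels = np.repeat([0, 1], 500)  # stands in for the OPTICS/K-means labelling
models = fit_mode_models(data, labels, n_components=1)

print(is_fault(models, [1.0, 2.0, -1.0]))   # consistent with mode 0
print(is_fault(models, [1.0, -2.0, -1.0]))  # breaks mode 0's correlation structure
```

Keeping a separate PCA model per mode is what lets a sample that is normal for one operation (for example, energy discharge) still be flagged when it violates the correlation structure of the mode it actually belongs to.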
Metal cluster fission: jellium model and Molecular dynamics simulations
DEFF Research Database (Denmark)
Lyalin, Andrey G.; Obolensky, Oleg I.; Solov'yov, Ilia;
2004-01-01
Fission of doubly charged sodium clusters is studied using the open-shell two-center deformed jellium model approximation and an ab initio molecular dynamics approach accounting for all electrons in the system. Results of calculations of the fission reactions Na₁₀²⁺ → Na₇⁺ + Na₃⁺ and Na₁₈²⁺ ...
A new efficient Cluster Algorithm for the Ising Model
Nyfeler, Matthias; Pepe, Michele; Wiese, Uwe-Jens
2005-01-01
Using D-theory we construct a new efficient cluster algorithm for the Ising model. The construction is very different from the standard Swendsen-Wang algorithm and related to worm algorithms. With the new algorithm we have measured the correlation function with high precision over a surprisingly large number of orders of magnitude.
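The D-theory construction is not described here in enough detail to reproduce. For comparison, this is a minimal sketch of the standard Swendsen-Wang update that the abstract contrasts it with, for the two-dimensional ferromagnetic Ising model on a periodic L x L lattice with coupling J = 1: bonds between aligned neighbours are activated with probability 1 - exp(-2βJ), clusters are found with union-find, and each cluster is flipped with probability 1/2. The helper names are illustrative.

```python
import math
import random


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def swendsen_wang_step(spins, L, beta, rng):
    """One Swendsen-Wang update of an L x L periodic Ising lattice with coupling J = 1.

    spins is a flat list of +1/-1 values; site (row, col) has index row * L + col.
    """
    p_bond = 1.0 - math.exp(-2.0 * beta)
    parent = list(range(L * L))
    for r in range(L):
        for c in range(L):
            i = r * L + c
            # Right and down neighbours visit every lattice bond once (for L >= 3).
            for j in (r * L + (c + 1) % L, ((r + 1) % L) * L + c):
                if spins[i] == spins[j] and rng.random() < p_bond:
                    ri, rj = find(parent, i), find(parent, j)
                    if ri != rj:
                        parent[ri] = rj
    # Flip every cluster independently with probability 1/2.
    flip = {}
    for i in range(L * L):
        root = find(parent, i)
        if root not in flip:
            flip[root] = rng.random() < 0.5
        if flip[root]:
            spins[i] = -spins[i]
    return spins


def mean_abs_magnetization(L, beta, sweeps, seed=0):
    """Average |m| per site over the second half of a Swendsen-Wang run."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(L * L)]
    kept = []
    for step in range(sweeps):
        swendsen_wang_step(spins, L, beta, rng)
        if step >= sweeps // 2:
            kept.append(abs(sum(spins)) / (L * L))
    return sum(kept) / len(kept)


print(mean_abs_magnetization(16, beta=1.0, sweeps=100))  # ordered phase: close to 1
print(mean_abs_magnetization(16, beta=0.1, sweeps=100))  # disordered phase: small
```

At β = 1.0, well above the critical coupling βc ≈ 0.4407, the average |m| should be close to 1, while at β = 0.1 it stays small.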
nIFTy galaxy cluster simulations II: radiative models
CSIR Research Space (South Africa)
Sembolini, F
2016-04-01
We have simulated the formation of a massive galaxy cluster (M200,crit = 1.1 × 10¹⁵ h⁻¹ M☉) in a ΛCDM universe using 10 different codes (RAMSES, 2 incarnations of AREPO and 7 of GADGET), modeling hydrodynamics with full radiative...
Analytical model for non-thermal pressure in galaxy clusters
Shi, Xun; Komatsu, Eiichiro
2014-07-01
Non-thermal pressure in the intracluster gas has been found ubiquitously in numerical simulations, and observed indirectly. In this paper we develop an analytical model for intracluster non-thermal pressure in the virial region of relaxed clusters. We write down and solve a first-order differential equation describing the evolution of the non-thermal velocity dispersion. This equation is based on insights gained from observations, numerical simulations, and the theory of turbulence. The non-thermal energy is sourced, in a self-similar fashion, by the mass growth of clusters via mergers and accretion, and dissipates on a time-scale determined by the turnover time of the largest turbulent eddies. Our model predicts a radial profile of non-thermal pressure for relaxed clusters. The non-thermal fraction increases with radius, redshift, and cluster mass, in agreement with numerical simulations. The radial dependence is due to a rapid increase of the dissipation time-scale with radius, and the mass and redshift dependence comes from the mass growth history. Combining our model for the non-thermal fraction with the Komatsu-Seljak model for the total pressure, we obtain thermal pressure profiles and compute the hydrostatic mass bias. We find a typical bias of 10 per cent for the hydrostatic mass enclosed within r500.
Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.
Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si
2017-07-01
Water quality assessment is crucial for assessing marine eutrophication, predicting harmful algal blooms, and protecting the environment. Previous studies have developed many numerical modeling methods and data-driven approaches for water quality assessment. Cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment using hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected from the coastal waters of the Bohai Sea and the North Yellow Sea of China, and apply the clustering results to evaluate their water quality. To evaluate validity, we also cluster the water quality data using cluster analysis based on Euclidean distance, which has been widely adopted in previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, this is the first attempt to apply Mahalanobis distance to coastal water quality assessment.
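The Mahalanobis distance d(x, y) = sqrt((x - y)ᵀ S⁻¹ (x - y)) accounts for correlations between variables through the covariance matrix S. A minimal sketch follows, assuming S is the sample covariance of all observations and that average linkage is used for the hierarchical step; the abstract does not state which covariance estimate or linkage the authors used, and the data below are synthetic.

```python
import numpy as np


def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances using the sample covariance of all rows of X."""
    VI = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative round-off


def average_linkage(D, n_clusters):
    """Naive agglomerative clustering with average linkage on a distance matrix D."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair of clusters
        del clusters[b]
    labels = np.empty(len(D), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3))
X[:, 1] = X[:, 0] + 0.1 * X[:, 1]  # two strongly correlated "water quality" variables
D = mahalanobis_matrix(X)
labels = average_linkage(D, n_clusters=3)
# Unlike Euclidean distance, the Mahalanobis distance ignores the units of each variable.
print(np.allclose(D, mahalanobis_matrix(X * np.array([1.0, 100.0, 1.0]))))  # True
```

One caveat of using the covariance of the pooled data: when groups are well separated, the between-group spread inflates S and shrinks distances along that direction, so the choice of covariance estimate matters.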
Directory of Open Access Journals (Sweden)
Laura Flight
2016-11-01
Background: In an individually randomised controlled trial where the treatment is delivered by a health professional, it seems likely that the effectiveness of the treatment, independent of any treatment effect, could depend on the skill, training or even enthusiasm of the health professional delivering it. This may lead to clustering of outcomes for patients treated by the same health professional, while similar clustering may not occur in the control arm. Using four case studies, we aim to provide practical guidance and recommendations for the analysis of trials with some element of clustering in one arm. Methods: Five approaches to the analysis of outcomes from an individually randomised controlled trial with clustering in one arm are identified in the literature. Some of these methods are applied to four case studies of completed randomised controlled trials with clustering in one arm, with sample sizes ranging from 56 to 539. Results are obtained using the statistical packages R and Stata and summarised using a forest plot. Results: The intra-cluster correlation coefficient (ICC) for each of the case studies was small (<0.05), indicating little dependence of the outcomes on cluster allocation. All models fitted produced similar results for the case studies considered, including the simplest approach of ignoring clustering. Conclusions: A partially clustered approach, modelling the clustering in just one arm, most accurately represents the trial design and provides valid results. Modelling homogeneous variances between the clustered and unclustered arms is adequate in scenarios similar to the case studies considered. We recommend treating each participant in the unclustered arm as a single cluster. This approach is simple to implement in R and Stata and is recommended for the analysis of trials with clustering in one arm only. However, the case studies considered had small ICC values, limiting the generalisability
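ICC values like those reported above can be estimated in several ways; one common choice is the one-way ANOVA estimator ICC = (MSB - MSW) / (MSB + (n₀ - 1) MSW), where n₀ = (N - Σ nⱼ²/N) / (k - 1) adjusts for unequal cluster sizes. A minimal sketch, assuming this estimator (the case studies may instead have used mixed-model estimates from R or Stata); the function name is illustrative.

```python
from collections import defaultdict


def anova_icc(outcomes, clusters):
    """One-way ANOVA estimator of the intra-cluster correlation coefficient (ICC).

    outcomes: numeric outcome per participant; clusters: cluster id per participant
    (e.g. the health professional who delivered the treatment).
    """
    groups = defaultdict(list)
    for y, c in zip(outcomes, clusters):
        groups[c].append(y)
    k, n = len(groups), len(outcomes)
    grand_mean = sum(outcomes) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((y - means[c]) ** 2 for c, g in groups.items() for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    # Effective cluster size, adjusted for unequal cluster sizes.
    n0 = (n - sum(len(g) ** 2 for g in groups.values()) / n) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


print(anova_icc([1, 2, 3, 4, 5, 6], ["A", "A", "A", "B", "B", "B"]))  # 12.5 / 15.5
```

This estimator can be negative when between-cluster variation is smaller than expected by chance; such values are often truncated to zero in practice.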
Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation
Directory of Open Access Journals (Sweden)
Tushar H Jaware
2013-10-01
Medical image processing is one of the most challenging and emerging fields of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis and treatment of disease. This paper focuses on methods to detect and extract brain tumors from brain MR images. MATLAB is used to design a software tool for locating brain tumors based on unsupervised clustering methods. The K-means clustering algorithm is implemented and tested on a database of 30 images. A performance evaluation of unsupervised clustering methods is presented.
Diffusion maps, clustering and fuzzy Markov modeling in peptide folding transitions
Energy Technology Data Exchange (ETDEWEB)
Nedialkova, Lilia V.; Amat, Miguel A. [Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544 (United States); Kevrekidis, Ioannis G., E-mail: yannis@princeton.edu, E-mail: gerhard.hummer@biophys.mpg.de [Department of Chemical and Biological Engineering and Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544 (United States); Hummer, Gerhard, E-mail: yannis@princeton.edu, E-mail: gerhard.hummer@biophys.mpg.de [Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Max-von-Laue-Str. 3, 60438 Frankfurt am Main (Germany)
2014-09-21
Using the helix-coil transitions of alanine pentapeptide as an illustrative example, we demonstrate the use of diffusion maps in the analysis of molecular dynamics simulation trajectories. Diffusion maps and other nonlinear data-mining techniques provide powerful tools to visualize the distribution of structures in conformation space. The resulting low-dimensional representations help in partitioning conformation space, and in constructing Markov state models that capture the conformational dynamics. In an initial step, we use diffusion maps to reduce the dimensionality of the conformational dynamics of Ala5. The resulting pretreated data are then used in a clustering step. The identified clusters show excellent overlap with clusters obtained previously by using the backbone dihedral angles as input, with small—but nontrivial—differences reflecting torsional degrees of freedom ignored in the earlier approach. We then construct a Markov state model describing the conformational dynamics in terms of a discrete-time random walk between the clusters. We show that by combining fuzzy C-means clustering with a transition-based assignment of states, we can construct robust Markov state models. This state-assignment procedure suppresses short-time memory effects that result from the non-Markovianity of the dynamics projected onto the space of clusters. In a comparison with previous work, we demonstrate how manifold learning techniques may complement and enhance informed intuition commonly used to construct reduced descriptions of the dynamics in molecular conformation space.
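A Markov state model of this kind is estimated by counting transitions between clusters at a lag time. The sketch below assumes fuzzy memberships (for example, from fuzzy C-means) are given and uses one simple form of transition-based assignment: a frame enters a state only when its membership exceeds a threshold, and otherwise keeps the last assigned state. The threshold value, the toy memberships and the function names are assumptions; the paper's exact assignment rule may differ.

```python
import numpy as np


def transition_based_assignment(memberships, threshold=0.8):
    """Turn fuzzy memberships (frames x states) into a discrete state trajectory.

    A frame is assigned to state j only when its membership in j exceeds
    `threshold`; otherwise the previously assigned state is kept. Frames before
    the first confident assignment are marked -1.
    """
    states = np.full(len(memberships), -1, dtype=int)
    current = -1
    for t, u in enumerate(memberships):
        j = int(np.argmax(u))
        if u[j] > threshold:
            current = j
        states[t] = current
    return states


def msm_transition_matrix(states, n_states, lag=1):
    """Row-normalised matrix of transition counts between states at the given lag."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-lag], states[lag:]):
        if a >= 0 and b >= 0:
            counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)


U = np.array([[0.9, 0.1], [0.45, 0.55], [0.7, 0.3], [0.1, 0.9], [0.2, 0.8], [0.95, 0.05]])
states = transition_based_assignment(U)
print(states)  # [0 0 0 1 1 0]
print(msm_transition_matrix(states, 2))
```

Plain argmax assignment would give [0, 1, 0, 1, 1, 0] for the same memberships, counting the brief, uncertain excursion at the second frame as two extra transitions; the threshold rule suppresses it, which is the kind of short-time memory effect the abstract describes.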
Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei
2013-01-01
This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…
Improving Cluster Analysis with Automatic Variable Selection Based on Trees
2014-12-01
Master's thesis by Anton D. Orr, December 2014 (thesis advisor: Samuel E. Buttrey). ...2006 based on classification and regression trees to address problems with determining dissimilarity. Current algorithms do not simultaneously address
Feature-space clustering for fMRI meta-analysis
DEFF Research Database (Denmark)
Goutte, C.; Hansen, L.K.; Liptrot, Matthew George
2001-01-01
Clustering functional magnetic resonance imaging (fMRI) time series has emerged in recent years as a possible alternative to parametric modeling approaches. Most of the work so far has been concerned with clustering raw time series. In this contribution we investigate the applicability...... of a clustering method applied to features extracted from the data. This approach is extremely versatile and encompasses previously published results [Goutte et al., 1999] as special cases. A typical application is in data reduction: as the increase in temporal resolution of fMRI experiments routinely yields f......-voxel analyses. In particular this allows the checking of the differences and agreements between different methods of analysis. Both approaches are illustrated on an fMRI data set involving visual stimulation, and we show that the feature space clustering approach yields nontrivial results and, in particular...
MEME-LaB: motif analysis in clusters.
Brown, Paul; Baxter, Laura; Hickman, Richard; Beynon, Jim; Moore, Jonathan D; Ott, Sascha
2013-07-01
Genome-wide expression analysis can result in large numbers of clusters of co-expressed genes. Although there are tools for ab initio discovery of transcription factor-binding sites, most do not provide a quick and easy way to study large numbers of clusters. To address this, we introduce a web tool called MEME-LaB. The tool wraps MEME (an ab initio motif finder), providing an interface for users to input multiple gene clusters, retrieve promoter sequences, run motif finding and then easily browse and condense the results, facilitating better interpretation of the results from large-scale datasets. MEME-LaB is freely accessible at: http://wsbc.warwick.ac.uk/wsbcToolsWebpage/. Supplementary data are available at Bioinformatics online.
Cognitive analysis of multiple sclerosis utilizing fuzzy cluster means
Directory of Open Access Journals (Sweden)
Imianvan Anthony Agboizebeta
2012-02-01
Multiple sclerosis, often called MS, is a disease that affects the central nervous system (the brain and spinal cord). Myelin provides insulation for nerve cells, improves the conduction of impulses along the nerves and is important for maintaining the health of the nerves. In multiple sclerosis, inflammation causes the myelin to disappear. Genetic factors, environmental issues and viral infection may also play a role in developing the disease. MS is characterized by life-threatening symptoms such as loss of balance, hearing problems and depression. The application of Fuzzy Cluster Means (FCM, or Fuzzy C-Means) analysis to the diagnosis of different forms of multiple sclerosis is the focal point of this paper. Application of cluster analysis involves a sequence of methodological and analytical decision steps that enhance the quality and meaning of the clusters produced. Uncertainties associated with analysis of multiple sclerosis test data are eliminated by the system
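Fuzzy C-means assigns each observation a degree of membership in every cluster instead of a single label. A minimal sketch of the standard algorithm (fuzzifier m = 2 by default) follows: memberships are updated as u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)) and centres as membership-weighted means with weights u_ij^m. The synthetic two-feature data merely stand in for clinical test results, and nothing here reproduces the paper's diagnostic system.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, iters=200, tol=1e-6, seed=0):
    """Standard fuzzy C-means. Returns (centers, memberships); membership rows sum to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a point sitting exactly on a centre
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers, U


rng = np.random.default_rng(1)
# Two synthetic groups of "test results" with two measurements each.
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
centers, U = fuzzy_c_means(X, c=2)
print(np.round(centers, 1))
print(U.argmax(axis=1))  # hard labels from the largest membership
```

Observations lying between groups receive intermediate memberships rather than being forced into one cluster, which is the property that motivates fuzzy clustering for uncertain diagnostic data.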
Effects of tidal gravitational fields in clustering dark energy models
Pace, Francesco; Reischke, Robert; Meyer, Sven; Schäfer, Björn Malte
2017-04-01
We extend previous work by Reischke et al. by studying the effects of tidal shear on clustering dark energy models within the framework of the extended spherical collapse model, using the Zel'dovich approximation. As in previous works on clustering dark energy, we assume a vanishing effective sound speed for the perturbations in dark energy models. To be self-consistent, our treatment is valid only on linear scales, since we do not intend to introduce any heuristic models. This approach makes the linear overdensity δc mass dependent and, similarly to the case of smooth dark energy, its effects are predominant at small masses and redshifts. Tidal shear has effects of the order of a per cent or less, regardless of the model, and preserves a well-known feature of clustering dark energy: when dark energy perturbations are included, the models better resemble the Lambda cold dark matter (ΛCDM) evolution of perturbations. We also show that the effects on the comoving number density of haloes are small and qualitatively and quantitatively in agreement with what was previously found for smooth dark energy models.
Concept Association and Hierarchical Hamming Clustering Model in Text Classification
Institute of Scientific and Technical Information of China (English)
Su Gui-yang; Li Jian-hua; Ma Ying-hua; Li Sheng-hong; Yin Zhong-hang
2004-01-01
We propose two models in this paper. A concept association model is put forward to obtain the co-occurrence relationships among keywords in the documents, and a hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space, which addresses the problem of the extremely high dimensionality of the documents' feature space. Experimental results indicate that the association model can obtain the co-occurrence relations among keywords in the documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model reduces the dimensionality of the category feature vector efficiently: the size of the vector space is only about 10% of the original dimensionality.
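The abstract does not give the details of the hierarchical Hamming clustering model. A minimal sketch of one plausible reading, stated as an assumption: each keyword is represented by a binary occurrence vector over documents, keywords whose vectors lie within a Hamming-distance threshold are grouped (single linkage, so raising the threshold merges groups hierarchically), and each group becomes one reduced feature. All names and data are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def hamming_groups(vectors, max_distance):
    """Single-linkage grouping: vectors linked by Hamming distance <= max_distance share a group."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if hamming(vectors[i], vectors[j]) <= max_distance:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def reduce_features(doc_vectors, groups):
    """Replace each group of keyword columns by one column: 1 if any keyword in the group occurs."""
    return [[int(any(doc[k] for k in g)) for g in groups] for doc in doc_vectors]


# Rows are documents, columns are keywords (1 = keyword present).
docs = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
]
keyword_vectors = [list(col) for col in zip(*docs)]  # one occurrence vector per keyword
for t in (0, 1):  # raising the threshold climbs the hierarchy
    print(t, hamming_groups(keyword_vectors, t))
groups = hamming_groups(keyword_vectors, 0)
print(reduce_features(docs, groups))  # 5 keyword columns -> 3 grouped columns
```

Keywords that always co-occur collapse into a single column at threshold 0, and a looser threshold trades more aggressive reduction for some loss of distinction between keywords.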
Modelling autophagy selectivity by receptor clustering on peroxisomes
Brown, Aidan I
2016-01-01
When subcellular organelles are degraded by autophagy, typically some, but not all, of each targeted organelle type are degraded. Autophagy selectivity must not only select the correct type of organelle, but must discriminate between individual organelles of the same kind. In the context of peroxisomes, we use computational models to explore the hypothesis that physical clustering of autophagy receptor proteins on the surface of each organelle provides an appropriate all-or-none signal for degradation. The pexophagy receptor proteins NBR1 and p62 are well characterized, though only NBR1 is essential for pexophagy (Deosaran et al., 2013). Extending earlier work by addressing the initial nucleation of NBR1 clusters on individual peroxisomes, we find that larger peroxisomes nucleate NBR1 clusters first and lose them due to competitive coarsening last, resulting in significant size-selectivity favouring large peroxisomes. This effect can explain the increased catalase signal that results from experimental s...
A two-stage method for microcalcification cluster segmentation in mammography by deformable models
Energy Technology Data Exchange (ETDEWEB)
Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L., E-mail: costarid@upatras.gr [Department of Medical Physics, School of Medicine, University of Patras, Patras 26504 (Greece); Vassiou, K. [Department of Anatomy, School of Medicine, University of Thessaly, Larissa 41500 (Greece)
2015-10-15
Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is a prerequisite for quantitative image analysis of MC clusters and for subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method for MC clusters is investigated. The first stage targets accurate and time-efficient segmentation of the majority of the particles of an MC cluster by means of a level set method. The second stage targets shape refinement of selected individual MCs by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter- and intraobserver agreement was evaluated in a case sample of 80 MC clusters originating from the Digital Database for Screening Mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists’ segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster; average of minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted, and a correlation-based feature selection method yielded a feature subset to feed a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under the receiver operating characteristic curve (Az ± Standard Error) utilizing
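The three agreement measures named in the abstract can be sketched as follows, assuming contours are given as arrays of (x, y) points and segmentations as binary masks. The Hausdorff distance and the area overlap |A ∩ B| / |A ∪ B| follow their usual definitions; symmetrising the average minimum distance by averaging both directions is an assumption, since the abstract does not define AMINDIST_cluster precisely. Function names are illustrative.

```python
import numpy as np


def _min_dists(A, B):
    """For each point of A, the Euclidean distance to its nearest point of B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two contour point sets."""
    return max(_min_dists(A, B).max(), _min_dists(B, A).max())


def average_min_distance(A, B):
    """Mean nearest-point distance, symmetrised by averaging both directions (assumed)."""
    return 0.5 * (_min_dists(A, B).mean() + _min_dists(B, A).mean())


def area_overlap(mask_a, mask_b):
    """Area overlap measure: |A intersect B| / |A union B| for binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0


A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0], [3.0, 0.0]])
print(hausdorff(A, B), average_min_distance(A, B))  # 2.0 0.75

a = np.zeros((4, 4), dtype=bool)
b = np.zeros((4, 4), dtype=bool)
a[0:2, 0:2] = True
b[1:3, 1:3] = True
print(area_overlap(a, b))  # 1 shared pixel out of 7 -> 0.142857...
```

The Hausdorff distance reports the single worst boundary disagreement, while the average minimum distance and the overlap measure summarise agreement over the whole contour or area.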
α-α folding cluster model for α-radioactivity
Energy Technology Data Exchange (ETDEWEB)
Soylu, A. [Nigde University, Department of Physics, Nigde (Turkey); Bayrak, O. [Akdeniz University, Department of Physics, Antalya (Turkey)
2015-04-01
The α-decay half-lives are calculated for heavy and superheavy nuclei with 52 ≤ Z ≤ 112 and 108 ≤ A ≤ 285 for ground-state-to-ground-state α transitions, within the framework of the Wentzel-Kramers-Brillouin (WKB) method and the Bohr-Sommerfeld quantization. In the calculations, the α-α single folding cluster potential is used to model the nuclear interaction between the α-particle and the core nucleus. This potential is obtained by folding the α-α potential with the α-cluster density distributions. The results agree very well with experiment in the heavy-nuclei region, especially for even-even nuclei. For superheavy nuclei, however, the calculated values are smaller than the experimental ones. Both the density of the core and the interaction term in the folding integral include α-clustering effects, so all cluster effects are taken into account in the model. The results of the calculations are therefore more physical and reasonable than those of other models. The present method could be applied to light nuclei with different types of nuclear densities. (orig.)
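The WKB framework with Bohr-Sommerfeld quantization is commonly written in the form below, with the nuclear term given by the single-folding integral described above. The symbols are assumptions about the usual formulation rather than details stated in the abstract:

- r_1, r_2, r_3 are the classical turning points.
- G is a global quantum number.
- P_α is a preformation factor.
- μ is the reduced mass and Q is the decay energy.
- ρ_α is the α-cluster density of the core.

The Langer-modified centrifugal term is likewise part of the usual formulation, not of the abstract.

```latex
\begin{gather*}
V(r) = V_N(r) + V_C(r) + \frac{\hbar^2 (L + 1/2)^2}{2\mu r^2},
\qquad
V_N(\mathbf r) = \int \rho_\alpha(\mathbf r')\, v_{\alpha\alpha}(|\mathbf r - \mathbf r'|)\, d^3 r' \\
k(r) = \sqrt{\frac{2\mu}{\hbar^2}\,\lvert Q - V(r)\rvert} \\
\int_{r_1}^{r_2} k(r)\, dr = (G - L + 1)\,\frac{\pi}{2},
\qquad
F \int_{r_1}^{r_2} \frac{dr}{2k(r)} = 1 \\
\Gamma = P_\alpha\, F\, \frac{\hbar^2}{4\mu}
\exp\!\left[-2\int_{r_2}^{r_3} k(r)\, dr\right],
\qquad
T_{1/2} = \frac{\hbar \ln 2}{\Gamma}
\end{gather*}
```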
Advances in Bayesian Model Based Clustering Using Particle Learning
Energy Technology Data Exchange (ETDEWEB)
Merl, D M
2009-11-19
Recent work by Carvalho, Johannes, Lopes and Polson, and by Carvalho, Lopes, Polson and Taddy, introduced a sequential Monte Carlo (SMC) alternative to traditional iterative Monte Carlo strategies (e.g. MCMC and EM) for Bayesian inference in a large class of dynamic models. SMC techniques represent the underlying inference problem as one of state-space estimation, so that inference proceeds via particle filtering. The key insight of Carvalho et al. was to construct the sequence of filtering distributions so as to make use of the posterior predictive distribution of the observable, a distribution usually accessible only in certain Bayesian settings. Access to this distribution allows the usual propagate and resample steps of many SMC methods to be reversed, which largely alleviates many problems associated with particle degeneracy. Furthermore, Carvalho et al. point out that for many conjugate models the posterior distribution of the static variables can be parametrized in terms of [recursively defined] sufficient statistics of the previously observed data. For models where such sufficient statistics exist, particle learning, as it is called, is especially well suited to the analysis of streaming data. This is due to the relative invariance of its algorithmic complexity with the number of data observations. Through a particle learning approach, a statistical model can be fit to data as the data arrive, allowing direct quantification of uncertainty about the underlying model parameters at any instant during the observation process. Here we describe the use of a particle learning approach for fitting a standard Bayesian semiparametric mixture model as described in Carvalho, Lopes, Polson and Taddy. In Section 2 we briefly review the previously presented particle learning algorithm for the case of a Dirichlet process mixture of multivariate normals. In Section 3 we describe several novel extensions to the original
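The sketch below makes the "recursively defined sufficient statistics" concrete for the simplest conjugate case. It assumes a normal likelihood with known variance and a normal prior on the mean; the paper itself treats Dirichlet process mixtures of multivariate normals. In this setting each particle only needs to carry (n, Σx). The posterior predictive density is the quantity particle learning uses to weight particles before propagating them. Names and default hyperparameters are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NormalSuffStats:
    """Sufficient statistics (count, sum) for a normal likelihood with known variance."""
    n: int = 0
    s: float = 0.0

    def update(self, x: float) -> "NormalSuffStats":
        # Recursive update: only the running count and sum are carried forward,
        # so the cost per new observation does not grow with the data size.
        return NormalSuffStats(self.n + 1, self.s + x)


def posterior(stats: NormalSuffStats, mu0: float = 0.0, tau0_sq: float = 1.0,
              sigma_sq: float = 1.0) -> Tuple[float, float]:
    """Posterior mean and variance of the unknown mean under a N(mu0, tau0_sq) prior."""
    prec = 1.0 / tau0_sq + stats.n / sigma_sq
    mean = (mu0 / tau0_sq + stats.s / sigma_sq) / prec
    return mean, 1.0 / prec


def predictive_logpdf(x: float, stats: NormalSuffStats, mu0: float = 0.0,
                      tau0_sq: float = 1.0, sigma_sq: float = 1.0) -> float:
    """Log posterior predictive density of the next observation (used as a resampling weight)."""
    m, v = posterior(stats, mu0, tau0_sq, sigma_sq)
    var = v + sigma_sq
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - m) ** 2 / var)
```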
Application of cluster analysis to preventive maintenance scheme design of pavement
Institute of Scientific and Technical Information of China (English)
ZENG Feng; ZHANG Xiao-ning
2009-01-01
To quantitatively identify the maintenance demand for each highway segment in pavement maintenance scheme design, a mathematical model of uniform segment division was established. An approach applying cluster analysis theory to uniform segment division and to the evaluation of pavement maintenance demand was proposed. An actual maintenance project on a highway in Guangdong province is cited as an example to demonstrate the validity of the proposed method. The results show that cluster analysis can eliminate human factors in classification without being constrained by the number of samples, while considering multiple pavement distress indexes and the continuity of samples. Cluster analysis is therefore an efficient analytical tool for uniform segment division and the evaluation of maintenance demand.
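The abstract does not spell out its uniform-segment model. One simple reading that respects "the continuity of samples" is contiguity-constrained agglomerative clustering. Consecutive highway sections, each described by a vector of distress indexes, are repeatedly merged with a neighbour at the smallest Ward cost until the requested number of uniform segments remains. The sketch below implements that assumption, not the authors' method, and `uniform_segments` is a hypothetical name.

```python
from typing import List, Sequence


def uniform_segments(values: Sequence[Sequence[float]], n_groups: int) -> List[List[int]]:
    """Contiguity-constrained agglomerative clustering of consecutive road sections.

    values[i] holds the distress indexes of section i (in road order).
    Returns groups of section indices; each group is a contiguous stretch.
    """
    groups = [[i] for i in range(len(values))]
    dim = len(values[0])

    def centroid(g: List[int]) -> List[float]:
        return [sum(values[i][d] for i in g) / len(g) for d in range(dim)]

    def merge_cost(g1: List[int], g2: List[int]) -> float:
        # Ward criterion: increase in within-group sum of squares if g1 and g2 merge.
        c1, c2 = centroid(g1), centroid(g2)
        sq = sum((a - b) ** 2 for a, b in zip(c1, c2))
        return len(g1) * len(g2) / (len(g1) + len(g2)) * sq

    while len(groups) > n_groups:
        # Only neighbouring groups are candidates, so no segment is ever split in two.
        k = min(range(len(groups) - 1),
                key=lambda j: merge_cost(groups[j], groups[j + 1]))
        groups[k:k + 2] = [groups[k] + groups[k + 1]]
    return groups
```

Restricting merges to neighbours is what keeps each output group a contiguous stretch of road that can be treated with a single maintenance scheme.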
Topic Modeling Based Image Clustering by Events in Social Media
2016-01-01
Social event detection in large photo collections is very challenging, and multimodal clustering is an effective methodology for dealing with the problem. Geographic information is important in event detection. This paper proposes a topic-model-based approach to estimate the missing geographic information for photos. The approach uses a supervised multimodal topic model to estimate the joint distribution of time, geographic, content, and attached textual information. Then we annotate the missi...
Thatcher, W. R.; Savage, J. C.; Simpson, R.
2012-12-01
Regional Global Positioning System (GPS) velocity observations are providing increasingly precise mappings of actively deforming continental lithosphere. Cluster analysis, a venerable data analysis method, offers a simple, visual exploratory tool for the initial organization and investigation of GPS velocities (Simpson et al., 2012 GRL). Here we describe the application of cluster analysis to GPS velocities from three regions, the Mojave Desert and the San Francisco Bay regions in California, and the Aegean in the eastern Mediterranean. Our goal is to illustrate the strengths and shortcomings of the method in searching for spatially coherent patterns of deformation, including evidence for and against block-like behavior in these 3 regions. The deformation fields from dense regional GPS networks can often be concisely described in terms of relatively coherent blocks bounded by active faults, although the choice of blocks, their number and size, is subjective and usually guided by the distribution of known faults. Cluster analysis applied to GPS velocities provides a completely objective method for identifying groups of observations ranging in size from 10s to 100s of km in characteristic dimension based solely on the similarities of their velocity vectors. In the three regions we have studied, statistically significant clusters are almost invariably spatially coherent, fault bounded, and coincide with elastic, geologically identified structural blocks. Often, higher order clusters that are not statistically significant are also spatially coherent, suggesting the existence of additional blocks, or defining regions of other tectonic importance (e.g. zones of localized elastic strain accumulation near locked faults). These results can be used to both formulate tentative tectonic models with testable consequences and to suggest focused new measurements in under-sampled regions. Cluster analysis applied to GPS velocities has several potential limitations, aside from the
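The sketch below groups stations purely by velocity similarity. It assumes average-linkage agglomerative clustering on Euclidean distances between (east, north) velocity vectors. The abstract does not state the linkage or distance used, and the significance testing it mentions is omitted. The function name and the input format are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cluster_velocities(vel: Sequence[Tuple[float, float]], n_clusters: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering of GPS velocity vectors.

    vel[i] is the (v_east, v_north) velocity of station i; station positions are
    deliberately ignored, so any spatial coherence in the result is an outcome,
    not an input.
    """
    clusters = [[i] for i in range(len(vel))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between the velocity vectors of two clusters.
        return sum(math.dist(vel[i], vel[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)
```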
A Bayesian Analysis of the Ages of Four Open Clusters
Jeffery, Elizabeth J; van Dyk, David A; Stenning, David C; Robinson, Elliot; Stein, Nathan; Jefferys, W H
2016-01-01
In this paper we apply a Bayesian technique to determine the best fit of stellar evolution models and so find the main-sequence turnoff age and other cluster parameters of four intermediate-age open clusters: NGC 2360, NGC 2477, NGC 2660, and NGC 3960. Our algorithm uses a Markov chain Monte Carlo technique to fit these parameters, objectively finding the best-fit isochrone for each cluster. The result is a high-precision isochrone fit. We compare these results with those of traditional "by-eye" isochrone fitting methods. By applying this Bayesian technique to NGC 2360, NGC 2477, NGC 2660, and NGC 3960, we determine the ages of these clusters to be 1.35 +/- 0.05, 1.02 +/- 0.02, 1.64 +/- 0.04, and 0.860 +/- 0.04 Gyr, respectively. The results of this paper continue our effort to determine cluster ages to higher precision than that offered by these traditional methods of isochrone fitting.
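The MCMC step at the heart of such fits can be illustrated with a random-walk Metropolis sampler. The sketch below fits a single parameter under a toy Gaussian likelihood and a flat prior. The real problem samples age, distance, metallicity and absorption jointly against isochrone models, which are not reproduced here. The step size, seed and toy posterior are all assumptions.

```python
import math
import random
from typing import Callable, List, Sequence


def metropolis(log_post: Callable[[float], float], theta0: float, step: float,
               n_samples: int, seed: int = 0) -> List[float]:
    """Random-walk Metropolis sampler for a one-dimensional parameter."""
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    samples = []
    for _ in range(n_samples):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        # Accept with probability min(1, exp(lp_prop - lp)); -inf proposals are never accepted.
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples


def make_toy_log_post(data: Sequence[float], sigma: float,
                      lo: float = 0.0, hi: float = 15.0) -> Callable[[float], float]:
    """Hypothetical log posterior: flat prior on (lo, hi), Gaussian errors of width sigma."""
    def log_post(theta: float) -> float:
        if not lo < theta < hi:
            return -math.inf
        return -sum((d - theta) ** 2 for d in data) / (2.0 * sigma ** 2)
    return log_post
```

After discarding an initial burn-in, the sample mean and spread summarize the posterior. The "+/-" values quoted above are summaries of that kind.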
Model study in chemisorption: atomic hydrogen on beryllium clusters
Energy Technology Data Exchange (ETDEWEB)
Bauschlicher, C.W. Jr.
1976-08-01
The interaction between atomic hydrogen and the (0001) surface of Be metal has been studied by ab initio electronic structure theory. Self-consistent-field (SCF) calculations have been performed using minimum, optimized minimum, double zeta and mixed basis sets for clusters as large as 22 Be atoms. The binding energy and equilibrium geometry (the distance to the surface) were determined for 4 sites. Both spatially restricted (the wavefunction was constrained to transform as one of the irreducible representations of the molecular point group) and unrestricted SCF calculations were performed. Using only the optimized minimum basis set, clusters containing as many as 22 beryllium atoms have been investigated. From a variety of considerations, this cluster is seen to be nearly converged within the model used, providing the most reliable results for chemisorption. The site dependence of the frequency is shown to be a geometrical effect depending on the number and angle of the bonds. The diffusion of atomic hydrogen through a perfect beryllium crystal is predicted to be energetically unfavorable. The cohesive energy, the ionization energy and the singlet-triplet separation were computed for the clusters without hydrogen. These quantities can be seen as a measure of the total amount of edge effects. The chemisorptive properties are not related to the total amount of edge effects, but rather the edge effects felt by the adsorbate bonding berylliums. This lack of correlation with the total edge effects illustrates the local nature of the bonding, further strengthening the cluster model for chemisorption. A detailed discussion of the bonding and electronic structure is included. The remaining edge effects for the Be_22 cluster are discussed.
DEFF Research Database (Denmark)
Ussery, David; Bohlin, Jon; Skjerve, Eystein
2009-01-01
Recently there has been an explosion in the availability of bacterial genomic sequences, making it possible to analyse genomic signatures across more than 800 different bacterial chromosomes from a wide variety of environments. Using genomic signatures, we pairwise compared 867...... different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base-pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model included the cluster groups as the response variable...... AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement. The statistics obtained using hierarchical clustering...
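The abstract does not say which genomic signature was used. One widely used choice is Karlin's dinucleotide relative abundance, ρ_XY = f_XY / (f_X f_Y), computed over both strands. Genomes are then compared by the mean absolute difference δ* over the 16 dinucleotides. The sketch below assumes that choice. A matrix of such δ* values is the kind of input a hierarchical clustering step would take.

```python
from itertools import product
from typing import Dict

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def relative_abundance(seq: str) -> Dict[str, float]:
    """Dinucleotide relative abundance rho_XY = f_XY / (f_X * f_Y) over both strands."""
    seq = seq.upper()
    rev_comp = "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))
    # The 'N' separator stops a spurious dinucleotide forming at the junction.
    both = seq + "N" + rev_comp
    mono = {b: both.count(b) for b in BASES}
    n_mono = sum(mono.values())
    di = {x + y: 0 for x, y in product(BASES, repeat=2)}
    for a, b in zip(both, both[1:]):
        if a + b in di:  # skips pairs containing N or other ambiguity codes
            di[a + b] += 1
    n_di = sum(di.values())
    rho = {}
    for xy, count in di.items():
        fx, fy = mono[xy[0]] / n_mono, mono[xy[1]] / n_mono
        rho[xy] = (count / n_di) / (fx * fy) if fx and fy else 0.0
    return rho


def delta_distance(s1: str, s2: str) -> float:
    """Mean absolute difference in relative abundance over the 16 dinucleotides."""
    r1, r2 = relative_abundance(s1), relative_abundance(s2)
    return sum(abs(r1[k] - r2[k]) for k in r1) / 16
```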
The Influence of Hepatitis C Virus Genetic Region on Phylogenetic Clustering Analysis.
Directory of Open Access Journals (Sweden)
François M J Lamoury
Sequencing is important for understanding the molecular epidemiology and viral evolution of hepatitis C virus (HCV) infection. To date, there is little standardisation among sequencing protocols, in part due to the high genetic diversity observed within HCV. This study aimed to develop a novel, practical sequencing protocol that covered both conserved and variable regions of the viral genome. It also aimed to assess the influence of each subregion, sequence concatenation and unrelated reference sequences on phylogenetic clustering analysis. The Core to the hypervariable region 1 (HVR1) of envelope-2 (E2) and the non-structural-5B (NS5B) regions of the HCV genome were amplified and sequenced from participants in the Australian Trial in Acute Hepatitis C (ATAHC), a prospective study of the natural history and treatment of recent HCV infection. Phylogenetic trees were constructed using a general time-reversible substitution model, and sensitivity analyses were completed for every subregion. Pairwise distance, genetic distance and bootstrap support were computed to assess the impact of HCV region on clustering results. These were measured by the identification and percentage of participants falling within all clusters, cluster size, average patristic distance, and bootstrap value. The Robinson-Foulds metric was also used to compare phylogenetic trees among the different HCV regions. Our results demonstrated that the genomic region of HCV analysed influenced phylogenetic tree topology and clustering results. The HCV Core region alone was not suitable for clustering analysis; NS5B concatenation, the inclusion of reference sequences and removal of HVR1 all influenced clustering outcome. The Core-E2 region, which represented the highest genetic diversity and longest sequence length in this study, provides an ideal basis for clustering analysis to address a range of molecular epidemiological questions.
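Here is a minimal sketch of the Robinson-Foulds metric for rooted trees. Each tree is reduced to its set of clades (the leaf sets below internal nodes), and the distance counts clades present in one tree but not the other. The usual unrooted version compares bipartitions instead. The nested-tuple tree format is an assumption made for illustration, not a phylogenetics library's API.

```python
from typing import FrozenSet, Set, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of subtrees


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial clades of a rooted tree given as nested tuples, e.g. (("a","b"),("c","d"))."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = walk(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on the same leaves
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Rooted Robinson-Foulds distance: size of the symmetric difference of clade sets."""
    return len(clades(t1) ^ clades(t2))
```

For example, `robinson_foulds((("a","b"),("c","d")), (("a","c"),("b","d")))` is 4, because the two trees share no internal clade.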
DGA Clustering and Analysis: Mastering Modern, Evolving Threats, DGALab
Directory of Open Access Journals (Sweden)
Alexander Chailytko
2016-05-01
Domain Generation Algorithms (DGAs) are a basic building block used in almost all modern malware. Malware researchers have attempted to tackle the DGA problem with various tools and techniques, with varying degrees of success. We present a comprehensive solution that populates a DGA feed using reversed DGAs, third-party feeds, and smart DGA extraction and clustering based on emulation of a large number of samples. Smart DGA extraction requires no reverse engineering and works regardless of the DGA type or initialization vector, while enabling a cluster-based analysis. Our method also automatically enables analysis of a whole malware family, a specific campaign, etc. We present our system and demonstrate its abilities on more than 20 malware families. This includes showing connections between different campaigns, as well as comparing results. Most importantly, we discuss how to use the outcome of the analysis to create smarter protections against similar malware.
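A toy generator shows why running or emulating a sample can recover its domains without reverse engineering. The domain list is a pure function of a seed and the date, so executing the code for a given date reproduces exactly what the malware would contact. The generator below is hypothetical and does not correspond to any real malware family. It uses the reserved `.example` suffix.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12,
            tld: str = ".example") -> List[str]:
    """Hypothetical date-seeded DGA: hashes (seed, date, index) into lowercase labels.

    length must be at most 32, the size of a SHA-256 digest in bytes.
    """
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        label = "".join(chr(ord("a") + byte % 26) for byte in digest[:length])
        domains.append(label + tld)
    return domains
```

Because the same (seed, date) always yields the same list, domains observed in emulation can be grouped by the generator that produced them. That grouping is the property a cluster-based analysis relies on.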
Wagner-Kaiser, R; Robinson, E; von Hippel, T; Sarajedini, A; van Dyk, D A; Stein, N; Jefferys, W H
2016-01-01
We use Cycle 21 Hubble Space Telescope (HST) observations and HST archival ACS Treasury observations of Galactic Globular Clusters to find and characterize two stellar populations in NGC 5024 (M53), NGC 5272 (M3), and NGC 6352. For these three clusters, both single- and double-population analyses are used to determine best-fit isochrones. We employ a sophisticated Bayesian analysis technique to simultaneously fit the cluster parameters (age, distance, absorption, and metallicity) that characterize each cluster. For the two-population analysis, unique population-level helium values are also fit to each distinct population of the cluster, and the relative proportions of the populations are determined. We find differences in helium ranging from ~0.05 to 0.11 for these three clusters. Model grids with solar α-element abundances ([α/Fe] = 0.0) and enhanced α-elements ([α/Fe] = 0.4) are adopted.
Energy Technology Data Exchange (ETDEWEB)
Krumholz, Mark R. [Department of Astronomy and Astrophysics, University of California, Santa Cruz, CA 95064 (United States); Adamo, Angela [Department of Astronomy, Oskar Klein Centre, Stockholm University, SE-10691 Stockholm (Sweden); Fumagalli, Michele [Institute for Computational Cosmology and Centre for Extragalactic Astronomy, Department of Physics, Durham University, South Road, Durham DH1 3LE (United Kingdom); Wofford, Aida [Institut d’Astrophysique de Paris, 98bis Boulevard Arago, F-75014 Paris (France); Calzetti, Daniela; Grasha, Kathryn [Department of Astronomy, University of Massachusetts–Amherst, Amherst, MA (United States); Lee, Janice C.; Whitmore, Bradley C.; Bright, Stacey N.; Ubeda, Leonardo [Space Telescope Science Institute, Baltimore, MD (United States); Gouliermis, Dimitrios A. [Centre for Astronomy, Institute for Theoretical Astrophysics, University of Heidelberg, Heidelberg (Germany); Kim, Hwihyun [Korea Astronomy and Space Science Institute, Daejeon (Korea, Republic of); Nair, Preethi [Department of Physics and Astronomy, University of Alabama, Tuscaloosa, AL (United States); Ryon, Jenna E. [Department of Astronomy, University of Wisconsin–Madison, Madison, WI (United States); Smith, Linda J. [European Space Agency/Space Telescope Science Institute, Baltimore, MD (United States); Thilker, David [Department of Physics and Astronomy, The Johns Hopkins University, Baltimore, MD (United States); Zackrisson, Erik, E-mail: mkrumhol@ucsc.edu, E-mail: adamo@astro.su.se [Department of Physics and Astronomy, Uppsala University, Uppsala (Sweden)
2015-10-20
We investigate a novel Bayesian analysis method, based on the Stochastically Lighting Up Galaxies (slug) code, to derive the masses, ages, and extinctions of star clusters from integrated light photometry. Unlike many analysis methods, slug correctly accounts for incomplete initial mass function (IMF) sampling, and returns full posterior probability distributions rather than simply probability maxima. We apply our technique to 621 visually confirmed clusters in two nearby galaxies, NGC 628 and NGC 7793, that are part of the Legacy Extragalactic UV Survey (LEGUS). LEGUS provides Hubble Space Telescope photometry in the NUV, U, B, V, and I bands. We analyze the sensitivity of the derived cluster properties to choices of prior probability distribution, evolutionary tracks, IMF, metallicity, treatment of nebular emission, and extinction curve. We find that slug's results for individual clusters are insensitive to most of these choices, but that the posterior probability distributions we derive are often quite broad, and sometimes multi-peaked and quite sensitive to the choice of priors. In contrast, the properties of the cluster population as a whole are relatively robust against all of these choices. We also compare our results from slug to those derived with a conventional non-stochastic fitting code, Yggdrasil. We show that slug's stochastic models are generally a better fit to the observations than the deterministic ones used by Yggdrasil. However, the overall properties of the cluster populations recovered by both codes are qualitatively similar.
Full text clustering and relationship network analysis of biomedical publications.
Directory of Open Access Journals (Sweden)
Renchu Guan
Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. Current approaches to text clustering focus exclusively on the contents of abstracts; in contrast, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, the Cosine Coefficient is used on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. Then a strategy and algorithm are introduced for Semi-supervised Affinity Propagation (SSAP) to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. By constructing a directed relationship network and distribution matrix for the clustering results, overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.
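The abstract does not specify the semi-supervised variant (SSAP) in enough detail to reproduce it. For orientation, the sketch below implements standard, unsupervised affinity propagation on a precomputed similarity matrix, using Frey and Dueck's responsibility/availability message passing. It also includes the cosine coefficient, which could supply those similarities for term vectors. The damping and iteration count are illustrative defaults.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine coefficient between two term vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def affinity_propagation(S: List[List[float]], damping: float = 0.5,
                         iters: int = 200) -> List[int]:
    """Standard affinity propagation. S[i][k] is the similarity of i to k; the diagonal
    holds the preferences. Returns each point's exemplar index (-1 if none emerged)."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')].
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max((k for k in range(n) if k != first), key=vals.__getitem__)
            for k in range(n):
                best_other = vals[second] if k == first else vals[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)));
        # self-availability a(k,k) = sum_{i' != k} max(0, r(i',k)).
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [-1] * n
    # Exemplars label themselves; every other point joins its most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]
```

The diagonal of S (the "preferences") controls how many exemplars emerge. The median off-diagonal similarity is a common default.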
Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.
Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed
2015-11-01
Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine whether running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data, and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns, with the main between-group differences occurring in frontal and sagittal plane knee angles (P […] gait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners.
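The PCA step that precedes the hierarchical clustering can be sketched without external libraries. The sketch runs power iteration on the sample covariance matrix and deflates after each component. This is an illustrative assumption, since the study's software and settings are not given. Each row would be one subject's concatenated joint-angle waveform, and the returned scores are what a hierarchical clustering would then group. Power iteration also assumes the start vector is not orthogonal to the leading eigenvector.

```python
import math
from typing import List, Sequence


def pca_scores(data: Sequence[Sequence[float]], n_components: int = 2,
               iters: int = 500) -> List[List[float]]:
    """Scores of each row on the leading principal components (power iteration + deflation)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [1.0 / math.sqrt(d)] * d  # unit start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate: remove the found direction so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x[j] * c[j] for j in range(d)) for c in components] for x in X]
```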
Vastano, Valeria; Perrone, Filomena; Marasco, Rosangela; Sacco, Margherita; Muscariello, Lidia
2016-04-01
Exopolysaccharides (EPS) from lactic acid bacteria contribute to specific rheology and texture of fermented milk products and find applications also in non-dairy foods and in therapeutics. Recently, four clusters of genes (cps) associated with surface polysaccharide production have been identified in Lactobacillus plantarum WCFS1, a probiotic and food-associated lactobacillus. These clusters are involved in cell surface architecture and probably in release and/or exposure of immunomodulating bacterial molecules. Here we show a transcriptional analysis of these clusters. Indeed, RT-PCR experiments revealed that the cps loci are organized in five operons. Moreover, by reverse transcription-qPCR analysis performed on L. plantarum WCFS1 (wild type) and WCFS1-2 (ΔccpA), we demonstrated that expression of three cps clusters is under the control of the global regulator CcpA. These results, together with the identification of putative CcpA target sequences (catabolite responsive element CRE) in the regulatory region of four out of five transcriptional units, strongly suggest for the first time a role of the master regulator CcpA in EPS gene transcription among lactobacilli.
Three-Verb Clusters in Interference Frisian: A Stochastic Model over Sequential Syntactic Input.
Hoekstra, Eric; Versloot, Arjen
2016-03-01
Interference Frisian (IF) is a variety of Frisian, spoken by mostly younger speakers, which is heavily influenced by Dutch. IF exhibits all six logically possible word orders in a cluster of three verbs. This phenomenon has been researched by Koeneman and Postma (2006), who argue for a parameter theory, which leaves frequency differences between various orders unexplained. Rejecting Koeneman and Postma's parameter theory, but accepting their conclusion that Dutch (and Frisian) data are input for the grammar of IF, we will argue that the word order preferences of speakers of IF are determined by frequency and similarity. More specifically, three-verb clusters in IF are sensitive to: their linear left-to-right similarity to two-verb clusters and three-verb clusters in Frisian and in Dutch; the (estimated) frequency of two- and three-verb clusters in Frisian and Dutch. The model will be shown to work best if Dutch and Frisian, and two- and three-verb clusters, have equal impact factors. If different impact factors are taken, the model's predictions do not change substantially, testifying to its robustness. This analysis is in line with recent ideas that the sequential nature of human speech is more important to syntactic processes than commonly assumed, and that less burden need be put on the hierarchical dimension of syntactic structure.
Breakup reaction models for two- and three-cluster projectiles
Baye, D
2010-01-01
Breakup reactions are one of the main tools for the study of exotic nuclei, and in particular of their continuum. In order to get valuable information from measurements, a precise reaction model coupled to a fair description of the projectile is needed. We assume that the projectile initially possesses a cluster structure, which is revealed by the dissociation process. This structure is described by a few-body Hamiltonian involving effective forces between the clusters. Within this assumption, we review various reaction models. In semiclassical models, the projectile-target relative motion is described by a classical trajectory and the reaction properties are deduced by solving a time-dependent Schroedinger equation. We then describe the principle and variants of the eikonal approximation: the dynamical eikonal approximation, the standard eikonal approximation, and a corrected version avoiding Coulomb divergence. Finally, we present the continuum-discretized coupled-channel method (CDCC), in which the Schroed...
Critical dynamics of cluster algorithms in the dilute Ising model
Hennecke, M.; Heyken, U.
1993-08-01
Autocorrelation times for thermodynamic quantities at T_c are calculated from Monte Carlo simulations of the site-diluted simple cubic Ising model, using the Swendsen-Wang and Wolff cluster algorithms. Our results show that for these algorithms the autocorrelation times decrease as the concentration of magnetic sites is reduced from 100% down to 40%. This is of crucial importance when estimating static properties of the model, since the variances of these estimators increase with autocorrelation time. The dynamical critical exponents are calculated for both algorithms, and pronounced finite-size effects are observed in the energy autocorrelation data for the Wolff algorithm. We conclude that, when applied to the dilute Ising model, cluster algorithms become even more effective than local algorithms, for which increasing autocorrelation times are expected.
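The sketch below shows a single Wolff step for the site-diluted simple cubic Ising model, as an example of the kind of cluster update whose autocorrelation times are measured here. It assumes ferromagnetic coupling J = 1, periodic boundaries, and a dict-based lattice in which vacancies hold 0. Aligned occupied neighbours join the cluster with probability 1 − exp(−2βJ). Vacant sites never join, which is how dilution enters. The helper names are hypothetical, and no autocorrelation analysis is included.

```python
import math
import random
from typing import Dict, Tuple

Site = Tuple[int, int, int]
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def diluted_lattice(L: int, p: float, rng: random.Random) -> Dict[Site, int]:
    """Random site dilution: each site is occupied (with a random spin) with probability p."""
    return {(x, y, z): (rng.choice((-1, 1)) if rng.random() < p else 0)
            for x in range(L) for y in range(L) for z in range(L)}


def wolff_step(spins: Dict[Site, int], L: int, beta: float, rng: random.Random) -> int:
    """One Wolff single-cluster update (J = 1, periodic L^3 lattice). Returns the cluster size."""
    occupied = [site for site, s in spins.items() if s != 0]
    seed = rng.choice(occupied)
    s0 = spins[seed]
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond-activation probability for aligned neighbours
    cluster = {seed}
    stack = [seed]
    while stack:
        x, y, z = stack.pop()
        for dx, dy, dz in NEIGHBOURS:
            nb = ((x + dx) % L, (y + dy) % L, (z + dz) % L)
            # Vacancies (spin 0) never equal s0, so they are never added.
            if nb not in cluster and spins[nb] == s0 and rng.random() < p_add:
                cluster.add(nb)
                stack.append(nb)
    for site in cluster:
        spins[site] = -s0  # flip the whole cluster at once
    return len(cluster)
```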
Modeling the Formation of Globular Cluster Systems in the Virgo Cluster
Li, Hui
2014-01-01
Globular cluster (GC) systems are some of the oldest and most unique building blocks of galaxies. The mass and chemical composition of GCs preserve the fossil record of the early stages of formation of their host galaxies. The observed distribution of GC colors within massive early-type galaxies in the ACS Virgo Cluster Survey (ACSVCS) reveals a multi-modal shape, which likely corresponds to a multi-modal metallicity distribution. In this paper, we present a simple model for the formation and dynamical disruption of globular clusters that aims to match the ACSVCS data. We test the hypothesis that GCs are formed during major mergers of gas-rich galaxies and inherit the metallicity of their hosts. To trace merger events, we use halo merger trees extracted from a large cosmological N-body simulation. We select 20 halos in the mass range 2×10^12–7×10^13 M_sun and match them to 18 Virgo galaxies with K-band luminosity between 3×10^10 and 3×10^11 L_sun. To set the iron abundances, we use an empirical galaxy ...
Image modeling of compact starburst clusters: I. R136
Khorrami, Zeinab; Chesneau, Olivier
2016-01-01
Continuous progress in data quality from HST, recent multiwavelength high-resolution spectroscopy, and high-contrast imaging from ground-based adaptive optics on large telescopes call for modeling of R136 to understand its nature and evolutionary stage. To produce the best synthesized multiwavelength images of R136, we need to simulate the effects of dynamical and stellar evolution, mass segregation and the binary fraction on the survival of young massive clusters. The initial parameters of these clusters are set to the present knowledge of R136 in the LMC. We produced a series of 32 young massive clusters using the NBODY6 code. Each cluster was tracked with adequate temporal sampling to follow the evolution of R136 during its early stages. To compare the NBODY6 simulations with observational data, we created synthetic images from the output of the code. We used the TLUSTY and KURUCZ model atmospheres to produce the fluxes in HST/WFPC2 filters. GENEVA isochrones were used to track the evolution of stars....
Subtyping demoralization in the medically ill by cluster analysis
Directory of Open Access Journals (Sweden)
Chiara Rafanelli
2013-03-01
Background and Objectives: There is increasing interest in the issue of demoralization, particularly in the setting of medical disease. The aim of this investigation was to use both DSM-IV comorbidity and the Diagnostic Criteria for Psychosomatic Research (DCPR) in order to characterize demoralization in the medically ill. Methods: 1700 patients were recruited from 8 medical centers in the Italian Health System and 1560 agreed to participate. They all underwent a cross-sectional assessment with DSM-IV and DCPR structured interviews. 373 patients (23.9%) received a diagnosis of demoralization. Data were submitted to cluster analysis. Results: Four clusters were identified: demoralization and comorbid depression; demoralization and comorbid somatoform/adjustment disorders; demoralization and comorbid anxiety; demoralization without any comorbid DSM disorder. The first cluster included 27.6% of the total sample and was characterized by the presence of DSM-IV mood disorders (mainly major depressive disorder). The second cluster had 18.2% of the cases and contained both DSM-IV somatoform (particularly, undifferentiated somatoform disorder and hypochondriasis) and adjustment disorders. In the third cluster (24.7%), DSM-IV anxiety disorders in comorbidity with demoralization were predominant (particularly, generalized anxiety disorder, agoraphobia, panic disorder and obsessive-compulsive disorder). The fourth cluster had 29.5% of the patients and was characterized by the absence of any DSM-IV comorbid disorder. Conclusions: The findings indicate the need for expanding clinical assessment in the medically ill to include the various manifestations of demoralization as encompassed by the DCPR. Subtyping demoralization may yield improved targets for psychosomatic research and treatment trials.
Bayesian Analysis of Multiple Populations in Galactic Globular Clusters
Wagner-Kaiser, Rachel A.; Sarajedini, Ata; von Hippel, Ted; Stenning, David; Piotto, Giampaolo; Milone, Antonino; van Dyk, David A.; Robinson, Elliot; Stein, Nathan
2016-01-01
We use GO 13297 Cycle 21 Hubble Space Telescope (HST) observations and archival GO 10775 Cycle 14 HST ACS Treasury observations of Galactic Globular Clusters to find and characterize multiple stellar populations. Determining how globular clusters are able to create and retain enriched material to produce several generations of stars is key to understanding how these objects formed and how they have affected the structural, kinematic, and chemical evolution of the Milky Way. We employ a sophisticated Bayesian technique with an adaptive MCMC algorithm to simultaneously fit the age, distance, absorption, and metallicity for each cluster. At the same time, we also fit unique helium values to two distinct populations of the cluster and determine the relative proportions of those populations. Our unique numerical approach allows objective and precise analysis of these complicated clusters, providing posterior distribution functions for each parameter of interest. We use these results to gain a better understanding of multiple populations in these clusters and their role in the history of the Milky Way. Support for this work was provided by NASA through grant numbers HST-GO-10775 and HST-GO-13297 from the Space Telescope Science Institute, which is operated by AURA, Inc., under NASA contract NAS5-26555. This material is based upon work supported by the National Aeronautics and Space Administration under Grant NNX11AF34G issued through the Office of Space Science. This project was supported by the National Aeronautics & Space Administration through the University of Central Florida's NASA Florida Space Grant Consortium.
Strong Lensing Analysis of the Galaxy Cluster MACS J1319.9+7003 and the Discovery of a Shell Galaxy
Zitrin, Adi
2017-01-01
We present a strong-lensing (SL) analysis of the galaxy cluster MACS J1319.9+7003 (z = 0.33, also known as Abell 1722), as part of our ongoing effort to analyze massive clusters with archival Hubble Space Telescope (HST) imaging. We spectroscopically measured, with the Keck Multi-Object Spectrometer For Infra-Red Exploration (MOSFIRE), two galaxies multiply imaged by the cluster. Our analysis reveals a modest lens, with an effective Einstein radius of $\theta_{e}(z=2) = 12'' \pm 1''$, enclosing $(2.1 \pm 0.3) \times 10^{13}\,M_{\odot}$. We briefly discuss the SL properties of the cluster, using two different modeling techniques (see the text for details), and make the mass models publicly available (ftp://wise-ftp.tau.ac.il/pub/adiz/MACS1319/). Independently, we identified a noteworthy, young shell galaxy (SG) system forming around two likely interacting cluster members, 20″ north of the brightest cluster galaxy. SGs are rare in galaxy clusters, and indeed, a simple estimate reveals that they are only expected in roughly one in several dozen to several hundred massive galaxy clusters (the estimate can easily change by an order of magnitude within a reasonable range of characteristic values relevant for the calculation). Using the best-fit mass-to-light scaling relation for cluster members from our lens model, we infer that the total mass of the SG system is $\sim 1.3 \times 10^{11}\,M_{\odot}$, with a host-to-companion mass ratio of about 10:1. Despite being rare in high-density environments, the SG is an example of how stars of cluster galaxies are efficiently redistributed to the intra-cluster medium. Dedicated numerical simulations of the observed shell configuration, perhaps aided by the mass model, might shed interesting light on the interaction history and properties of the two galaxies. An archival HST search in galaxy cluster images can reveal more such systems.
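As an order-of-magnitude check on the quoted enclosed mass, the standard relation for an effectively circular lens, $M(<\theta_E) = \frac{c^2}{4G}\,\theta_E^2\,\frac{D_l D_s}{D_{ls}}$, can be evaluated directly. The sketch below assumes a flat ΛCDM cosmology with $H_0 = 70$ km/s/Mpc and $\Omega_m = 0.3$; these values are assumptions, not taken from the abstract, and the authors' non-circular models may give a somewhat different number. With these inputs the result is of order $10^{13}\,M_{\odot}$.

```python
import math

C = 2.99792458e8                 # speed of light, m/s
G = 6.67430e-11                  # gravitational constant, m^3 kg^-1 s^-2
MPC = 3.0856775814913673e22      # metres per megaparsec
M_SUN = 1.98847e30               # solar mass, kg
ARCSEC = math.pi / (180 * 3600)  # radians per arcsecond


def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, steps=1000):
    """Line-of-sight comoving distance (Mpc) in a flat LambdaCDM model, via Simpson's rule."""
    omega_l = 1.0 - omega_m
    hubble_dist = (C / 1e3) / h0  # c/H0 in Mpc, with c in km/s

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1 + zp) ** 3 + omega_l)

    dz = z / steps  # steps must be even for Simpson's rule
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * inv_e(k * dz)
    return hubble_dist * total * dz / 3.0


def einstein_mass_msun(theta_e_arcsec, z_lens, z_src, **cosmo):
    """Mass inside a circular Einstein radius: M = (c^2 / 4G) * theta_E^2 * D_l * D_s / D_ls."""
    dc_l = comoving_distance_mpc(z_lens, **cosmo)
    dc_s = comoving_distance_mpc(z_src, **cosmo)
    # Angular-diameter distances in a spatially flat universe.
    d_l = dc_l / (1 + z_lens)
    d_s = dc_s / (1 + z_src)
    d_ls = (dc_s - dc_l) / (1 + z_src)
    theta = theta_e_arcsec * ARCSEC
    return C**2 / (4 * G) * theta**2 * (d_l * d_s / d_ls) * MPC / M_SUN


mass = einstein_mass_msun(12.0, z_lens=0.33, z_src=2.0)
print(f"M(<theta_E) ~ {mass:.2e} M_sun")
```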
Clustering of frequency spectrums from different bearing fault using principle component analysis
Directory of Open Access Journals (Sweden)
Yusof M.F.M.
2017-01-01
In studies of defects in rolling-element bearings, signal clustering is a popular approach for identifying the type of defect. However, noise interference is one of the major issues affecting the effectiveness of the applied clustering method. In this paper, principal component analysis (PCA) is proposed as a pre-processing method for hierarchical clustering analysis of the frequency spectrum of the vibration signal. To this end, vibration signals were acquired from operating bearings under different conditions and speeds. Next, PCA was applied to the frequency spectra of the acquired signals for pattern recognition. The Mahalanobis distance model was then used to cluster the PCA results. The results show that changes in amplitude at the respective fundamental frequencies can be detected by applying PCA, and that the Mahalanobis distance is suitable for clustering the PCA results. Notably, the spectra from healthy and inner-race-defect bearings could be clearly distinguished from each other, even though the change in the amplitude pattern of the inner-race-defect spectrum was very small compared with the healthy one. This work demonstrates that PCA can sensitively detect changes in the pattern of frequency spectra, and that a Mahalanobis distance model for clustering is valuable for bearing defect identification.
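The Mahalanobis step can be sketched as follows. This is a hedged illustration, not the paper's implementation: it assumes the PCA step has already reduced each spectrum to its first two principal-component scores, the reference scores are made up, and a new spectrum is simply assigned to the reference condition with the smallest Mahalanobis distance (each class using its own covariance).

```python
import math


def mean_cov_2d(points):
    """Sample mean and 2x2 sample covariance of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of x from a distribution with the given mean and 2x2 covariance."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form [dx dy] inv(cov) [dx dy]^T using the closed-form 2x2 inverse.
    return math.sqrt((d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)


def classify(x, reference):
    """Assign x to the reference class whose Mahalanobis distance is smallest."""
    dists = {label: mahalanobis_2d(x, *mean_cov_2d(pts)) for label, pts in reference.items()}
    return min(dists, key=dists.get), dists


# Hypothetical (PC1, PC2) scores of spectra from bearings with known condition.
reference = {
    "healthy": [(0.1, 0.0), (-0.2, 0.1), (0.0, -0.1), (0.2, 0.05), (-0.1, -0.05)],
    "inner_race": [(2.0, 1.1), (2.3, 0.9), (1.8, 1.0), (2.1, 1.2), (2.2, 0.8)],
}
label, dists = classify((2.05, 1.0), reference)
print(label, {k: round(v, 2) for k, v in dists.items()})
```

Unlike plain Euclidean distance, the Mahalanobis distance scales each direction by the spread of the reference class, so a tight healthy cluster rejects points that a Euclidean threshold might accept.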
Efficient speaker verification using Gaussian mixture model component clustering.
Energy Technology Data Exchange (ETDEWEB)
De Leon, Phillip L. (New Mexico State University, Las Cruces, NM); McClanahan, Richard D.
2012-04-01
In speaker verification (SV) systems that employ a support vector machine (SVM) classifier to make decisions on a supervector derived from Gaussian mixture model (GMM) component mean vectors, a significant portion of the computational load is spent calculating the a posteriori probability of the feature vectors of the speaker under test with respect to the individual component densities of the universal background model (UBM). Further, calculating the sufficient statistics for the weight, mean, and covariance parameters derived from these same feature vectors also contributes a substantial amount of processing load to the SV system. In this paper, we propose a method that uses clusters of GMM-UBM mixture component densities to reduce the computational load required. In the adaptation step, we score the feature vectors against the clusters, then calculate the a posteriori probabilities and update the statistics only for mixture components belonging to the appropriate clusters. Each cluster is a grouping of multivariate normal distributions and is modeled by a single multivariate distribution. As such, the set of multivariate normal distributions representing the different clusters also forms a GMM. This GMM is referred to as a hash GMM, which can be considered a lower-resolution representation of the GMM-UBM. The mapping that associates the components of the hash GMM with components of the original GMM-UBM is referred to as a shortlist. This research investigates various methods of clustering the components of the GMM-UBM and forming hash GMMs. Of the five methods presented, one, Gaussian mixture reduction as proposed by Runnalls, easily outperformed the others. This method iteratively reduces the size of a GMM by successively merging pairs of component densities. Pairs are selected for merger using a Kullback-Leibler-based metric. Using Runnalls' method of reduction, we
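A one-dimensional sketch of Runnalls-style greedy mixture reduction, offered as an illustration rather than the authors' code: each candidate pair is scored by Runnalls' KL-based bound $B(i,j) = \tfrac12\left[(w_i+w_j)\log\sigma^2_{ij} - w_i\log\sigma^2_i - w_j\log\sigma^2_j\right]$, and the cheapest pair is replaced by its moment-preserving merge. In the multivariate GMM-UBM setting, the log-variances become log-determinants of covariance matrices; the mixture below is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Comp:
    w: float    # mixture weight
    mu: float   # mean
    var: float  # variance


def merge(a: Comp, b: Comp) -> Comp:
    """Moment-preserving merge of two weighted 1-D Gaussian components."""
    w = a.w + b.w
    mu = (a.w * a.mu + b.w * b.mu) / w
    var = (a.w * a.var + b.w * b.var) / w + a.w * b.w * (a.mu - b.mu) ** 2 / w**2
    return Comp(w, mu, var)


def runnalls_cost(a: Comp, b: Comp) -> float:
    """Runnalls' upper bound on the KL discrepancy caused by merging a and b (1-D case)."""
    m = merge(a, b)
    return 0.5 * (m.w * math.log(m.var) - a.w * math.log(a.var) - b.w * math.log(b.var))


def reduce_mixture(comps, target):
    """Greedily merge the cheapest pair until only `target` components remain."""
    comps = list(comps)
    while len(comps) > target:
        pairs = ((i, j) for i in range(len(comps)) for j in range(i + 1, len(comps)))
        i, j = min(pairs, key=lambda ij: runnalls_cost(comps[ij[0]], comps[ij[1]]))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps


# Hypothetical 4-component mixture with two tight pairs of components.
comps = [Comp(0.25, 0.0, 1.0), Comp(0.25, 0.1, 1.0), Comp(0.25, 10.0, 1.0), Comp(0.25, 10.2, 1.0)]
reduced = reduce_mixture(comps, target=2)
print(reduced)
```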
Bolin, Jocelyn H; Edwards, Julianne M; Finch, W Holmes; Cassady, Jerrell C
2014-01-01
Although traditional clustering methods (e.g., K-means) have been shown to be useful in the social sciences, it is often difficult for such methods to handle situations where clusters in the population overlap or are ambiguous. Fuzzy clustering, a method already recognized in many disciplines, provides a more flexible alternative to these traditional clustering methods. Fuzzy clustering differs from traditional clustering methods in that it allows a case to belong to multiple clusters simultaneously. Unfortunately, fuzzy clustering techniques remain relatively unused in the social and behavioral sciences. The purpose of this paper is to introduce fuzzy clustering to these audiences, who are currently relatively unfamiliar with the technique. To demonstrate the advantages associated with this method, cluster solutions of a common perfectionism measure were created using both fuzzy clustering and K-means clustering, and the results were compared. These analyses reveal that the two methods find different cluster solutions, and that the similarity between the solutions depends on the amount of cluster overlap allowed for in fuzzy clustering.
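To make the contrast with K-means concrete, here is a minimal fuzzy c-means sketch on one-dimensional data. The abstract does not name the fuzzy variant used, so the standard fuzzy c-means update is an assumption, and the data are made up. The fuzzifier `m` controls how much overlap is allowed: values near 1 approach hard K-means, larger values spread membership across clusters.

```python
def fuzzy_c_means(xs, centers, m=2.0, iters=100, eps=1e-12):
    """Fuzzy c-means on 1-D data. Returns the final centres and memberships u[i][k]
    (degree to which point i belongs to cluster k; each row sums to 1)."""
    centers = list(centers)
    c = len(centers)
    for _ in range(iters):
        u = []
        for x in xs:
            d = [abs(x - ck) + eps for ck in centers]  # eps avoids division by zero
            # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
            u.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                      for k in range(c)])
        # Centres are means weighted by memberships raised to the fuzzifier m.
        centers = [sum(u[i][k] ** m * xs[i] for i in range(len(xs))) /
                   sum(u[i][k] ** m for i in range(len(xs)))
                   for k in range(c)]
    return centers, u


# Hypothetical scores: two groups plus one ambiguous case exactly between them.
xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 3.0]
centers, u = fuzzy_c_means(xs, centers=[0.0, 6.0])
print([round(c, 2) for c in centers])
print("memberships of x = 3.0:", [round(v, 2) for v in u[-1]])
```

K-means would be forced to put the case at 3.0 wholly into one cluster; fuzzy c-means instead reports roughly equal membership in both, which is the kind of overlap the abstract describes.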
Directory of Open Access Journals (Sweden)
Jocelyn H Bolin
2014-04-01
Directory of Open Access Journals (Sweden)
Houston John P
2008-07-01
Background: Patients with acute mania respond differentially to treatment and, in many cases, fail to obtain or sustain symptom remission. The objective of this exploratory analysis was to characterize response in bipolar disorder by identifying groups of patients with similar manic symptom response profiles. Methods: Patients (n = 222) were selected from a randomized, double-blind study of treatment with olanzapine or divalproex in bipolar I disorder, manic or mixed episode, with or without psychotic features. Hierarchical clustering based on Ward's distance was used to identify groups of patients based on Young Mania Rating Scale (YMRS) total scores at each of 5 assessments over 7 weeks. Logistic regression was used to identify baseline predictors for clusters of interest. Results: Four distinct clusters of patients were identified: Cluster 1 (n = 64): patients did not maintain a response (YMRS total score ≤ 12); Cluster 2 (n = 92): patients responded rapidly (within less than a week) and the response was maintained; Cluster 3 (n = 36): patients responded rapidly but relapsed soon afterwards (YMRS ≥ 15); Cluster 4 (n = 30): patients responded slowly (≥ 2 weeks) and the response was maintained. Predictive models using baseline variables found YMRS Item 10 (Appearance) and psychosis to be significant predictors for Clusters 1 and 4 vs. Clusters 2 and 3, but none of the baseline characteristics allowed discrimination between Clusters 1 and 4. Experiencing a mixed episode at baseline predicted membership in Clusters 2 and 3 vs. Clusters 1 and 4. Treatment with divalproex, a larger number of previous manic episodes, lack of disruptive-aggressive behavior, and more prominent depressive symptoms at baseline were predictors for Cluster 3 vs. 2. Conclusion: Distinct treatment response profiles can be predicted by clinical features at baseline. The presence of these features as potential risk factors for relapse in patients who have responded to treatment
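The clustering step described above, Ward's criterion applied to each patient's vector of YMRS totals across the 5 assessments, can be sketched as below. This is a hedged, from-scratch illustration with made-up profiles for six hypothetical patients, not the study's software; at each step it merges the two clusters whose union increases the within-cluster sum of squares the least.

```python
def ward_cluster(vectors, n_clusters):
    """Agglomerative clustering with Ward's criterion: always merge the pair of
    clusters whose union gives the smallest increase in within-cluster sum of squares."""
    # Each cluster is (member indices, size, centroid).
    clusters = [([i], 1, list(v)) for i, v in enumerate(vectors)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                _, na, ca = clusters[a]
                _, nb, cb = clusters[b]
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = na * nb / (na + nb) * sq  # increase in total SSE if a and b merge
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ma, na, ca = clusters[a]
        mb, nb, cb = clusters[b]
        merged = (ma + mb, na + nb, [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(sorted(members) for members, _, _ in clusters)


# Hypothetical YMRS totals at 5 visits for six patients.
profiles = [
    [30, 10, 8, 6, 5], [32, 11, 9, 7, 6],        # rapid, sustained response
    [31, 25, 14, 9, 7], [29, 24, 13, 8, 6],      # slower response
    [30, 28, 26, 25, 24], [33, 29, 27, 26, 25],  # little improvement
]
print(ward_cluster(profiles, n_clusters=3))
```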
Detection of Functional Change Using Cluster Trend Analysis in Glaucoma
Gardiner, Stuart K.; Mansberger, Steven L.; Demirel, Shaban
2017-01-01
Purpose: Global analyses using mean deviation (MD) assess visual field progression but can miss localized changes. Pointwise analyses are more sensitive to localized progression but are more variable, and so require confirmation. This study assessed whether cluster trend analysis, averaging information across subsets of locations, could improve progression detection. Methods: A total of 133 test–retest eyes were tested 7 to 10 times. Rates of change and P values were calculated for possible re-orderings of these series to generate global analysis (“MD worsening faster than x dB/y with P trend analysis detects subsequently confirmed deterioration sooner than either global or pointwise analyses. PMID:28715580
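A minimal sketch of the contrast between pointwise and cluster trend analysis, under stated assumptions: the sensitivities and the grouping of locations are hypothetical, the cluster trend is taken to be an ordinary least-squares slope fitted to the per-visit average of the locations in a cluster, and only the t statistic (slope / standard error, with n - 2 degrees of freedom) is reported; converting it to a P value would need a t-distribution routine not included here.

```python
import math


def ols_slope(times, values):
    """Least-squares slope of values on times, with its standard error."""
    n = len(times)
    tbar = sum(times) / n
    vbar = sum(values) / n
    sxx = sum((t - tbar) ** 2 for t in times)
    slope = sum((t - tbar) * (v - vbar) for t, v in zip(times, values)) / sxx
    intercept = vbar - slope * tbar
    rss = sum((v - (intercept + slope * t)) ** 2 for t, v in zip(times, values))
    return slope, math.sqrt(rss / (n - 2) / sxx)


def cluster_trend(times, series_by_location):
    """Average the sensitivities of all locations in a cluster at each visit,
    then fit a single linear trend to the averaged series."""
    averaged = [sum(visit) / len(visit) for visit in zip(*series_by_location)]
    return ols_slope(times, averaged)


# Hypothetical sensitivities (dB) at three neighbouring locations over 7 visits.
years = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
cluster = [
    [29.0, 28.1, 28.6, 27.2, 27.5, 26.4, 26.8],
    [30.0, 29.6, 28.4, 28.9, 27.7, 27.9, 26.6],
    [28.5, 28.9, 27.6, 27.0, 27.8, 26.1, 25.9],
]
for loc in cluster:
    s, se = ols_slope(years, loc)
    print(f"pointwise: {s:+.2f} dB/y (t = {s / se:.1f})")
s, se = cluster_trend(years, cluster)
print(f"cluster:   {s:+.2f} dB/y (t = {s / se:.1f})")
```

Averaging before fitting reduces visit-to-visit noise when location errors are roughly independent, which is the intuition behind trading some spatial resolution for a more stable trend estimate.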
Segment clustering methodology for unsupervised Holter recordings analysis
Rodríguez-Sotelo, Jose Luis; Peluffo-Ordoñez, Diego; Castellanos Dominguez, German
2015-01-01
Cardiac arrhythmia analysis of Holter recordings is an important issue in clinical settings; however, it also involves handling a large amount of unlabelled data, which implies a high computational cost. In this work, an unsupervised methodology based on a segment framework is presented: the raw data are divided into a balanced number of segments in order to identify fiducial points and to characterize and cluster the heartbeats in each segment separately. The resulting clusters are then merged or split according to an assumed homogeneity criterion. This framework compensates for the high computational cost of Holter analysis, making it feasible for future real-time applications. The performance of the method is measured on records from the MIT/BIH arrhythmia database, using the database labels, and achieves high sensitivity and specificity for a broad range of heartbeat types recommended by the AAMI.
Data Preprocessing in Cluster Analysis of Gene Expression
Institute of Scientific and Technical Information of China (English)
杨春梅; 万柏坤; 高晓峰
2003-01-01
Because DNA microarray technology has generated an explosion of gene expression data, and such massive datasets urgently need efficient methods of analysis and visualization, we investigate the data preprocessing methods used in cluster analysis (normalization or log transformation of the data matrix) using hierarchical clustering, principal component analysis (PCA) and self-organizing maps (SOMs). The results illustrate that when Euclidean distance is used as the distance metric, the logarithm of the relative expression level is the best preprocessing method, while data preprocessed by normalization do not attain the expected results because normalization destroys the data structure. When there are only a few principal components, PCA is an effective method for extracting the frame structure, while SOMs are more suitable for a specific structure.
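One plausible reason (offered here as an illustration, not as the authors' argument) why the log of relative expression pairs well with Euclidean distance: on the raw ratio scale, a 2-fold induction lies twice as far from "no change" as a 2-fold repression, whereas after a log2 transform the two are symmetric (+1 and -1). The intensities below are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relative(expr, reference):
    """Expression relative to a reference sample (ratio)."""
    return [e / r for e, r in zip(expr, reference)]


def log2_relative(expr, reference):
    """Log2 of relative expression: 2-fold up -> +1, 2-fold down -> -1."""
    return [math.log2(e / r) for e, r in zip(expr, reference)]


# Hypothetical intensities for three genes in a reference and two conditions.
reference = [100.0, 80.0, 250.0]
induced = [200.0, 160.0, 500.0]    # every gene 2-fold up
repressed = [50.0, 40.0, 125.0]    # every gene 2-fold down

for transform in (relative, log2_relative):
    base = transform(reference, reference)  # unchanged profile
    up = euclidean(transform(induced, reference), base)
    down = euclidean(transform(repressed, reference), base)
    print(f"{transform.__name__}: up={up:.3f} down={down:.3f}")
```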
Performance Analysis of a Cluster-Based MAC Protocol for Wireless Ad Hoc Networks
Directory of Open Access Journals (Sweden)
Jesús Alonso-Zárate
2010-01-01
An analytical model to evaluate the non-saturated performance of the Distributed Queuing Medium Access Control Protocol for Ad Hoc Networks (DQMAN) in single-hop networks is presented in this paper. DQMAN comprises a spontaneous, temporary, and dynamic clustering mechanism integrated with a near-optimum distributed queuing Medium Access Control (MAC) protocol. Clustering is executed in a distributed manner using a mechanism inspired by the Distributed Coordination Function (DCF) of IEEE 802.11. Once a station seizes the channel, it becomes the temporary clusterhead of a spontaneous cluster and coordinates the peer-to-peer communications between the cluster members. Within each cluster, a near-optimum distributed queuing MAC protocol is executed. The theoretical performance analysis of DQMAN in single-hop networks under non-saturation conditions integrates the analysis of the clustering mechanism into the MAC layer model. To the best of the authors' knowledge, this approach is novel in the literature. In addition, the performance of an ad hoc network using DQMAN is compared with that obtained using the DCF of IEEE 802.11 as a benchmark reference.
Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method
DEFF Research Database (Denmark)
Andersen, Mikkel Meyer; Eriksen, Poul Svante; Morling, Niels
2014-01-01
The European Y-chromosomal short tandem repeat (STR) haplotype distribution has previously been analysed in various ways. Here, we introduce a new way of analysing population substructure using a new method based on clustering within the discrete Laplace exponential family that models … method can be used for cluster analysis to further validate the discrete Laplace method. A very important practical fact is that the calculations can be performed on a normal computer. We identified two sub-clusters of the Eastern and Western European Y-STR haplotypes, similar to results of previous … studies. We also compared pairwise distances (between geographically separated samples) with those obtained using the AMOVA method and found good agreement. Further analyses that are impossible with AMOVA were made using the discrete Laplace method: analysis of the homogeneity in two different ways …
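The building block named in the abstract is the discrete Laplace distribution, $P(X = x) = \frac{1-p}{1+p}\,p^{|x-\mu|}$ for integer $x$ and $0 < p < 1$. The sketch below assumes, as our reading of the method rather than something stated in this excerpt, that within a subpopulation the loci are independent and each is discrete Laplace around a central haplotype, and that the overall haplotype probability is a weighted mixture over subpopulations. All weights, centres and $p$ values are hypothetical.

```python
import math


def discrete_laplace_pmf(x, mu, p):
    """P(X = x) for a discrete Laplace distribution centred at integer mu, with 0 < p < 1."""
    return (1 - p) / (1 + p) * p ** abs(x - mu)


def haplotype_prob(haplotype, center, ps):
    """Within one subpopulation: independent loci, each discrete Laplace around the centre allele."""
    return math.prod(discrete_laplace_pmf(x, m, p) for x, m, p in zip(haplotype, center, ps))


def mixture_prob(haplotype, components):
    """Mixture over subpopulations given (weight, centre haplotype, per-locus p) triples."""
    return sum(w * haplotype_prob(haplotype, c, ps) for w, c, ps in components)


# Hypothetical two-subpopulation model over three Y-STR loci (repeat counts).
components = [
    (0.6, (14, 13, 29), (0.2, 0.15, 0.3)),
    (0.4, (16, 12, 31), (0.25, 0.1, 0.35)),
]
h = (14, 13, 30)
posterior = [w * haplotype_prob(h, c, ps) / mixture_prob(h, components) for w, c, ps in components]
print("P(h) =", mixture_prob(h, components), "cluster posteriors:", posterior)
```

The per-cluster posteriors are what a cluster analysis would use to assign a haplotype to a subpopulation.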
An Interpretation of the Boshier-Collins Cluster Analysis Testing Houle's Typology.
Furst, Edward J.
1986-01-01
This article speculates on an underlying order obscured by the details of the Boshier-Collins cluster analysis and the mapping of Houle's types onto it. A table illustrates an interpretation of cluster analysis on Boshier's Education Participation Scale.
WHIM emission and the cluster soft excess: a model comparison
Mittaz, J; Cen, R; Bonamente, M
2004-01-01
The confirmation of the cluster soft excess (CSE) by XMM-Newton has rekindled interest in its origin. The recent detections of CSE emission at large cluster radii, together with reports of O VII line emission associated with the CSE, have led many authors to conjecture that the CSE is, in fact, a signature of the warm-hot intergalactic medium (WHIM). In this paper we test this scenario by comparing the observed properties of the CSE with predictions based on models of the WHIM. We find that emission from the WHIM in current models is 3 to 4 orders of magnitude too faint to explain the CSE. We discuss different possible explanations for this discrepancy, including issues of simulation resolution and scale, and the role of small density enhancements or galaxy groups. Our final conclusion is that the WHIM alone is unlikely to account for the observed flux of the CSE.
An Efficient Cluster Algorithm for CP(N-1) Models
Beard, B B; Riederer, S; Wiese, U J
2005-01-01
We construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a new regularization for CP(N-1) models in the framework of D-theory, which is an alternative non-perturbative approach to quantum field theory formulated in terms of discrete quantum variables instead of classical fields. Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard formulation of lattice field theory. In fact, there is even a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. We present various simulations for different correlation lengths, couplings and lattice sizes. We have simulated correlation lengths up to 250 lattice spacings on lattices as large as 640x640 and we detect no evidence for critical slowing down.
Interloper treatment in dynamical modelling of galaxy clusters
Wojtak, Radoslaw; Lokas, Ewa L.; Mamon, Gary A.; Gottlöber, Stefan; Prada, Francisco; Moles, Mariano
2006-01-01
The aim of this paper is to study the efficiency of different approaches to interloper treatment in dynamical modelling of galaxy clusters. Using a cosmological N-body simulation of the standard ΛCDM model, we select 10 massive dark matter haloes and use their particles to emulate mock kinematic data in terms of projected galaxy positions and velocities as they would be measured by a distant observer. Taking advantage of the full 3D information available from the simulation, we select samples of interlopers defined with different criteria. The interlopers thus selected provide a means to assess the efficiency of different interloper removal schemes. We study direct methods of interloper removal based on dynamical or statistical restrictions imposed on the ranges of positions and velocities available to cluster members. In determining these ranges we use either the velocity dispersion criterion or a maximum velocity profile. We find that the direct methods exclude on average 60-70 percent of unbound particles producing a sa...
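A velocity-dispersion criterion for interloper removal is often implemented as iterative n-sigma clipping of line-of-sight velocities. The sketch below is that generic scheme with an assumed threshold of 3σ and hypothetical velocities; the paper's exact criterion may differ (for example, by depending on projected radius).

```python
import statistics


def sigma_clip_members(velocities, n_sigma=3.0, max_iter=20):
    """Iteratively drop galaxies whose line-of-sight velocity deviates from the
    mean by more than n_sigma times the current velocity dispersion."""
    members = list(velocities)
    for _ in range(max_iter):
        mean = statistics.fmean(members)
        sigma = statistics.stdev(members)
        kept = [v for v in members if abs(v - mean) <= n_sigma * sigma]
        if len(kept) == len(members):
            break  # converged: no further galaxies rejected
        members = kept
    return members


# Hypothetical velocities (km/s, relative to the cluster) plus one obvious interloper.
core = [-1000, -800, -600, -400, -200, 0, 200, 400, 600, 800, 1000] * 2
velocities = core + [6000.0]
members = sigma_clip_members(velocities)
print(len(velocities), "->", len(members), "members")
```

A known weakness, relevant to the paper's comparison, is that interlopers inflate the dispersion they are judged against, so a fixed-σ cut can leave unbound objects inside the sample.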
Sensory over responsivity and obsessive compulsive symptoms: A cluster analysis.
Ben-Sasson, Ayelet; Podoly, Tamar Yonit
2017-02-01
Several studies have examined the sensory component in obsessive-compulsive disorder (OCD) and described an OCD subtype with a unique profile, in which sensory phenomena (SP) are a significant component. SP has some commonalities with sensory over-responsivity (SOR) and might in part be a characteristic of this subtype. Although some studies have examined SOR and its relation to obsessive-compulsive symptoms (OCS), the literature lacks sufficient data on this interplay. Our first aim was to further examine the correlations between OCS and SOR, and to explore the correlations between SOR modalities (e.g., smell, touch) and OCS subscales (e.g., washing, ordering). Second, we aimed to use cluster analysis of SOR and OCS dimensions in adults, that is, to classify the sample using the sensory scores, to determine whether a sensory OCD subtype can be specified. Our third goal was to explore the psychometric features of a new sensory questionnaire: the Sensory Perception Quotient (SPQ). A sample of non-clinical adults (n=350) was recruited via e-mail, social media and social networks. Participants completed questionnaires measuring SOR, OCS, and anxiety. SOR and OCI-F scores were moderately and significantly correlated (n=274); significant correlations were found between all SOR modalities and OCS subscales, with no one modality correlating especially strongly with any one OCS subscale. Cluster analysis revealed four distinct clusters: (1) no OC or SOR symptoms (NONE; n=100), (2) high OC and SOR symptoms (BOTH; n=28), (3) moderate OC symptoms (OCS; n=63), (4) moderate SOR symptoms (SOR; n=83). The BOTH cluster had significantly higher anxiety levels than the other clusters and shared OC subscale scores with the OCS cluster. The BOTH cluster also reported higher SOR scores across the tactile, visual, taste and olfactory modalities. The SPQ was found to be reliable and suitable for detecting SOR, and the sample's SPQ scores were normally distributed (n=350). SOR is a
Clustering Multivariate Time Series Using Hidden Markov Models
Directory of Open Access Journals (Sweden)
Shima Ghassempour
2014-03-01
In this paper we describe an algorithm for clustering multivariate time series with variables taking both categorical and continuous values. Time series of this type are common in health care, where they represent the health trajectories of individuals. The problem is challenging because categorical variables make it difficult to define a meaningful distance between trajectories. We propose an approach based on Hidden Markov Models (HMMs), where we first map each trajectory into an HMM, then define a suitable distance between HMMs, and finally cluster the HMMs with a method based on a distance matrix. We test our approach on a simulated but realistic data set of 1,255 trajectories of individuals aged 45 and over, on a synthetic validation set with known clustering structure, and on a smaller set of 268 trajectories extracted from the longitudinal Health and Retirement Survey. The proposed method can be implemented quite simply using standard packages in R and Matlab, and may be a good candidate for solving the difficult problem of clustering multivariate time series with categorical variables using tools that do not require advanced statistical knowledge and are therefore accessible to a wide range of researchers.
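The abstract does not say which HMM distance is used, so the sketch below shows one classic choice as an assumption: a symmetrised, length-normalised log-likelihood distance, where each model is paired with the trajectory it was fitted to. Model fitting (for example Baum-Welch) is omitted, only discrete (categorical) emissions are handled, and the two models and sequences are hypothetical. The resulting pairwise distances could then fill the distance matrix used for clustering.

```python
import math


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete-emission HMM.

    pi[i] initial state probabilities, A[i][j] transition probabilities,
    B[i][o] probability that state i emits symbol o.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]] for j in range(n)]
        scale = sum(alpha)  # P(o_t | o_1 .. o_{t-1})
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


def hmm_distance(model1, seq1, model2, seq2):
    """Symmetrised, length-normalised likelihood distance between two HMMs,
    each paired with the trajectory it was fitted to."""
    d12 = (log_likelihood(seq1, *model1) - log_likelihood(seq1, *model2)) / len(seq1)
    d21 = (log_likelihood(seq2, *model2) - log_likelihood(seq2, *model1)) / len(seq2)
    return 0.5 * (d12 + d21)


# Hypothetical models (pi, A, B) for two individuals and their coded trajectories.
m1 = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.2, 0.8]])
m2 = ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])
s1 = [0, 0, 0, 1, 1, 1, 0, 0]
s2 = [1, 0, 1, 1, 0, 0, 1, 0]
print(hmm_distance(m1, s1, m2, s2))
```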
Bayesian network meta-analysis for cluster randomized trials with binary outcomes.
Uhlmann, Lorenz; Jensen, Katrin; Kieser, Meinhard
2017-06-01
Network meta-analysis is becoming a common approach to combining direct and indirect comparisons of several treatment arms. In recent research, there have been various developments and extensions of the standard methodology. At the same time, cluster randomized trials are becoming increasingly popular, especially in health services research, where, for example, medical practices are the units of randomization but the outcome is measured at the patient level. Combining the results of cluster randomized trials is challenging. In this tutorial, we examine and compare different approaches for incorporating cluster randomized trials in a (network) meta-analysis. Furthermore, we provide practical insight into the implementation of the models. Simulation studies show that some of the examined approaches lead to unsatisfactory results. However, there are alternatives that are suitable for combining cluster randomized trials in a network meta-analysis, as they are unbiased and reach accurate coverage rates. In conclusion, the methodology can be extended so that the results of cluster randomized trials can be adequately included. Copyright © 2016 John Wiley & Sons, Ltd.
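One widely used adjustment for bringing a cluster randomized trial with a binary outcome into a meta-analysis (not necessarily one of the approaches this tutorial compares) is to divide each arm's events and sample size by the design effect $DE = 1 + (m - 1)\,\rho$, where $m$ is the mean cluster size and $\rho$ the intracluster correlation (ICC). The sketch below uses a hypothetical arm and an assumed ICC.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation due to cluster randomization: DE = 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_counts(events: float, total: float, mean_cluster_size: float, icc: float):
    """Divide an arm's events and sample size by the design effect, giving an
    'effective' binary outcome that can enter a standard (network) meta-analysis."""
    de = design_effect(mean_cluster_size, icc)
    return events / de, total / de


# Hypothetical arm: 1000 patients in practices of about 20 patients, ICC assumed 0.05.
print(design_effect(20, 0.05))
print(effective_counts(120, 1000, 20, 0.05))
```

Ignoring the design effect treats correlated patients as independent and overstates precision, which is one way naive pooling of cluster randomized trials can mislead.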
Coupled Two-Way Clustering Analysis of Gene Microarray Data
Getz, G; Domany, E
2000-01-01
We present a novel coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task: we present an algorithm, based on iterative clustering, which performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.
Coupled two-way clustering analysis of gene microarray data
Getz, Gad; Levine, Erel; Domany, Eytan
2000-10-01
We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.
Creating Discriminative Models for Time Series Classification and Clustering by HMM Ensembles.
Asadi, Nazanin; Mirzaei, Abdolreza; Haghshenas, Ehsan
2016-12-01
Classification of temporal data sequences is a fundamental branch of machine learning with a broad range of real-world applications. Since the dimensionality of temporal data is significantly larger than that of static data, and its modeling and interpretation are more complicated, performing classification and clustering on temporal data is more complex as well. Hidden Markov models (HMMs) are well-known statistical models for modeling and analysis of sequence data. Moreover, ensemble methods, which employ multiple models to obtain the target model, have shown good performance in experiments. These facts strongly motivate the use of HMM ensembles for classification and clustering of time series data. So far, no effective classification and clustering method based on HMM ensembles has been proposed. Moreover, the few existing HMM ensemble methods have trouble separating models of distinct classes, which is a vital task. In this paper, motivated by these points, a new framework based on HMM ensembles for classification and clustering is proposed. In addition to its strong theoretical background, provided by employing the Rényi entropy in the ensemble learning procedure, the main contribution of the proposed method is to address the difficulty HMM-based methods have in separating models of distinct classes, by using the inverse emission matrix of the opposite class to build an opposite model. The proposed algorithms perform more effectively than other methods, especially other HMM ensemble-based methods. Moreover, the proposed clustering framework, which benefits from both similarity-based and model-based methods, together with the Rényi-based ensemble method, demonstrated its superiority on several measures.
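The abstract does not say how the Rényi entropy enters the ensemble procedure, so the sketch below only illustrates the quantity itself: $H_\alpha(p) = \frac{1}{1-\alpha}\log\sum_i p_i^\alpha$, which tends to the Shannon entropy as $\alpha \to 1$ and is non-increasing in $\alpha$. The distribution is hypothetical.

```python
import math


def renyi_entropy(probs, alpha):
    """Rényi entropy of order alpha >= 0 (natural log); alpha = 1 is the Shannon limit."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    ps = [p for p in probs if p > 0]
    if math.isclose(alpha, 1.0):
        return -sum(p * math.log(p) for p in ps)
    return math.log(sum(p ** alpha for p in ps)) / (1.0 - alpha)


dist = [0.7, 0.2, 0.1]
for a in (0.5, 1.0, 2.0):
    print(a, round(renyi_entropy(dist, a), 4))
```

Small orders weight rare outcomes more heavily, large orders are dominated by the most probable outcome, which is what makes the order a tunable knob when scoring model diversity or confidence.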
Cluster Dynamics Modeling with Bubble Nucleation, Growth and Coalescence
Energy Technology Data Exchange (ETDEWEB)
de Almeida, Valmor F. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Blondel, Sophie [Univ. of Tennessee, Knoxville, TN (United States); Bernholdt, David E. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Wirth, Brian D. [Univ. of Tennessee, Knoxville, TN (United States)
2017-06-01
The topic of this communication pertains to defect formation in irradiated solids, such as plasma-facing tungsten subjected to helium implantation in fusion reactor components, and nuclear fuel (metal and oxides) subjected to volatile fission product generation in nuclear reactors. The purpose of this progress report is to describe efforts towards predicting the long-time evolution of defects via continuum cluster dynamics simulation. The difficulties are twofold. First, realistic, long-time dynamics in reactor conditions leads to a non-dilute diffusion regime, which is not accommodated by the prevailing dilute, stressless cluster dynamics theory. Second, long-time dynamics calls for a large set of species (ideally an infinite set) to capture all possible emerging defects, and this represents a computational bottleneck. Extending the model beyond the dilute limit is a significant undertaking, since no model has been advanced to extend cluster dynamics to non-dilute, deformable conditions. Our proposed approach to modeling the non-dilute limit is to monitor the appearance of a spatially localized void volume fraction in the solid matrix with a bell-shaped profile and to insert an explicit geometrical bubble onto the support of the bell function. The newly created internal moving boundary provides the means to account for the interfacial flux of mobile species into the bubble, and the growth of bubbles allows for coalescence phenomena, which capture highly non-dilute interactions. We present a preliminary interfacial kinematic model with associated interfacial diffusion transport to follow the evolution of the bubble in any number of spatial dimensions and for any number of bubbles, which can be further extended to include a deformation theory. Finally, we comment on a computational front-tracking method to be used in conjunction with conventional cluster dynamics simulations in the proposed non-dilute model.
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Owing to advances in sensor technology, the growing volume of medical image data makes it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features that provide no additional information for clustering analysis. The widely used methods for removing irrelevant features are sparse clustering algorithms that use a lasso-type penalty to select features. However, the accuracy of clustering with a lasso-type penalty depends on the choice of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms current sparse clustering algorithms in image cluster analysis. PMID:26196383
Directory of Open Access Journals (Sweden)
Nan Lin
Full Text Available Owing to advances in sensor technology, large and growing volumes of medical image data can visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing irrelevant features are sparse clustering algorithms that use a lasso-type penalty to select features. However, the accuracy of clustering with a lasso-type penalty depends on the choice of the penalty parameters and the threshold value, which are difficult to determine in practice. Randomized algorithms have recently received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both liver and kidney cancer histology image data from the TCGA database. The results demonstrate that randomized feature selection coupled with functional principal component analysis substantially outperforms current sparse clustering algorithms in image cluster analysis.
Modelling the average spectrum expected from a population of gamma-ray globular clusters
Venter, C
2015-01-01
Millisecond pulsars occur abundantly in globular clusters. They are expected to be responsible for several spectral components in the radio through gamma-ray waveband (e.g., involving synchrotron and inverse Compton emission), as seen in the case of Terzan 5 by the Effelsberg Radio Telescope, the Chandra X-ray Observatory, the Fermi Large Area Telescope, and the High Energy Stereoscopic System (H.E.S.S.), with fewer spectral components seen for other globular clusters. H.E.S.S. has recently performed a stacking analysis involving 15 non-detected globular clusters and obtained quite constraining average flux upper limits above 230 GeV. We present a model that assumes millisecond pulsars as sources of relativistic particles and predicts multi-wavelength emission from globular clusters. We apply this model to the population of clusters mentioned above to predict the average spectrum and compare this to the H.E.S.S. upper limits. Such comparison allows us to test whether the model is viable, leading to possible co...
Flannery, William Peter; Sneed, Carl D.; Marsh, Penny
2003-01-01
In this study we examined adolescent risk behaviors, giving special attention to suicide ideation. Cluster analysis was used to classify adolescents (N = 2,730) on the Youth Risk Behavior Survey. Six clusters of adolescent risk behavior were identified. Although each risk cluster was distinct, some clusters shared overlapping risk behaviors.…
Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.
2015-03-01
Recent studies have demonstrated the potential advantages of Raman spectroscopy in the biomedical field owing to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating bacterial isolates by Gram status and by genus and species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. To investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) were implemented. In addition, cluster analyses of the isolates were performed using various other data types: biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis, and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). To evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples was tested using the PLSDA models and subsequently the HCA models. The external cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra with that of the other datasets. Results showed that Raman spectra performed comparably to, and in some cases better than, the other data types, with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
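Both external evaluation criteria named above can be computed directly from the true taxonomic labels and the cluster assignments. The following minimal Python sketch uses made-up labels. It is not the authors' code, and the genus names and cluster numbers are hypothetical. Purity credits each cluster with its majority class. The Rand index counts the item pairs on which the two labelings agree: both together, or both apart.

```python
from collections import Counter
from itertools import combinations


def purity(true_labels, cluster_labels):
    """Fraction of items that belong to the majority true class of their cluster."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    members = {}
    for t, c in zip(true_labels, cluster_labels):
        members.setdefault(c, []).append(t)
    majority_total = sum(Counter(ts).most_common(1)[0][1] for ts in members.values())
    return majority_total / len(true_labels)


def rand_index(true_labels, cluster_labels):
    """Share of item pairs on which both labelings agree (same/same or different/different)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_labels[i] == cluster_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical genus labels vs. clusters cut from a dendrogram.
genus = ["Staph", "Staph", "Strep", "Strep", "E.coli", "E.coli"]
clusters = [1, 1, 1, 2, 3, 3]
print(round(purity(genus, clusters), 3), round(rand_index(genus, clusters), 3))  # prints 0.833 0.8
```

Both scores reach 1.0 only when the clustering reproduces the reference labels exactly.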
Analysis of risk factors for cluster behavior of dental implant failures.
Chrcanovic, Bruno Ramos; Kisch, Jenö; Albrektsson, Tomas; Wennerberg, Ann
2017-08-01
Some studies have indicated that implant failures are commonly concentrated in a few patients. This study aimed to identify and analyze cluster behavior of dental implant failures among subjects of a retrospective study. The study included only patients receiving at least three implants. Patients presenting at least three implant failures were classified as presenting cluster behavior. Univariate and multivariate logistic regression models and generalized estimating equations analysis evaluated the effect of explanatory variables on cluster behavior. There were 1406 patients with three or more implants (8337 implants, 592 failures). Sixty-seven (4.77%) patients presented cluster behavior, accounting for 56.8% of all implant failures. The intake of antidepressants and bruxism were identified as potential negative factors exerting a statistically significant influence on cluster behavior at the patient level. The negative factors at the implant level were turned implants, short implants, poor bone quality, age of the patient, the intake of medication to reduce gastric acid production, smoking, and bruxism. A cluster pattern among patients with implant failure is highly probable. A number of systemic and local factors could be of interest as predictors of implant failure, although a direct causal relationship cannot be ascertained. © 2017 Wiley Periodicals, Inc.
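The patient-level rule above is simple enough to state as code. In the reported cohort, 67 of 1406 patients (67/1406 ≈ 4.77%) met the rule. Their 56.8% share of the 592 failures corresponds to roughly 336 failures. The sketch below applies the same thresholds to a small hypothetical cohort. The `Patient` record, its field names and the helper function are illustrative assumptions, not part of the study.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    implants: int  # number of implants placed in this patient
    failures: int  # number of those implants that failed


def cluster_behavior_summary(patients, min_implants=3, min_failures=3):
    """Apply the inclusion rule (>= 3 implants) and the cluster rule (>= 3 failures)."""
    eligible = [p for p in patients if p.implants >= min_implants]
    clustered = [p for p in eligible if p.failures >= min_failures]
    all_failures = sum(p.failures for p in eligible)
    clustered_failures = sum(p.failures for p in clustered)
    return {
        "eligible_patients": len(eligible),
        "cluster_patients": len(clustered),
        "pct_patients": 100 * len(clustered) / len(eligible) if eligible else 0.0,
        "pct_failures": 100 * clustered_failures / all_failures if all_failures else 0.0,
    }


# Hypothetical cohort for illustration only.
cohort = [
    Patient("A", implants=4, failures=3),
    Patient("B", implants=3, failures=0),
    Patient("C", implants=6, failures=1),
    Patient("D", implants=2, failures=2),  # excluded: fewer than three implants
]
print(cluster_behavior_summary(cohort))
```

Keeping the thresholds as parameters makes it easy to check how sensitive the share of "cluster" patients is to the chosen cut-off.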
ARABIC TEXT SUMMARIZATION BASED ON LATENT SEMANTIC ANALYSIS TO ENHANCE ARABIC DOCUMENTS CLUSTERING
Directory of Open Access Journals (Sweden)
Hanane Froud
2013-01-01
Full Text Available Arabic document clustering is an important task for obtaining good results with traditional Information Retrieval (IR) systems, especially given the rapid growth in the number of online documents in Arabic. Document clustering aims to automatically group similar documents into one cluster using different similarity/distance measures. This task is often affected by document length: useful information in the documents is often accompanied by a large amount of noise, so it is necessary to eliminate this noise while keeping the useful information in order to boost clustering performance. In this paper, we propose to evaluate the impact of text summarization using the Latent Semantic Analysis model on Arabic document clustering in order to solve the problems cited above. We use five similarity/distance measures (Euclidean Distance, Cosine Similarity, Jaccard Coefficient, Pearson Correlation Coefficient and Averaged Kullback-Leibler Divergence), each applied twice: without and with stemming. Our experimental results indicate that the proposed approach effectively solves the problems of noisy information and document length, and thus significantly improves clustering performance.
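The five measures listed above can be sketched in plain Python on term-weight vectors. Several choices here are assumptions: the Jaccard coefficient uses the common extended (Tanimoto) form for real-valued weights, and averaged KL divergence follows the usual formulation from the text-clustering literature. The paper's exact weighting, smoothing and zero handling may differ, and the two document vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two term-weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity; raises ZeroDivisionError for an all-zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def extended_jaccard(a, b):
    """Jaccard coefficient generalised to real-valued weights (Tanimoto form)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)


def pearson(a, b):
    """Pearson correlation of the two weight vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def averaged_kl(a, b):
    """Averaged Kullback-Leibler divergence; 0.0 for identical vectors."""
    total = 0.0
    for x, y in zip(a, b):
        if x + y == 0:
            continue  # term absent from both documents
        p1, p2 = x / (x + y), y / (x + y)
        w = p1 * x + p2 * y  # weighted average of the two weights for this term
        if x > 0:
            total += p1 * x * math.log(x / w)
        if y > 0:
            total += p2 * y * math.log(y / w)
    return total


# Hypothetical tf-idf vectors over the same four-term vocabulary.
doc_a = [0.0, 1.2, 0.4, 2.0]
doc_b = [0.3, 1.0, 0.0, 1.8]
for measure in (euclidean, cosine, extended_jaccard, pearson, averaged_kl):
    print(measure.__name__, round(measure(doc_a, doc_b), 3))
```

Euclidean distance and averaged KL divergence are dissimilarities (0 for identical documents). The other three are similarities (1 for identical documents), so a clustering routine must treat them accordingly.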
Lukyanov, V K; Zemlyanaya, E V; Spasova, K; Lukyanov, K V; Antonov, A N; Gaidarov, M K
2015-01-01
The density distributions of $^{10}$Be and $^{11}$Be nuclei obtained within the quantum Monte Carlo (QMC) model and the generator coordinate method (GCM) are used to calculate the microscopic optical potentials (OPs) and cross sections of elastic scattering of these nuclei on protons and $^{12}$C at energies $E<100$ MeV/nucleon. The real part of the OP is calculated using the folding model with the exchange terms included, while the imaginary part of the OP that reproduces the phase of scattering is obtained in the high-energy approximation (HEA). In this hybrid model of OP the free parameters are the depths of the real and imaginary parts obtained by fitting the experimental data. The well known energy dependence of the volume integrals is used as a physical constraint to resolve the ambiguities of the parameter values. The role of the spin-orbit potential and the surface contribution to the OP is studied for an adequate description of available experimental elastic scattering cross section data. Also, th...
Diagnostics of subtropical plants functional state by cluster analysis
Directory of Open Access Journals (Sweden)
Oksana Belous
2016-05-01
Full Text Available The article presents an example of applying statistical methods of data analysis to diagnosing the adaptive capacity of subtropical plant varieties. We describe the selection indicators and the basic physiological parameters that were defined as diagnostic. We evaluated a set of water-regime parameters, namely: the water deficit of the leaves, the fractional composition of water, and the concentration of cell sap (CCS) (for tea culture flushes). These parameters are characterized by high lability and high responsiveness to the effects of many abiotic factors, which called for particular care in selecting plant material for analysis and in considering the impact on sustainability. On the basis of the experimental data, we calculated pair correlation coefficients between climatic factors and the physiological indicators used. The result was a selection of physiological and biochemical indicators proposed for assessing adaptability, which were included in the basis of methodical recommendations on diagnostics of the functional state of the studied cultures. Analysis of complex studies involving a large number of indicators is quite difficult. In particular, it does not allow one to quickly identify the similarity of new varieties in their adaptive responses to adverse factors and, therefore, to set general requirements for cultivation conditions. Cluster analysis requires three things: only quantitative data are analysed; a set of variables for assessing varieties is defined (the larger the sample, the more accurate the clustering); and a measure of similarity (or difference) between objects is established. It is shown that the identification of diagnostic features, which are subjected to statistical processing, affects the accuracy of variety classification. Selection resulting from the mono-cluster analysis (tea variety Kolhida; hazelnut Lombardsky red; kiwi variety Monty
Franke, R.
2016-11-01
In many networks discovered in biology, medicine, neuroscience and other disciplines, special properties such as a certain degree distribution and a hierarchical cluster structure (also called communities) can be observed as general organizing principles. Detecting the cluster structure of an unknown network promises to identify functional subdivisions, hierarchy and interactions on a mesoscale. Choosing an appropriate detection algorithm is not trivial, because multiple network, cluster and algorithmic properties have to be considered. Edges can be weighted and/or directed, and clusters can overlap or form a hierarchy in several ways. Algorithms differ not only in runtime and memory requirements but also in the network and cluster properties they allow, and each is based on a specific definition of what a cluster is. On the one hand, a comprehensive network creation model is needed to build a large variety of benchmark networks with different reasonable structures in order to compare algorithms. On the other hand, if a cluster structure is already known, it is desirable to separate the effects of this structure from other network properties. This can be done with null model networks that mimic an observed cluster structure to improve statistics on other network features. A third important application is the general study of properties in networks with different cluster structures, possibly evolving over time. Good benchmark and creation models are currently available. What is still missing is a precise sandbox model for building hierarchical, overlapping and directed clusters in undirected or directed, binary or weighted complex random networks on the basis of a sophisticated blueprint. This gap is to be closed by the model CHIMERA (Cluster Hierarchy Interconnection Model for Evaluation, Research and Analysis), which is introduced and described here for the first time.
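The benchmark-generation idea can be illustrated with the much simpler planted-partition (stochastic block) model. This is a swapped-in technique, not CHIMERA. It produces only undirected, binary, non-overlapping clusters, so it covers none of the hierarchical, overlapping or directed structure CHIMERA targets. The function name and parameters below are hypothetical.

```python
import itertools
import random


def planted_partition(sizes, p_in, p_out, seed=None):
    """Undirected random graph whose nodes are grouped into known clusters.

    Each pair of nodes is linked with probability p_in if both belong to the
    same cluster, and with probability p_out otherwise.
    Returns (labels, edges): the ground-truth cluster of each node and an edge list.
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    edges = []
    for u, v in itertools.combinations(range(len(labels)), 2):
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return labels, edges


# Three planted clusters with dense interiors and sparse interconnections.
labels, edges = planted_partition([50, 50, 30], p_in=0.3, p_out=0.01, seed=42)
intra = sum(labels[u] == labels[v] for u, v in edges)
print(len(edges), "edges,", intra, "inside clusters")
```

Because the true labels are returned alongside the edges, a detection algorithm's output can be scored against them, for example with the Rand index.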
Neuro-fuzzy system modeling based on automatic fuzzy clustering
Institute of Scientific and Technical Information of China (English)
Yuangang TANG; Fuchun SUN; Zengqi SUN
2005-01-01
A neuro-fuzzy system model based on automatic fuzzy clustering is proposed. A hybrid model identification algorithm is also developed to determine the model structure and model parameters. The algorithm mainly includes three parts. 1) Automatic fuzzy C-means (AFCM) is applied to generate fuzzy rules automatically and then to fix the size of the neuro-fuzzy network; this greatly reduces the complexity of system design at the price of fitting capability. 2) Recursive least squares estimation (RLSE) is used to update the parameters of the Takagi-Sugeno model employed to describe the behavior of the system. 3) A gradient descent algorithm is proposed for the fuzzy values, following the back-propagation algorithm of neural networks. Finally, modeling the dynamical equation of a two-link manipulator with the proposed approach is illustrated to validate the feasibility of the method.
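With the premise (membership) parameters fixed, a Takagi-Sugeno model's output is linear in its consequent parameters, which is why recursive least squares can update them sample by sample. The NumPy sketch below shows standard RLS for a generic linear-in-parameters model y ≈ xᵀθ. Building the regressor x from normalized rule firing strengths multiplied by [1, inputs] is the usual TS construction, but it is an assumption here, as are the names `rls_fit` and `delta`.

```python
import numpy as np


def rls_fit(X, y, delta=1e6):
    """Recursive least-squares estimate of theta in y ~ X @ theta.

    X: (n_samples, n_params) regressors; y: (n_samples,) targets.
    delta scales the initial covariance P0 = delta * I (large value = weak prior).
    """
    n_params = X.shape[1]
    theta = np.zeros(n_params)
    P = delta * np.eye(n_params)
    for x, target in zip(X, y):
        Px = P @ x
        gain = Px / (1.0 + x @ Px)                    # gain vector for this sample
        theta = theta + gain * (target - x @ theta)   # correct by the prediction error
        P = P - np.outer(gain, Px)                    # shrink covariance along x
    return theta


# Hypothetical check: recover known linear parameters from noiseless data.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50), rng.normal(size=50)])
theta_true = np.array([0.5, -2.0, 3.0])
theta_hat = rls_fit(X, X @ theta_true)
print(np.round(theta_hat, 4))
```

Each update costs O(p²) for p parameters and never revisits old samples, which suits the online identification setting described above.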
STRUCTURAL MODELING OF INNOVATION CLUSTER INNOVATION CLUSTER’S INSTITUTIONAL ENVIRONMENT
Directory of Open Access Journals (Sweden)
D. L. Napolskikh
2012-01-01
Full Text Available The current state of the problem of modeling the internal and external environment of an innovation cluster is considered. An organizational model of interaction between the cluster's institutions and its environment is proposed, together with a model of the cluster's institutional infrastructure component. A hypothesis on the need for an organic model of the innovation cluster's institutional environment is put forward.
Monitoring Customer Satisfaction in Service Industry: A Cluster Analysis Approach
Directory of Open Access Journals (Sweden)
Matúš Horváth
2012-10-01
Full Text Available One of the key performance indicators of an organization's quality management system is customer satisfaction. The process of monitoring customer satisfaction is therefore an important part of the measuring processes of the quality management system. This paper deals with new ways to analyse and monitor customer satisfaction by analysing data on how customers use the organisation's services and on the rates at which customers leave. The article uses cluster analysis in this process to segment customers, with the aim of increasing the accuracy of the results and of the decisions based on them. The application example was created as part of a bachelor's thesis.
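The abstract does not name the clustering algorithm used for customer segmentation. As an assumed stand-in, the sketch below runs plain k-means (Lloyd's algorithm) on two hypothetical usage features per customer: monthly visits and average spend. It uses the first k points as initial centroids so the result is deterministic. In practice, features on different scales would normally be standardised first.

```python
import math


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's k-means; initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(n_iter):
        # Assignment step: each customer joins the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no customer changed segment
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the previous centroid if a segment empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical customers: [visits per month, average spend].
customers = [
    [2, 10], [3, 12], [2, 11],     # occasional, low spend
    [20, 80], [22, 85], [19, 78],  # frequent, high spend
]
segments, centres = kmeans(customers, k=2)
print(segments, centres)
```

Satisfaction or leaving rates can then be compared segment by segment rather than across the whole customer base.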
Monitoring Customer Satisfaction in Service Industry: A Cluster Analysis Approach
Directory of Open Access Journals (Sweden)
Matúš Horváth
2012-11-01
Full Text Available One of the key performance indicators of an organization's quality management system is customer satisfaction. The process of monitoring customer satisfaction is therefore an important part of the measuring processes of the quality management system. This paper deals with new ways to analyse and monitor customer satisfaction by analysing data on how customers use the organisation's services and on the rates at which customers leave. The article uses cluster analysis in this process to segment customers, with the aim of increasing the accuracy of the results and of the decisions based on them. The application example was created as part of a bachelor's thesis.
Using cluster analysis in measuring social domain of territorial brand
Directory of Open Access Journals (Sweden)
Zlata Stepanova
2009-10-01
Full Text Available A territorial brand has a social dimension, reflected in social equilibrium and measurable with social effectiveness indicators. The paper offers an analysis of the social effectiveness of a territory. It takes “territorial and social systems” (TSS) as the object of investigation and classifies them into social types by cluster analysis. This method allows the authors to distinguish four social types of TSS in Sverdlovsk region according to characteristics such as financial activity, quality of life, social stability and ill-being levels. The results of the investigation could be useful for the brand policy of territorial authorities.
Wagstaff, Kiri L.
2012-03-01
On obtaining a new data set, the researcher is immediately faced with the challenge of obtaining a high-level understanding from the observations. What does a typical item look like? What are the dominant trends? How many distinct groups are included in the data set, and how is each one characterized? Which observable values are common, and which rarely occur? Which items stand out as anomalies or outliers from the rest of the data? This challenge is exacerbated by the steady growth in data set size [11] as new instruments push into new frontiers of parameter space, via improvements in temporal, spatial, and spectral resolution, or by the desire to "fuse" observations from different modalities and instruments into a larger-picture understanding of the same underlying phenomenon. Data clustering algorithms provide a variety of solutions for this task. They can generate summaries, locate outliers, compress data, identify dense or sparse regions of feature space, and build data models. It is useful to note up front that "clusters" in this context refer to groups of items within some descriptive feature space, not (necessarily) to "galaxy clusters" which are dense regions in physical space. The goal of this chapter is to survey a variety of data clustering methods, with an eye toward their applicability to astronomical data analysis. In addition to improving the individual researcher’s understanding of a given data set, clustering has led directly to scientific advances, such as the discovery of new subclasses of stars [14] and gamma-ray bursts (GRBs) [38]. All clustering algorithms seek to identify groups within a data set that reflect some observed, quantifiable structure. Clustering is traditionally an unsupervised approach to data analysis, in the sense that it operates without any direct guidance about which items should be assigned to which clusters. There has been a recent trend in the clustering literature toward supporting semisupervised or constrained
CLUSTERING ANALYSIS OF OFFICER'S BEHAVIOURS IN LONDON POLICE FOOT PATROL ACTIVITIES
Directory of Open Access Journals (Sweden)
J. Shen
2015-07-01
Full Text Available In this short paper we present a framework for the conceptual representation and clustering analysis of police officers' patrol patterns, obtained by mining their raw movement trajectory data. This is achieved by a model that accounts for the spatio-temporal dynamics of human movement by incorporating both the behavioural features of the travellers and the semantic meaning of the environment they are moving in. The similarity metric of traveller behaviours is then defined jointly according to the allocation of stay time among spatio-temporal regions of interest (ST-ROIs), to support clustering analysis of patrol behaviours. The proposed framework enables the analysis of behaviour and preferences at a higher level based on raw movement trajectories. The model is first applied to police patrol data provided by the Metropolitan Police and will later be tested on other types of dataset.
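The abstract says only that similarity is defined by how stay time is allocated across ST-ROIs. One plausible reading, assumed here rather than taken from the paper, is to turn each officer's stays into time shares per ST-ROI and compare those shares with histogram intersection. The ST-ROI identifiers (place, time-of-day pairs) and both helper functions are hypothetical.

```python
from collections import defaultdict


def stay_time_profile(stays):
    """Normalise (roi_id, minutes) records into the share of time spent in each ST-ROI."""
    totals = defaultdict(float)
    for roi, minutes in stays:
        totals[roi] += minutes
    grand_total = sum(totals.values())
    return {roi: t / grand_total for roi, t in totals.items()}


def allocation_similarity(profile_a, profile_b):
    """Histogram intersection: 1.0 for identical allocations, 0.0 for disjoint ones."""
    rois = set(profile_a) | set(profile_b)
    return sum(min(profile_a.get(r, 0.0), profile_b.get(r, 0.0)) for r in rois)


# Hypothetical stays: ((place, time slot), minutes).
officer_1 = stay_time_profile([
    (("high_street", "evening"), 60),
    (("station", "evening"), 30),
    (("high_street", "evening"), 30),
])
officer_2 = stay_time_profile([
    (("high_street", "evening"), 40),
    (("park", "morning"), 120),
])
print(allocation_similarity(officer_1, officer_2))
```

A pairwise matrix of these similarities (or 1 minus them as distances) could then feed any standard clustering algorithm.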
von der Linden, Anja; Applegate, Douglas E; Kelly, Patrick L; Allen, Steven W; Ebeling, Harald; Burchat, Patricia R; Burke, David L; Donovan, David; Morris, R Glenn; Blandford, Roger; Erben, Thomas; Mantz, Adam
2012-01-01
This is the first in a series of papers in which we measure accurate weak-lensing masses for 51 of the most X-ray luminous galaxy clusters known at redshifts 0.15
Modelling clustering of vertically aligned carbon nanotube arrays
Schaber, Clemens F.; Filippov, Alexander E.; Heinlein, Thorsten; Schneider, Jörg J.; Gorb, Stanislav N.
2015-01-01
Previous research demonstrated that arrays of vertically aligned carbon nanotubes (VACNTs) exhibit strong frictional properties. Experiments indicated a strong decrease of the friction coefficient from the first to the second sliding cycle in repetitive measurements on the same VACNT spot, but stable values in consecutive cycles. VACNTs form clusters under shear applied during friction tests, and self-organization stabilizes the mechanical properties of the arrays. With increasing load in the range between 300 µN and 4 mN applied normally to the array surface during friction tests the size of the clusters increases, while the coefficient of friction decreases. To better understand the experimentally obtained results, we formulated and numerically studied a minimalistic model, which reproduces the main features of the system with a minimum of adjustable parameters. We calculate the van der Waals forces between the spherical friction probe and bunches of the arrays using the well-known Morse potential function to predict the number of clusters, their size, instantaneous and mean friction forces and the behaviour of the VACNTs during consecutive sliding cycles and at different normal loads. The data obtained by the model calculations coincide very well with the experimental data and can help in adapting VACNT arrays for biomimetic applications. PMID:26464787
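For reference, a common form of the Morse potential mentioned above is V(r) = D_e (1 − e^{−a(r − r_e)})². Here D_e is the well depth, a controls the well width and r_e is the equilibrium distance. The paper's exact parameterisation and fitted values are not given in the abstract, so the numbers below are placeholders. The force follows as F = −dV/dr.

```python
import math


def morse_potential(r, depth, width, r_eq):
    """Morse potential V(r) = D * (1 - exp(-a (r - r_e)))**2, with minimum 0 at r = r_e."""
    e = math.exp(-width * (r - r_eq))
    return depth * (1.0 - e) ** 2


def morse_force(r, depth, width, r_eq):
    """Radial force F = -dV/dr: attractive (negative) for r > r_e, repulsive for r < r_e."""
    e = math.exp(-width * (r - r_eq))
    return -2.0 * width * depth * e * (1.0 - e)


# Placeholder parameters in arbitrary units.
D, A, R_EQ = 2.0, 1.5, 1.0
for r in (0.8, 1.0, 1.5, 3.0):
    print(r, round(morse_potential(r, D, A, R_EQ), 4), round(morse_force(r, D, A, R_EQ), 4))
```

The potential rises steeply below r_e and levels off at D_e at large separations, which is what lets a probe-bundle interaction be both short-range repulsive and longer-range attractive.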
Analytical model for non-thermal pressure in galaxy clusters
Shi, Xun
2014-01-01
Non-thermal pressure in the intracluster gas has been found ubiquitously in numerical simulations, and observed indirectly. In this paper we develop, for the first time, an analytical model for intracluster non-thermal pressure. We write down and solve a first-order differential equation describing the evolution of non-thermal velocity dispersion. This equation is based on insights gained from observations, numerical simulations, and theory of turbulence. The non-thermal energy is sourced, in a self-similar fashion, by the mass growth of clusters via mergers and accretion, and dissipates with a time scale determined by the turnover time of the largest turbulence eddies. Our model predicts a radial profile of non-thermal pressure for relaxed clusters. The non-thermal fraction increases with radius, redshift, and cluster mass, in agreement with numerical simulations. The radial dependence is due to a rapid increase of the dissipation time scale with radius, and the mass and redshift dependence comes from the mas...
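The abstract does not state the equation itself. Purely to illustrate the "sourced by growth, dissipated on a time scale" structure it describes, the sketch assumes a generic first-order form dx/dt = S(t) − x/t_d(t) and integrates it with forward Euler. This toy form and every name in it are assumptions, not the paper's model. For a constant source and time scale, x settles at S·t_d, so a longer dissipation time (as at larger radius) gives a higher non-thermal level in this toy setting.

```python
def integrate_source_dissipation(source, t_dissipation, x0, t_end, dt):
    """Forward-Euler integration of dx/dt = source(t) - x / t_dissipation(t)."""
    t, x = 0.0, x0
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        x += dt * (source(t) - x / t_dissipation(t))
        t += dt
    return x


# Constant source and dissipation time: steady state is source * t_dissipation.
fast = integrate_source_dissipation(lambda t: 2.0, lambda t: 3.0, x0=0.0, t_end=100.0, dt=0.01)
slow = integrate_source_dissipation(lambda t: 2.0, lambda t: 6.0, x0=0.0, t_end=200.0, dt=0.01)
print(round(fast, 6), round(slow, 6))
```

Forward Euler is used only for transparency; a stiff or strongly time-dependent t_d would call for a smaller step or an implicit scheme.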
A halo model for cosmological neutral hydrogen : abundances and clustering
Padmanabhan, Hamsa; Amara, Adam
2016-01-01
We extend the results of previous analyses towards constraining the abundance and clustering of post-reionization ($z \sim 0-5$) neutral hydrogen (HI) systems using a halo model framework. We work with a comprehensive HI dataset including the small-scale clustering, column density and mass function of HI galaxies at low redshifts, intensity mapping measurements at intermediate redshifts and the UV/optical observations of Damped Lyman Alpha (DLA) systems at higher redshifts. We use a Markov Chain Monte Carlo (MCMC) approach to constrain the parameters of the best-fitting models, both for the HI-halo mass relation and the HI radial density profile. We find that a radial exponential profile results in a good fit to the low-redshift HI observations, including the clustering and the column density distribution. The form of the profile is also found to match the high-redshift DLA observations, when used in combination with a three-parameter HI-halo mass relation and a redshift evolution in the HI concentration. The...
nIFTy galaxy cluster simulations II: radiative models
Sembolini, Federico; Pearce, Frazer R; Power, Chris; Knebe, Alexander; Kay, Scott T; Cui, Weiguang; Yepes, Gustavo; Beck, Alexander M; Borgani, Stefano; Cunnama, Daniel; Davé, Romeel; February, Sean; Huang, Shuiyao; Katz, Neal; McCarthy, Ian G; Murante, Giuseppe; Newton, Richard D A; Perret, Valentin; Saro, Alexandro; Schaye, Joop; Teyssier, Romain
2015-01-01
We have simulated the formation of a massive galaxy cluster (M$_{200}^{\rm crit}$ = 1.1$\times$10$^{15}h^{-1}M_{\odot}$) in a $\Lambda$CDM universe using 10 different codes (RAMSES, 2 incarnations of AREPO and 7 of GADGET), modeling hydrodynamics with full radiative subgrid physics. These codes include Smoothed-Particle Hydrodynamics (SPH), spanning traditional and advanced SPH schemes, adaptive mesh and moving mesh codes. Our goal is to study the consistency between simulated clusters modeled with different radiative physical implementations - such as cooling, star formation and AGN feedback. We compare images of the cluster at $z=0$, global properties such as mass, and radial profiles of various dynamical and thermodynamical quantities. We find that, with respect to non-radiative simulations, dark matter is more centrally concentrated, the extent not simply depending on the presence/absence of AGN feedback. The scatter in global quantities is substantially higher than for non-radiative runs. Intriguingly, a...
Cluster Analysis and Fuzzy Query in Ship Maintenance and Design
Che, Jianhua; He, Qinming; Zhao, Yinggang; Qian, Feng; Chen, Qi
Cluster analysis and fuzzy query are widely applied in modern intelligent information processing. In view of the characteristics of ship maintenance data, a variant of the hypergraph-based clustering algorithm, the Correlation Coefficient-based Minimal Spanning Tree (CC-MST), is proposed. It analyzes the bulky data arising from the ship maintenance process, discovers unknown rules, and helps ship maintainers make decisions about the causes of various device faults. At the same time, revising or renewing an existing ship or device design may be necessary to eliminate those device faults. To offer ship designers some valuable hints, a fuzzy query mechanism is designed to retrieve useful information from large-scale, complicated and redundant ship technical and testing data. Finally, two experiments based on a real ship device fault statistical dataset validate the flexibility and efficiency of the CC-MST algorithm. A fuzzy query prototype demonstrates the usability of our fuzzy query mechanism.
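The abstract names CC-MST but does not spell it out. As a rough, hypothetical illustration of correlation-based minimum-spanning-tree clustering in general, and not the authors' hypergraph variant, the sketch below works in four steps:

1. Measure the distance between two series as 1 − |Pearson r|.
2. Build the MST with Kruskal's algorithm.
3. Cut the k − 1 longest tree edges.
4. Read off the resulting connected components as clusters.

The fault-count series are made up.

```python
import math
from itertools import combinations


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def mst_clusters(series, k):
    """Group series into k clusters by cutting the k-1 longest MST edges."""
    n = len(series)
    # Strongly correlated (or anti-correlated) series are close.
    edges = sorted(
        (1.0 - abs(pearson(series[i], series[j])), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []  # Kruskal: accept the shortest edge that joins two components
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))
    # Rebuild components from all but the k-1 heaviest tree edges.
    kept = sorted(tree)[: len(tree) - (k - 1)]
    parent = list(range(n))
    for _, i, j in kept:
        parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    label_of = {r: idx for idx, r in enumerate(dict.fromkeys(roots))}
    return [label_of[r] for r in roots]


# Hypothetical monthly fault counts for four devices.
faults = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 11],
    [1, 3, 1, 3, 1],
    [2, 6, 2, 6, 2.5],
]
print(mst_clusters(faults, k=2))  # prints [0, 0, 1, 1]
```

Cutting the longest MST edges is the classic single-linkage view of clustering, so it can chain elongated groups together. That may or may not be desirable for fault data.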
Gao, Xuefeng
Though many building energy benchmarking programs have been developed during the past decades, they have certain limitations. The major concern is that they may produce misleading benchmarks because they do not fully consider the impacts of the multiple features of buildings on energy performance. The existing methods classify buildings according to only one of their many features, the use type. This may result in a comparison between two buildings that differ tremendously in other features and are therefore not properly comparable. This research aims to tackle this challenge by proposing a new methodology based on the clustering concept and statistical analysis. The clustering concept, which draws on machine learning algorithms, classifies buildings on a multi-dimensional domain of building features rather than the single dimension of use type. Buildings whose energy-relevant features are most similar are classified into the same cluster and benchmarked against the centroid reference of the cluster. Statistical analysis is applied to find the features that most influence building energy performance, as well as to provide prediction models for the energy consumption of new designs. This dissertation discusses the application of the proposed methodology to both existing-building benchmarking and new-design benchmarking. The former comprises four steps: feature selection, clustering algorithm adaptation, results validation, and interpretation. The latter consists of three parts: data observation, inverse modeling, and forward modeling. Experimentation and validation were carried out for both perspectives. It was shown that the proposed methodology could account for total building energy performance and provide a more comprehensive approach to benchmarking. In addition, the multi-dimensional clustering concept enables energy benchmarking among different types of buildings, and inspires a new
Fuzzy Modeled K-Cluster Quality Mining of Hidden Knowledge for Decision Support
Directory of Open Access Journals (Sweden)
S. Parkash Kumar
2011-01-01
Full Text Available Problem statement: This work presents fuzzy-modeled k-means cluster quality mining of hidden knowledge for decision support. Cluster quality metrics are measured from the number of clusters, the number of objects in each cluster and its cohesiveness, and precision and recall values. Fuzzy k-means is adapted using a heuristic method that iterates the clustering to form efficient, valid clusters. With the obtained data clusters, quality assessment is made by predictive mining using a decision tree model. The validation criteria focus on the quality metrics of the institution features for cluster formation and efficiently handle arbitrarily shaped clusters. Approach: The proposed work applies a fuzzy k-means clustering algorithm to form student, faculty and infrastructure clusters based on performance, skill set and facility availability, respectively. The knowledge hidden in the educational data set is extracted through fuzzy k-means clustering, an unsupervised learning method that depends on certain initial values to define the subgroups present in the data set. Results: Cluster formation varies with the features of the dataset and the input parameters, which motivates the clarification of cluster validity. The results of quality-indexed fuzzy k-means show better cluster validation than those of the traditional k-family algorithms. Conclusion: The experimental results of the cluster validation scheme confirm the reliability of the validity index, showing that it performs better than other k-family clustering algorithms.
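The paper's heuristic adaptation of fuzzy k-means is not described in detail, so the NumPy sketch below shows the standard fuzzy C-means iteration instead, plus Bezdek's partition coefficient as one common validity index. Both are named here as swapped-in, textbook techniques rather than the authors' method. The data are hypothetical two-feature records.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy C-means. Returns (centers, U), U[i, k] = membership of point i in cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each point's memberships sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        # Point-to-center distances; the small epsilon avoids division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U


def partition_coefficient(U):
    """Bezdek's partition coefficient: 1.0 for crisp memberships, 1/c for maximally fuzzy ones."""
    return float((U ** 2).sum() / U.shape[0])


# Hypothetical two-feature records (e.g. scaled performance scores) in two groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)), rng.normal(5.0, 0.1, size=(20, 2))])
centers, U = fuzzy_c_means(X, c=2)
print(np.round(centers, 2), round(partition_coefficient(U), 3))
```

Running the clustering for several values of c and comparing a validity index is one simple way to address the dependence on input parameters noted in the Results.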
Exploring the profiles of nurses' job satisfaction in Macau: results of a cluster analysis.
Chan, Moon Fai; Leong, Sok Man; Luk, Andrew Leung; Yeung, Siu Ming; Van, Iat Kio
2010-02-01
To determine whether definable subtypes exist within a cohort of nurses with regard to factors associated with nurses' job satisfaction patterns, and to compare whether these factors vary between nurses in groups with different profiles. Globally, health care systems are experiencing major changes that influence nurses' job satisfaction and may ultimately affect the quality of nursing care for patients. A descriptive survey. Data were collected using a self-reported structured questionnaire. Nurses were recruited in two hospitals in Macao. Two main outcome variables were collected: predisposing characteristics and five components of job satisfaction outcomes. A cluster analysis yielded two clusters (n = 649). Cluster 1 consisted of 60.6% (n = 393) and Cluster 2 of 39.4% (n = 256) of the nurses. Cluster 1 nurses were younger, more educated, had less work experience and had more intention to change their career than nurses in Cluster 2. Cluster 2 nurses had more work experience, were of more senior grade and were more satisfied with their current job than nurses in Cluster 1, in terms of peer support, autonomy and professional opportunities, scheduling, and relationships with team members. The findings might help by providing important information for health care managers to identify strategies/methods that target a specific group of nurses in hopes of increasing their job satisfaction levels. As a long-term investment, hospital management has to promote work environments that support job satisfaction in order to attract nurses and thereby improve the quality of nursing care. The results of this study might provide hospital managers with a model for designing specific interventions to improve nurses' job satisfaction.
Performance Evaluation of Hadoop-based Large-scale Network Traffic Analysis Cluster
Directory of Open Access Journals (Sweden)
Tao Ran
2016-01-01
As Hadoop has gained popularity in the big data era, it is widely used in various fields. The self-designed and self-developed large-scale network traffic analysis cluster, built on Hadoop, works well, with off-line applications running on it to analyze massive network traffic data. To evaluate the performance of the analysis cluster scientifically and reasonably, we propose a performance evaluation system. First, we take the execution times of three benchmark applications as the performance benchmark and select 40 metrics from customized statistical resource data. Then we identify the relationship between the resource data and the execution times with a statistical modeling approach composed of principal component analysis and multiple linear regression. After training the models on historical data, we can predict execution times from current resource data. Finally, we evaluate the performance of the analysis cluster using the validated predictions of execution times. Experimental results show that the execution times predicted by the trained models are within an acceptable error range, and that the performance evaluation results are accurate and reliable.
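The study pairs principal component analysis with multiple linear regression. A minimal pure-Python sketch of that idea is shown below, assuming for illustration that a single principal component (found by power iteration) is enough; with 40 resource metrics one would keep several components and fit a multiple regression on their scores. The function names and toy data are hypothetical.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def first_principal_component(rows, iters=200):
    """Column means and the leading eigenvector of the sample covariance (power iteration)."""
    n, p = len(rows), len(rows[0])
    mu = [_mean([r[j] for r in rows]) for j in range(p)]
    centred = [[r[j] - mu[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)] for a in range(p)]
    v = [1.0] * p  # starting vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v


def fit_pcr(rows, y):
    """Principal component regression with one component: y ~ a + b * score."""
    mu, v = first_principal_component(rows)

    def score(row):
        # Projection of the centred row onto the leading component
        return sum((row[j] - mu[j]) * v[j] for j in range(len(v)))

    scores = [score(r) for r in rows]
    s_bar, y_bar = _mean(scores), _mean(y)
    b = (sum((s - s_bar) * (t - y_bar) for s, t in zip(scores, y))
         / sum((s - s_bar) ** 2 for s in scores))
    a = y_bar - b * s_bar
    return lambda row: a + b * score(row)
```

Regressing on component scores rather than on the raw, often highly correlated, metrics keeps the regression well conditioned; the sign of the eigenvector does not matter because the slope b flips with it.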
Evolutionary-Hierarchical Bases of the Formation of Cluster Model of Innovation Economic Development
Directory of Open Access Journals (Sweden)
Yuliya Vladimirovna Dubrovskaya
2016-10-01
The functioning of a modern economic system is based on the interaction of objects at different hierarchical levels. Thus, studying innovation processes while taking into account the mutual influence of these economic actors' activities becomes important. The paper discusses the evolutionary basis for the formation of models of innovation development on the basis of micro- and macroeconomic analysis. Most concepts recognize that, despite the large number of diverse models, coordinating the relations between economic agents is crucial for successful innovation development. Based on the results of the evolutionary-hierarchical analysis, the authors reveal key phases in the development of forms of cooperation among business, science and government in the domestic economy. This became the starting point for conceptualizing the characteristics of interaction in cluster models of the economy's innovation development. Considerable expectations for improving the national innovation system are tied to the development of cluster and network structures. The main objective of government authorities is to form mechanisms and institutions that foster cooperation between cluster members. The article explains that clusters cannot become factors in the growth of the national economy unless they are an effective tool for interaction between the actors of regional innovation systems.
Analysis of breast cancer progression using principal component analysis and clustering
Indian Academy of Sciences (India)
G Alexe; G S Dalgin; S Ganesan; C DeLisi; G Bhanot
2007-08-01
We develop a new technique to analyse microarray data which uses a combination of principal components analysis and consensus ensemble k-clustering to find robust clusters and gene markers in the data. We apply our method to a public microarray breast cancer dataset which has expression levels of genes in normal samples as well as in three pathological stages of disease, namely atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). Our method averages over clustering techniques and data perturbations to find stable, robust clusters and gene markers. We identify the clusters and their pathways with distinct subtypes of breast cancer (Luminal, Basal and Her2+). We confirm that the cancer phenotype develops early (at the early hyperplasia or ADH stage) and find from our analysis that each subtype progresses from ADH to DCIS to IDC along its own specific pathway, as if each were a distinct disease.
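One common way to average over clustering runs is a co-association (consensus) matrix that records how often each pair of samples lands in the same cluster. The sketch below assumes that approach and a hypothetical linkage threshold; the paper's exact consensus procedure may differ, and the function names are illustrative.

```python
def co_association(labelings):
    """Fraction of clustering runs in which samples i and j share a cluster label."""
    n, runs = len(labelings[0]), len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Divide once at the end so frequencies such as 3/4 are exact
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.8):
    """Connected components of the graph linking pairs with co-association >= threshold."""
    n = len(matrix)
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if j not in seen and matrix[i][j] >= threshold:
                    seen.add(j)
                    stack.append(j)
        clusters.append(sorted(component))
    return clusters
```

Label values are compared only within a run, so runs may number their clusters arbitrarily; raising the threshold keeps only pairs that co-cluster consistently, which is the sense in which the resulting clusters are "stable".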
Number of Clusters and the Quality of Hybrid Predictive Models in Analytical CRM
Directory of Open Access Journals (Sweden)
Łapczyński Mariusz
2014-08-01
Making more accurate marketing decisions requires managers to build effective predictive models. Typically, these models specify the probability that a customer belongs to a particular category, group or segment. The analytical CRM categories refer to customers interested in starting cooperation with the company (acquisition models), customers who purchase additional products (cross- and up-sell models) or customers intending to end the cooperation (churn models). When building predictive models, researchers use analytical tools from various disciplines, with an emphasis on their best performance. This article attempts to build a hybrid predictive model combining decision trees (C&RT algorithm) and cluster analysis (k-means). Five different cluster validity indices and eight datasets were used in the experiments. Model performance was evaluated using popular measures such as accuracy, precision, recall, G-mean, F-measure and lift in the first and second deciles. The authors tried to find a connection between the number of clusters and model quality.
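The evaluation measures named in the abstract can be computed from a confusion matrix and a ranked score list. The sketch below uses common definitions, assuming G-mean = sqrt(recall × specificity) and decile lift = response rate in that score decile divided by the overall response rate; the article may use variants, and the function names are hypothetical.

```python
import math


def confusion(y_true, y_pred):
    """Counts (tp, tn, fp, fn) for binary labels coded 1 (positive) and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_measures(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "g_mean": g_mean, "f_measure": f_measure}


def lift_in_decile(y_true, scores, decile=1):
    """Response rate in a score decile (1 = highest scores) over the overall response rate."""
    ranked = [t for _, t in sorted(zip(scores, y_true), key=lambda pair: -pair[0])]
    size = len(ranked) // 10  # assumes at least 10 customers
    chunk = ranked[(decile - 1) * size: decile * size]
    overall = sum(y_true) / len(y_true)
    return (sum(chunk) / len(chunk)) / overall
```

G-mean and lift are useful alongside accuracy because churn and acquisition data are usually imbalanced, so a model can score high accuracy while missing most positives.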
nIFTy galaxy cluster simulations - II. Radiative models
Sembolini, Federico; Elahi, Pascal Jahan; Pearce, Frazer R.; Power, Chris; Knebe, Alexander; Kay, Scott T.; Cui, Weiguang; Yepes, Gustavo; Beck, Alexander M.; Borgani, Stefano; Cunnama, Daniel; Davé, Romeel; February, Sean; Huang, Shuiyao; Katz, Neal; McCarthy, Ian G.; Murante, Giuseppe; Newton, Richard D. A.; Perret, Valentin; Puchwein, Ewald; Saro, Alexandro; Schaye, Joop; Teyssier, Romain
2016-07-01
We have simulated the formation of a massive galaxy cluster (M_200^crit = 1.1 × 10^15 h^-1 M⊙) in a Λ cold dark matter universe using 10 different codes (RAMSES, 2 incarnations of AREPO and 7 of GADGET), modelling hydrodynamics with full radiative subgrid physics. These codes include smoothed-particle hydrodynamics (SPH) codes, spanning traditional and advanced SPH schemes, as well as adaptive mesh and moving mesh codes. Our goal is to study the consistency between simulated clusters modelled with different radiative physical implementations, such as cooling, star formation and thermal active galactic nucleus (AGN) feedback. We compare images of the cluster at z = 0, global properties such as mass, and radial profiles of various dynamical and thermodynamical quantities. We find that, with respect to non-radiative simulations, dark matter is more centrally concentrated, the extent not simply depending on the presence or absence of AGN feedback. The scatter in global quantities is substantially higher than for non-radiative runs. Intriguingly, adding radiative physics seems to have washed away the marked code-based differences present in the entropy profile seen for non-radiative simulations in Sembolini et al.: radiative physics plus classic SPH can produce entropy cores, at least in the case of non-cool-core clusters. Furthermore, the inclusion or absence of AGN feedback is not the dividing line (as it is when describing the stellar content) for whether a code produces an unrealistic temperature inversion and a falling central entropy profile. However, AGN feedback does strongly affect the overall stellar distribution, limiting the effect of overcooling and appreciably reducing the stellar fraction.
Koestler, Devin C; Christensen, Brock C; Marsit, Carmen J; Kelsey, Karl T; Houseman, E Andres
2013-03-05
DNA methylation is a well-recognized epigenetic mechanism that has been the subject of a growing body of literature typically focused on the identification and study of profiles of DNA methylation and their association with human diseases and exposures. In recent years, a number of unsupervised clustering algorithms, both parametric and non-parametric, have been proposed for clustering large-scale DNA methylation data. However, most of these approaches do not incorporate known biological relationships of measured features, and in some cases, rely on unrealistic assumptions regarding the nature of DNA methylation. Here, we propose a modified version of a recursively partitioned mixture model (RPMM) that integrates information related to the proximity of CpG loci within the genome to inform correlation structures on which subsequent clustering analysis is based. Using simulations and four methylation data sets, we demonstrate that integrating biologically informative correlation structures within RPMM results in improved goodness-of-fit, clustering consistency, and ability to detect biologically meaningful clusters compared to methods which ignore such correlation. Integrating biologically informed correlation structures to enhance modeling techniques is motivated by the rapid increase in resolution of DNA methylation microarrays and the increasing understanding of the biology of this epigenetic mechanism.
Anisotropic Models for Globular Clusters, Galactic Bulges and Dark Halos
Nguyen, P H
2013-01-01
Spherical systems with a polytropic equation of state are of great interest in astrophysics. They are widely used to describe neutron stars, red giants, white dwarfs, brown dwarfs, main sequence stars, galactic halos and globular clusters of diverse sizes. In this paper we construct analytically a family of self-gravitating spherical models in the post-Newtonian approximation of general relativity. These models present interesting cusps in their density profiles which are appropriate for the modeling of galaxies and dark matter halos. The systems described here are anisotropic in the sense that their equiprobability surfaces in velocity space are non-spherical, leading to an overabundance of radial or circular orbits, depending on the parameters of the model in consideration. Among the family, we find the post-Newtonian generalization of the Plummer and Hernquist models. A close inspection of their equation of state reveals that these solutions interpolate smoothly between a polytropic sphere in the asymptoti...
Ahmad, Tariq; Desai, Nihar; Wilson, Francis; Schulte, Phillip; Dunning, Allison; Jacoby, Daniel; Allen, Larry; Fiuzat, Mona; Rogers, Joseph; Felker, G Michael; O'Connor, Christopher; Patel, Chetan B
2016-01-01
Classification of acute decompensated heart failure (ADHF) is based on subjective criteria that crudely capture disease heterogeneity. Improved phenotyping of the syndrome may help improve therapeutic strategies. To derive cluster analysis-based groupings for patients hospitalized with ADHF, and compare their prognostic performance to hemodynamic classifications derived at the bedside. We performed a cluster analysis on baseline clinical variables and pulmonary artery catheter (PAC) measurements of 172 ADHF patients from the ESCAPE trial. Employing regression techniques, we examined associations between clusters and clinically determined hemodynamic profiles (warm/cold/wet/dry). We assessed association with clinical outcomes using Cox proportional hazards models. Likelihood ratio tests were used to compare the prognostic value of cluster data to that of hemodynamic data. We identified four advanced HF clusters: 1) male Caucasians with ischemic cardiomyopathy, multiple comorbidities, lowest B-type natriuretic peptide (BNP) levels; 2) females with non-ischemic cardiomyopathy, few comorbidities, most favorable hemodynamics; 3) young African American males with non-ischemic cardiomyopathy, most adverse hemodynamics, advanced disease; and 4) older Caucasians with ischemic cardiomyopathy, concomitant renal insufficiency, highest BNP levels. There was no association between clusters and bedside-derived hemodynamic profiles (p = 0.70). For all adverse clinical outcomes, Cluster 4 had the highest risk, and Cluster 2 the lowest. Compared to Cluster 4, Clusters 1-3 had 45-70% lower risk of all-cause mortality. Clusters were significantly associated with clinical outcomes, whereas hemodynamic profiles were not. By clustering patients with similar objective variables, we identified four clinically relevant phenotypes of ADHF patients, with no discernible relationship to hemodynamic profiles, but distinct associations with adverse outcomes. Our analysis suggests that ADHF classification using simultaneous
Directory of Open Access Journals (Sweden)
Yli-Harja Olli
2009-05-01
Background Cluster analysis has become a standard computational method for gene function discovery as well as for more general exploratory data analysis. A number of different approaches have been proposed for that purpose, among which mixture models provide a principled probabilistic framework. Nowadays cluster analysis is increasingly supplemented with multiple data sources, and these heterogeneous information sources should be used as efficiently as possible. Results This paper presents a novel Beta-Gaussian mixture model (BGMM) for clustering genes based on Gaussian distributed and beta distributed data. The proposed BGMM can be viewed as a natural extension of the beta mixture model (BMM) and the Gaussian mixture model (GMM). The proposed BGMM method differs from other mixture model based methods in its integration of two different data types into a single, unified probabilistic modeling framework, which makes more efficient use of multiple data sources than methods that analyze different data sources separately. Moreover, BGMM provides an exceedingly flexible modeling framework, since many data sources can be modeled as Gaussian or beta distributed random variables, and it can also be extended to integrate data that have other parametric distributions, which adds even more flexibility to this model-based clustering framework. We developed three types of estimation algorithms for BGMM: the standard expectation maximization (EM) algorithm, an approximated EM and a hybrid EM. We propose to tackle the model selection problem with well-known model selection criteria, for which we test the Akaike information criterion (AIC), a modified AIC (AIC3), the Bayesian information criterion (BIC), and the integrated classification likelihood-BIC (ICL-BIC). Conclusion Performance tests with simulated data show that combining two different data sources into a single mixture joint model greatly improves the clustering
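The abstract names four model selection criteria without giving formulas. A sketch using their usual forms is below, written so that smaller values are better (some software flips the sign); ICL-BIC is taken here as BIC plus twice the entropy of the posterior membership probabilities, which is one common convention and an assumption about this paper.

```python
import math


def information_criteria(log_lik, n_params, n_obs, responsibilities=None):
    """Penalised-likelihood criteria for a fitted mixture; smaller is better for all.

    log_lik          -- maximised log-likelihood of the fitted model
    n_params         -- number of free parameters
    n_obs            -- number of observations
    responsibilities -- optional n_obs x K posterior membership probabilities,
                        needed only for the ICL-BIC entropy term
    """
    aic = -2.0 * log_lik + 2.0 * n_params
    aic3 = -2.0 * log_lik + 3.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    out = {"AIC": aic, "AIC3": aic3, "BIC": bic}
    if responsibilities is not None:
        # Classification entropy: EN = -sum_i sum_k tau_ik * log(tau_ik)
        entropy = -sum(t * math.log(t) for row in responsibilities for t in row if t > 0)
        out["ICL-BIC"] = bic + 2.0 * entropy
    return out
```

In practice one fits the mixture for a range of component counts and keeps the count that minimises the chosen criterion; ICL-BIC additionally penalises overlapping components, since fuzzy memberships raise the entropy term.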
Yates, Robert M.; Thomas, Peter A.; Henriques, Bruno M. B.
2017-01-01
We present an analysis of the iron abundance in the hot gas surrounding galaxy groups and clusters. To do this, we first compile and homogenize a large data set of 79 low-redshift (z̃ = 0.03) systems (159 individual measurements) from the literature. Our analysis accounts for differences in aperture size, solar abundance, and cosmology, and scales all measurements using customized radial profiles for the temperature (T), gas density (ρ_gas), and iron abundance (Z_Fe). We then compare this data set to groups and clusters in the L-GALAXIES galaxy evolution model. Our homogenized data set reveals a tight T-Z_Fe relation for clusters, with a scatter in Z_Fe of only 0.10 dex and a slight negative gradient. After examining potential measurement biases, we conclude that some of this negative gradient has a physical origin. Our model suggests greater accretion of hydrogen in the hottest systems, via stripping from infalling satellites, as a cause. In groups, L-GALAXIES over-estimates Z_Fe, indicating that metal-rich gas removal (via e.g. AGN feedback) is required. L-GALAXIES is consistent with the observed Z_Fe in the intracluster medium (ICM) of the hottest clusters at z = 0, and shows a similar rate of ICM enrichment as that observed from at least z ~ 1.3 to the present day. This is achieved without needing to modify any of the galactic chemical evolution (GCE) model parameters. However, the Z_Fe in intermediate-T clusters could be under-estimated in our model. We caution that modifications to the GCE modelling to correct this disrupt the agreement with observations of galaxies' stellar components.
Yi, Wen-Bin; Shen, Li; Qi, Yin-Feng; Tang, Hong
2011-09-01
This paper introduces Probabilistic Latent Semantic Analysis (PLSA) to image clustering and proposes an effective clustering algorithm for hyperspectral images that uses the semantic information from PLSA. First, the ISODATA algorithm is used to obtain an initial clustering of the hyperspectral image, and the clusters of this initial result are treated as the visual words of PLSA. Second, an object-oriented image segmentation algorithm is used to partition the hyperspectral image, and segments with relatively pure pixels are regarded as documents in PLSA. Third, several identification methods that estimate the best number of cluster centers are combined to obtain the number of latent semantic topics. Then the conditional distributions of visual words in topics and the mixtures of topics in different documents are estimated using PLSA. Finally, the conditional probabilities of the latent semantic topics are distinguished using a statistical pattern recognition method, the topic type of each visual word in each document is determined, and the clustering result of the hyperspectral image is obtained. Experimental results show that the clusters of the proposed algorithm are better than those of K-means and ISODATA in terms of object-oriented properties, and that the clustering result is closer to the real spatial distribution of the surface.
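PLSA models each document d as a mixture of latent topics z, with P(w | d) = Σ_z P(z | d) P(w | z), and fits both distributions by EM. The sketch below is a generic PLSA EM on a small count matrix, assuming counts[d][w] holds how often visual word w (an ISODATA cluster) occurs in segment d, and assuming every document has at least one nonzero count. It omits the paper's segmentation, topic-number selection and final topic assignment steps; the function names are hypothetical.

```python
import math
import random


def plsa(counts, n_topics, iters=50, seed=0):
    """Fit PLSA by EM on a document-word count matrix counts[d][w].

    Returns p_z_d[d][z] = P(z | d) and p_w_z[z][w] = P(w | z).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalise(row):
        s = sum(row)
        return [x / s for x in row]

    # Random, strictly positive initialisation breaks the symmetry between topics
    p_z_d = [normalise([rng.random() + 0.1 for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalise([rng.random() + 0.1 for _ in range(n_words)]) for _ in range(n_topics)]
    for _ in range(iters):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) is proportional to P(z | d) P(w | z)
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(post)
                for z in range(n_topics):
                    share = n * post[z] / total
                    new_w_z[z][w] += share  # M-step accumulators
                    new_z_d[d][z] += share
        p_w_z = [normalise(row) for row in new_w_z]
        p_z_d = [normalise(row) for row in new_z_d]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over nonzero cells of n(d, w) * log P(w | d)."""
    return sum(n * math.log(sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z))))
               for d, row in enumerate(counts) for w, n in enumerate(row) if n > 0)
```

Each EM iteration cannot decrease the log-likelihood, so tracking it is a simple convergence check; the topic with the largest P(z | d) can then serve as a document's cluster label.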
Electromagnetic selection rules in the triangular α-cluster model of 12C
Stellin, G.; Fortunato, L.; Vitturi, A.
2016-08-01
After recapitulating the procedure to find the bands and the states occurring in the D3h alpha-cluster model of 12C, in which the clusters are placed at the vertices of an equilateral triangle, we obtain the selection rules for electromagnetic transitions. While the alpha-cluster structure leads to the cancellation of E1 transitions, the approximations carried out in deriving the rotational-vibrational Hamiltonian lead to the disappearance of M1 transitions. Furthermore, although in general the lowest active modes are E2, E3, ... and M2, M3, ..., the cancellation of M2, M3 and M5 transitions between certain bands also occurs, as a result of the application of group theoretical techniques drawn from molecular physics. These implications can be very relevant for the spectroscopic analysis of γ-ray spectra of 12C.
Electromagnetic selection rules in the triangular alpha-cluster model of 12C
Stellin, G; Vitturi, A
2015-01-01
After recapitulating the procedure to find the bands and the states occurring in the D3h alpha-cluster model of 12C, in which the clusters are placed at the vertices of an equilateral triangle, we obtain the selection rules for electromagnetic transitions. While the alpha-cluster structure leads to the cancellation of E1 transitions, the approximations carried out in deriving the roto-vibrational Hamiltonian lead to the disappearance of M1 transitions. Furthermore, although in general the lowest active modes are E2, E3, ... and M2, M3, ..., the cancellation of M2, M3 and M5 transitions between certain bands also occurs, as a result of the application of group theoretical techniques drawn from molecular physics. These implications can be very relevant for the spectroscopic analysis of γ-ray spectra of 12C.
Kauhl, Boris; Heil, Jeanne; Hoebe, Christian J. P. A.; Schweikart, Jürgen; Krafft, Thomas; Dukers-Muijrers, Nicole H. T. M.
2017-01-01
Background Despite high vaccination coverage, pertussis incidence in the Netherlands is amongst the highest in Europe, with a shifting tendency towards adults and the elderly. Early detection of outbreaks and preventive actions are necessary to prevent severe complications in infants. Efficient pertussis control requires additional background knowledge about the determinants of testing and possible determinants of the current pertussis incidence. Therefore, the aim of our study is to examine the possibility of locating possible pertussis outbreaks using space-time cluster detection and to examine the determinants of pertussis testing and incidence using geographically weighted regression models. Methods We analysed laboratory registry data including all geocoded pertussis tests in the southern area of the Netherlands between 2007 and 2013. Socio-demographic and infrastructure-related population data were matched to the geocoded laboratory data. The spatial scan statistic was applied to detect spatial and space-time clusters of testing, incidence and test positivity. Geographically weighted Poisson regression (GWPR) models were then constructed to model the associations between the age-specific rates of testing and incidence and possible population-based determinants. Results Space-time clusters for pertussis incidence overlapped with space-time clusters for testing, reflecting a strong relationship between testing and incidence, irrespective of the examined age group. Testing for pertussis itself was overall associated with lower socio-economic status, multi-person households, proximity to primary schools and availability of healthcare. In contrast, the current incidence is mainly determined by testing and is not associated with lower socio-economic status. Discussion Testing for pertussis to some extent follows the general healthcare-seeking behaviour for common respiratory infections, whereas the current pertussis incidence is largely the result of testing. More
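For Poisson-distributed case counts, the spatial scan statistic (Kulldorff's formulation) scores each candidate window by a log-likelihood ratio comparing the observed count c inside the window with the expected count E under the null: LLR = c ln(c/E) + (C - c) ln((C - c)/(C - E)) when c > E, and 0 otherwise, where C is the total observed count. A minimal sketch of that score is below, assuming expected counts are scaled so they sum to C; scanning all space-time windows and the Monte Carlo significance test are not shown, and the abstract does not give the study's software settings.

```python
import math


def poisson_llr(c, expected, total):
    """Log-likelihood ratio for one candidate window, scanning for high rates only.

    c        -- observed cases inside the window
    expected -- expected cases inside the window under the null hypothesis
    total    -- total observed cases in the study region (C)
    """
    if c <= expected:
        return 0.0  # windows with no excess are not candidate clusters
    inside = c * math.log(c / expected)
    outside_obs = total - c
    outside_exp = total - expected
    # The outside term vanishes when every case falls inside the window
    outside = outside_obs * math.log(outside_obs / outside_exp) if outside_obs > 0 else 0.0
    return inside + outside
```

The most likely cluster is the window with the largest LLR; its p-value is usually obtained by recomputing the maximum LLR on many datasets simulated under the null.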
Directory of Open Access Journals (Sweden)
K.Vijayarekha
2012-12-01
Linear Discriminant Analysis (LDA) is a technique for transforming raw data into a new feature space in which classification can be carried out more robustly. It is useful where the within-class frequencies are unequal. The method maximizes the ratio of between-class variance to within-class variance in a given data set, thereby guaranteeing maximal separability. LDA clustering models are used to classify objects into different categories. This study uses LDA to cluster the features obtained from citrus fruit images taken in five different domains. Sub-windows of size 40x40 are cropped from citrus fruit images having defects such as pitting, splitting and stem end rot. Features are extracted in four domains: statistical features, Fourier transform based features, discrete wavelet transform based features and stationary wavelet transform based features. The results of clustering and classification using LDA and ANN classifiers are reported.
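Fisher's two-class discriminant makes the variance-ratio criterion concrete: the projection direction is w = S_W^{-1}(m_1 - m_0), where S_W is the pooled within-class scatter matrix and m_0, m_1 are the class means. The sketch below assumes two classes and 2-D toy features standing in for the citrus image features, with a threshold halfway between the projected means; the multi-class case instead uses eigenvectors of S_W^{-1} S_B. Function names and data are hypothetical.

```python
def mean_vec(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """Class scatter matrix sum of (x - m)(x - m)^T for 2-D points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_direction(class0, class1):
    """w = S_W^{-1} (m1 - m0) and a midpoint threshold on the projection w . x."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]  # assumed nonzero
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Threshold halfway between the projected class means
    threshold = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, threshold


def classify(point, w, threshold):
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0
```

Multiplying by S_W^{-1} tilts the direction away from axes along which the classes are internally spread out, which is what distinguishes LDA from simply joining the two class means.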
Contour Cluster Shape Analysis for Building Damage Detection from Post-earthquake Airborne LiDAR
Directory of Open Access Journals (Sweden)
HE Meizhang
2015-04-01
Detecting damaged buildings is an obligatory step before evaluating earthquake casualties and economic losses. Detecting damaged buildings accurately is very difficult when relying on the assumption that intact roofs appear in laser data as large planar segments whereas collapsed roofs are characterized by many small segments. This paper presents a contour cluster shape similarity analysis algorithm for reliable building damage detection from post-earthquake airborne LiDAR point clouds. First, we evaluate the entropies of shape similarities between all combinations of two contour lines within a building cluster, which quantitatively describe the shape diversity. Then a maximum entropy model is employed to divide all the clusters into intact and damaged classes. Tests on LiDAR data from the El Mayor-Cucapah earthquake rupture demonstrate the accuracy and reliability of the proposed method.
Covariance analysis of differential drag-based satellite cluster flight
Ben-Yaacov, Ohad; Ivantsov, Anatoly; Gurfil, Pini
2016-06-01
One possibility for satellite cluster flight is to control relative distances using differential drag. The idea is to increase or decrease the drag acceleration on each satellite by changing its attitude, and to use the resulting small differential acceleration as the control input. The most significant advantage of the differential drag concept is that it enables cluster flight without consuming fuel. However, any drag-based control algorithm must cope with significant aerodynamic and mechanical uncertainties. The goal of this paper is to develop a method for examining differential drag-based cluster flight performance in the presence of noise and uncertainties. In particular, the differential drag control law is examined under measurement noise, drag uncertainties, and uncertainties related to initial conditions. The method used for uncertainty quantification is Linear Covariance Analysis, which enables us to propagate the augmented state and filter covariance without propagating the state itself. Validation using a Monte Carlo simulation is provided. The results show that all uncertainties have a relatively small effect on the inter-satellite distance, even in the long term, which validates the robustness of the differential drag controller used.
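Linear covariance analysis replaces many Monte Carlo runs by propagating the covariance directly: for a linear model x_{k+1} = F x_k + G w_k with scalar noise variance σ², the covariance evolves as P_{k+1} = F P_k F^T + Q with Q = σ² G G^T. The sketch below assumes a hypothetical 1-D relative position/velocity model driven by a random differential acceleration, not the paper's augmented state and filter model, and includes a Monte Carlo check in the spirit of the validation described above.

```python
import random


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def propagate_covariance(p, f, q, steps):
    """Linear covariance analysis: P <- F P F^T + Q at each step, with no state samples."""
    f_t = [list(col) for col in zip(*f)]
    for _ in range(steps):
        p = mat_mul(mat_mul(f, p), f_t)
        p = [[p[i][j] + q[i][j] for j in range(len(p))] for i in range(len(p))]
    return p


def monte_carlo_covariance(p0_sd, f, g, sigma, steps, runs=10000, seed=1):
    """Sample zero-mean initial errors and noise, propagate states, return E[x x^T]."""
    rng = random.Random(seed)
    n = len(p0_sd)
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(runs):
        x = [rng.gauss(0.0, s) for s in p0_sd]  # independent initial errors (assumed)
        for _ in range(steps):
            w = rng.gauss(0.0, sigma)  # scalar random differential acceleration
            x = [sum(f[i][k] * x[k] for k in range(n)) + g[i] * w for i in range(n)]
        for i in range(n):
            for j in range(n):
                acc[i][j] += x[i] * x[j] / runs  # mean is known to be zero
    return acc
```

With F = [[1, dt], [0, 1]] and G = [dt²/2, dt] (dt = 1 in the check), one covariance propagation costs a few small matrix products, whereas the Monte Carlo estimate needs thousands of state propagations to reach percent-level agreement.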
Clustered Numerical Data Analysis Using Markov Lie Monoid Based Networks
Johnson, Joseph
2016-03-01
We have designed and built an optimal numerical standardization algorithm that links numerical values with their associated units, error level, and defining metadata, thus supporting automated data exchange and new levels of artificial intelligence (AI). The software manages all dimensional and error analysis and computational tracing. Tables of entities versus properties of these generalized numbers (called "metanumbers") support a transformation of each table into a network among the entities and another network among their properties, where the network connection matrix is based upon a proximity metric between the two items. We previously proved that every network is isomorphic to the Lie algebra that generates continuous Markov transformations. We have also shown that the eigenvectors of these Markov matrices provide an agnostic clustering of the underlying patterns. We will present this methodology and show how our new work on conversion of scientific numerical data through this process can reveal underlying information clusters ordered by the eigenvalues. We will also show how the linking of clusters from different tables can be used to form a "supernet" of all numerical information supporting new initiatives in AI.
Cyber Profiling Using Log Analysis And K-Means Clustering
Directory of Open Access Journals (Sweden)
Muhammad Zulfadhilah
2016-07-01
The activities of Internet users are increasing from year to year and have had an impact on the behavior of the users themselves. Assessment of user behavior is often based only on interaction across the Internet, without knowledge of any other activities. Activity logs can be used as another way to study user behavior. Internet activity logs are a type of big data, so data mining with the K-Means technique can be used to analyze user behavior. In this study, the data were clustered with the K-Means algorithm into three clusters, namely high, medium, and low. The results from a higher education institution show that, in each of these clusters, the most frequently visited websites are, in order, search engines, social media, news, and information sites. This study also showed that cyber profiling is strongly influenced by environmental factors and daily activities.
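A minimal sketch of the three-level clustering, assuming one scalar activity measure per user (for example, a hypothetical count of log entries) and Lloyd's k-means algorithm with a deterministic, spread-out initialization; the study's actual features and initialization are not stated, and the function names are illustrative.

```python
def kmeans_1d(values, k=3, iters=100):
    """Lloyd's k-means on scalar values (k >= 2); returns (centres, groups)."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the initial centres over the sorted data.
    centers = [ordered[(len(ordered) - 1) * i // (k - 1)] for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Keep an empty cluster's old centre rather than dividing by zero
        new = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers, groups


def label_levels(centers):
    """Name clusters low / medium / high by centre value (k = 3 assumed)."""
    names = ["low", "medium", "high"]
    order = sorted(range(len(centers)), key=lambda i: centers[i])
    return {order[rank]: names[rank] for rank in range(len(centers))}
```

Naming clusters by the rank of their centres, rather than by their index, keeps the high/medium/low labels stable even if a different initialization permutes the cluster numbering.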
Dynamical analysis of galaxy cluster merger Abell 2146
White, J A; King, L J; Lee, B E; Russell, H R; Baum, S A; Clowe, D I; Coleman, J E; Donahue, M; Edge, A C; Fabian, A C; Johnstone, R M; McNamara, B R; ODea, C P; Sanders, J S
2015-01-01
We present a dynamical analysis of the merging galaxy cluster system Abell 2146 using spectroscopy obtained with the Gemini Multi-Object Spectrograph on the Gemini North telescope. As revealed by the Chandra X-ray Observatory, the system is undergoing a major merger and has a gas structure indicative of a recent first core passage. The system presents two large shock fronts, making it unique amongst these rare systems. The hot gas structure indicates that the merger axis must be close to the plane of the sky and, from the observation of two shock fronts, that the two merging clusters are relatively close in mass. Using 63 spectroscopically determined cluster members, we apply various statistical tests to establish the presence of two distinct massive structures. With the caveat that the system has recently undergone a major merger, the virial mass estimate is M_vir = 8.5 (+4.3/-4.7) × 10^14 M_sol for the whole system, consistent with the mass determination in a previous study using the Sunyaev-Zeldovich signal....
Modeling, clustering, and segmenting video with mixtures of dynamic textures.
Chan, Antoni B; Vasconcelos, Nuno
2008-05-01
A dynamic texture is a spatio-temporal generative model for video, which represents video sequences as observations from a linear dynamical system. This work studies the mixture of dynamic textures, a statistical model for an ensemble of video sequences that is sampled from a finite collection of visual processes, each of which is a dynamic texture. An expectation-maximization (EM) algorithm is derived for learning the parameters of the model, and the model is related to previous works in linear systems, machine learning, time-series clustering, control theory, and computer vision. Through experimentation, it is shown that the mixture of dynamic textures is a suitable representation for both the appearance and dynamics of a variety of visual processes that have traditionally been challenging for computer vision (e.g. fire, steam, water, vehicle and pedestrian traffic, etc.). When compared with state-of-the-art methods in motion segmentation, including both temporal texture methods and traditional representations (e.g. optical flow or other localized motion representations), the mixture of dynamic textures achieves superior performance in the problems of clustering and segmenting video of such processes.
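A single dynamic texture is a linear dynamical system: x_{t+1} = A x_t + v_t and y_t = C x_t + w_t, where x_t is a hidden state, y_t is a frame flattened to a vector, and v_t, w_t are Gaussian noise. The sketch below only samples from one such system with hypothetical matrices and isotropic noise (an assumption); the mixture adds a discrete component label per sequence, and the EM learning step described in the abstract is not shown.

```python
import random


def sample_dynamic_texture(a, c, x0, steps, state_sd=0.0, obs_sd=0.0, seed=0):
    """Sample frames y_1..y_T from x_{t+1} = A x_t + v_t, y_t = C x_t + w_t."""
    rng = random.Random(seed)
    x = list(x0)
    frames = []
    for _ in range(steps):
        # Observation: each "pixel" is a linear function of the hidden state plus noise
        y = [sum(c[i][j] * x[j] for j in range(len(x))) + rng.gauss(0.0, obs_sd)
             for i in range(len(c))]
        frames.append(y)
        # State transition with Gaussian process noise
        x = [sum(a[i][j] * x[j] for j in range(len(x))) + rng.gauss(0.0, state_sd)
             for i in range(len(a))]
    return frames
```

With A a 90-degree rotation, C mapping the 2-D state to three "pixels" and no noise, the sampled frames repeat every four steps; the state dynamics in A carry the motion, while C carries the appearance.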
A Global Model for Circumgalactic and Cluster-core Precipitation
Voit, G. Mark; Meece, Greg; Li, Yuan; O'Shea, Brian W.; Bryan, Greg L.; Donahue, Megan
2017-08-01
We provide an analytic framework for interpreting observations of multiphase circumgalactic gas that is heavily informed by recent numerical simulations of thermal instability and precipitation in cool-core galaxy clusters. We start by considering the local conditions required for the formation of multiphase gas via two different modes: (1) uplift of ambient gas by galactic outflows, and (2) condensation in a stratified stationary medium in which thermal balance is explicitly maintained. Analytic exploration of these two modes provides insights into the relationships between the local ratio of the cooling and freefall timescales (i.e., t_cool/t_ff), the large-scale gradient of specific entropy, and the development of precipitation and multiphase media in circumgalactic gas. We then use these analytic findings to interpret recent simulations of circumgalactic gas in which global thermal balance is maintained. We show that long-lasting configurations of gas with 5 ≲ min(t_cool/t_ff) ≲ 20 and radial entropy profiles similar to observations of cool cores in galaxy clusters are a natural outcome of precipitation-regulated feedback. We conclude with some observational predictions that follow from these models. This work focuses primarily on precipitation and AGN feedback in galaxy-cluster cores, because that is where the observations of multiphase gas around galaxies are most complete. However, many of the physical principles that govern condensation in those environments apply to circumgalactic gas around galaxies of all masses.
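The ratio t_cool/t_ff compares the local cooling time with the free-fall time. A minimal sketch is below, assuming the common definition t_ff = (2r/g)^{1/2} and point-mass gravity g = G M(<r)/r² (a simplification, since real cluster potentials are extended); the constants are approximate cgs values, and the inputs and function names are hypothetical.

```python
import math

G_CGS = 6.674e-8       # gravitational constant [cm^3 g^-1 s^-2]
KPC_CM = 3.086e21      # 1 kpc in cm
MSUN_G = 1.989e33      # 1 solar mass in g
MYR_S = 3.156e13       # 1 Myr in s


def freefall_time(r_cm, enclosed_mass_g):
    """t_ff = (2 r / g)^(1/2), with g = G M(<r) / r^2 (point-mass gravity assumed)."""
    g = G_CGS * enclosed_mass_g / r_cm ** 2
    return math.sqrt(2.0 * r_cm / g)


def precipitation_ratio(t_cool_s, r_cm, enclosed_mass_g):
    """t_cool / t_ff at radius r; the abstract highlights min(t_cool/t_ff) of roughly 5 to 20."""
    return t_cool_s / freefall_time(r_cm, enclosed_mass_g)
```

Evaluating the ratio on a grid of radii and taking its minimum gives the quantity min(t_cool/t_ff) that the abstract uses to characterize precipitation-regulated atmospheres; dividing a time in seconds by MYR_S converts it to Myr.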
A spectrophotometric model applied to cluster galaxies: the WINGS dataset
Fritz, J; Bettoni, D; Cava, A; Couch, W J; D'Onofrio, M; Dressler, A; Fasano, G; Kjaergaard, P; Moles, M; Varela, J
2007-01-01
[Abridged] The WIde-field Nearby Galaxy-cluster Survey (WINGS) is a project aiming at the study of the galaxy populations in clusters in the local universe (0.04
Cluster model of s- and p-shell hypernuclei
Indian Academy of Sciences (India)
Mohammad Shoeb; Alemiye Mamo; Amanuel Fessahatsion
2007-06-01
The binding energy ( ) of the s- and p-shell hypernuclei are calculated variationally in the cluster model and multidimensional integrations are performed using Monte Carlo. A variety of phenomenological -core potentials consistent with the -core energies and a wide range of simulated s-state potentials are taken as input. The of $_{ }^{6}$He is explained and $_{ }^{5}$He and $_{ }^{5}$H are predicted to be particle stable in the -core model. The results for s-shell hypernuclei are in excellent agreement with those of non-VMC calculations. The $_{}^{10}$Be in model is overbound for combinations of and potentials. A phenomenological dispersive three-body force, , consistent with the of $_{}^{9}$Be in the model underbinds $_{ }^{10}$Be. The incremental values for the s- and p-shell cannot be reconciled, consistent with the finding of earlier analyses.
Coupled Two-Way Clustering Analysis of Breast Cancer and Colon Cancer Gene Expression Data
Getz, Gad; Gal, Hilah; Kela, Itai; Domany, Eytan; Notterman, Dan A.
2003-01-01
We present and review Coupled Two Way Clustering, a method designed to mine gene expression data. The method identifies submatrices of the total expression matrix, whose clustering analysis reveals partitions of samples (and genes) into biologically relevant classes. We demonstrate, on data from colon and breast cancer, that we are able to identify partitions that elude standard clustering analysis.
Adrian Ioana; Tiberiu Socaciu
2013-01-01
The article presents specific aspects of management and models for economic analysis. We present the main types of economic analysis: statistical, dynamic, static, mathematical and psychological analysis. We also present the main objects of analysis: the technological activity analysis of a company, the analysis of production costs, the economic activity analysis of a company, the analysis of equipment, the analysis of labor productivity, the anal...
Joint Analysis of Galaxy-Galaxy Lensing and Galaxy Clustering: Methodology and Forecasts for DES
Park, Y; Dodelson, S; Jain, B; Amara, A; Becker, M R; Bridle, S L; Clampitt, J; Crocce, M; Fosalba, P; Gaztanaga, E; Honscheid, K; Rozo, E; Sobreira, F; Sánchez, C; Wechsler, R H; Abbott, T; Abdalla, F B; Allam, S; Benoit-Lévy, A; Bertin, E; Brooks, D; Buckley-Geer, E; Burke, D L; Rosell, A Carnero; Kind, M Carrasco; Carretero, J; Castander, F J; da Costa, L N; DePoy, D L; Desai, S; Dietrich, J P; Gerdes, D W; Gruen, D; Gruendl, R A; Gutierrez, G; James, D J; Kent, S; Kuehn, K; Kuropatkin, N; Lima, M; Maia, M A G; Marshall, J L; Melchior, P; Miller, C J; Sanchez, E; Scarpine, V; Schubnell, M; Sevilla-Noarbe, I; Soares-Santos, M; Suchyta, E; Swanson, M E C; Tarle, G; Thaler, J; Vikram, V; Walker, A R; Weller, J; Zuntz, J
2015-01-01
The joint analysis of galaxy-galaxy lensing and galaxy clustering is a promising method for inferring the growth function of large scale structure. This analysis will be carried out on data from the Dark Energy Survey (DES), with its measurements of both the distribution of galaxies and the tangential shears of background galaxies induced by these foreground lenses. We develop a practical approach to modeling the assumptions and systematic effects affecting small scale lensing, which provides halo masses, and large scale galaxy clustering. Introducing parameters that characterize the halo occupation distribution (HOD), photometric redshift uncertainties, and shear measurement errors, we study how external priors on different subsets of these parameters affect our growth constraints. Degeneracies within the HOD model, as well as between the HOD and the growth function, are identified as the dominant source of complication, with other systematic effects sub-dominant. The impact of HOD parameters and their degen...
A cluster analysis on road traffic accidents using genetic algorithms
Saharan, Sabariah; Baragona, Roberto
2017-04-01
The analysis of road traffic accidents is increasingly important because of the cost of accidents and public road safety. The availability of large data sets makes it viable to study the factors that affect the frequency and severity of accidents. However, the data are often highly unbalanced and overlapping. We deal with the data set of road traffic accidents recorded in Christchurch, New Zealand, from 2000 to 2009, with a total of 26,440 accidents. The data are binary, with 50 road-traffic-accident factors and four levels of severity. We used a genetic algorithm for the analysis because the data set is large and unbalanced, and standard clustering such as the k-means algorithm may not be suitable for the task. The genetic algorithm-based clustering for unknown K (GCUK) was used to identify the factors associated with accidents of different severity levels. The results provide interesting insight into the relationship between factors and accident severity level, and suggest that the two main factors contributing to fatal accidents are "Speed greater than 60 km/h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and independent component analysis is performed to validate the results.
Clustering Analysis on E-commerce Transaction Based on K-means Clustering
Directory of Open Access Journals (Sweden)
Xuan HUANG
2014-02-01
Clustering algorithms based on density, increments, grids, etc. usually suffer from several shortcomings when facing large volumes of high-dimensional transaction data: poor elasticity, weak handling of high-dimensional data, sensitivity to the order of the data, poor independence of parameters, and weak handling of noise. From experiments on sampled data for 300 mobile phones sold on Taobao, the following conclusions can be drawn: compared with the Single-pass clustering algorithm, the K-means clustering algorithm has a high intra-class dissimilarity and inter-class similarity when analyzing e-commerce transactions. In addition, the K-means algorithm is very efficient and highly elastic when dealing with a large number of data items. However, its clustering results are affected by the number of clusters and the initial positions of the cluster centers, so the results can easily converge to a local optimum. How to determine the number of clusters and the initial positions of the cluster centers therefore remains an important topic for future research.
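The sensitivity to initial centers noted above can be illustrated with a minimal, stdlib-only Python sketch of Lloyd's K-means run from several random starts, keeping the run with the lowest within-cluster sum of squared errors (SSE). This is a generic illustration, not the procedure used in the study; the function names `kmeans` and `best_of_restarts`, the toy points and the restart count are hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, seed: int, max_iter: int = 100):
    """One run of Lloyd's algorithm starting from k randomly chosen data points."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged, possibly to a local optimum
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


def best_of_restarts(points: List[Point], k: int, n_init: int = 10):
    """Run K-means from n_init random starts and keep the lowest-SSE run."""
    runs = [kmeans(points, k, seed=s) for s in range(n_init)]
    return min(runs, key=lambda run: run[2])


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers, sse = best_of_restarts(pts, k=2)
print(labels, round(sse, 3))
```

Restarts reduce, but do not remove, the risk of a poor local optimum; the number of clusters still has to be chosen separately, for example by comparing the SSE or another validity index across several values of K.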
Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor
2017-05-01
Outlier detection is used extensively in data analysis to detect anomalous observations and has important applications in fraud detection and robust analysis. In this paper, we propose a method for detecting multiple outliers for circular variables in a linear functional relationship model. We apply a hierarchical clustering procedure to the residuals of the Caires and Wyatt model and, using a tree diagram, illustrate a graphical approach to outlier detection. A simulation study is performed to verify the accuracy of the proposed method, and an application to a real data set shows its practical applicability.
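A minimal Python sketch of the general idea, assuming the residuals of the fitted Caires and Wyatt model are already available as angles in radians (fitting that model is not shown). It uses single linkage with the circular distance $1 - \cos(a - b)$ and cuts the tree at a height `h`; the linkage, the distance and the cut height are illustrative assumptions, not necessarily the authors' choices. Points outside the largest group are flagged as outliers.

```python
import math
from typing import Dict, List


def circ_dist(a: float, b: float) -> float:
    """Circular distance 1 - cos(a - b) between two angles in radians (range 0..2)."""
    return 1.0 - math.cos(a - b)


def single_linkage_groups(angles: List[float], h: float) -> List[List[int]]:
    """Groups obtained by cutting a single-linkage tree at height h.

    Equivalent to the connected components of the graph that joins every
    pair of points whose distance is at most h (computed with union-find).
    """
    parent = list(range(len(angles)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(angles)):
        for j in range(i + 1, len(angles)):
            if circ_dist(angles[i], angles[j]) <= h:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(angles)):
        groups.setdefault(find(i), []).append(i)
    # Largest group first.
    return sorted(groups.values(), key=len, reverse=True)


def flag_outliers(residuals: List[float], h: float) -> List[int]:
    """Indices of residuals that fall outside the largest group."""
    groups = single_linkage_groups(residuals, h)
    return sorted(i for g in groups[1:] for i in g)


residuals = [0.01, -0.02, 0.03, 0.0, 2.5, -0.015]  # hypothetical residuals (radians)
print(flag_outliers(residuals, h=0.01))
```

Because the distance uses the cosine, angles just below $2\pi$ and just above 0 are correctly treated as close, which a plain absolute difference would not do.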
Alves, S. G.; Martins, M. L.
2010-09-01
Aggregation of animal cells in culture comprises a series of motility, collision and adhesion processes of basic relevance for tissue engineering, bioseparations, oncology research and in vitro drug testing. In the present paper, a cluster-cluster aggregation model with stochastic particle replication and chemotactically driven motility is investigated as a model for the growth of animal cells in culture. The focus is on the scaling laws governing the aggregation kinetics. Our simulations reveal that in the absence of chemotaxis the mean cluster size and the total number of clusters scale in time as stretched exponentials that depend on the particle replication rate. Also, the dynamical cluster size distribution functions are represented by a scaling relation in which the scaling function involves a stretched exponential of the time. The introduction of chemoattraction among the particles leads to distribution functions decaying as power laws with exponents that decrease in time. The fractal dimensions and size distributions of the simulated clusters are qualitatively discussed in terms of those determined experimentally for several normal and tumoral cell lines growing in culture. It is shown that particle replication and chemotaxis account for the simplest cluster size distributions of cellular aggregates observed in culture.
MODEL OF CLUSTER DEVELOPMENT IN THE MANAGEMENT OF WINERIES ENTERPRISES IN ATU GAGAUZIA
Directory of Open Access Journals (Sweden)
Nadejda IANIOGLO
2015-12-01
Public policy support for cluster development of the industrial sector requires investment by enterprises in research and innovation. These processes are only possible through sharing existing capacities, effective knowledge sharing and technology transfer between companies within the same or related industries. The relevance of this article stems from the need to find new forms of co-operation among producers within cluster formations. The purpose of this article is to develop a model of cluster management for the development of wineries in the region. The article uses empirical research methods: a survey, analysis, synthesis, and processing of documentation. Results: to increase the competitiveness of wine enterprises, the author proposes developing the wine industry of ATU Gagauzia within the framework of the cluster policy of the Republic of Moldova. The author proposes an organizational structure for the wine cluster of ATU Gagauzia and describes the benefits that joining the cluster brings to its participants and to the region as a whole.
Dynamical analysis of NGC 110: cluster of fainter stars or data fluctuation?
Joshi, Gireesh C
2016-01-01
The stellar enhancement of the cluster NGC 110 is investigated in various optical and infrared (IR) bands. The radial density profile in the IR does not show a stellar enhancement in the central region of the cluster. This stellar deficiency may be caused by fainter stars going undetected because of contamination by massive stars. Since our analysis does not indicate a stellar enhancement below 16.5 mag in the I band, the cluster is assumed to be a group of fainter stars. The proposed magnitude scatter factor would be an excellent tool for understanding the colour scatter of stars. The most probable members do not coincide with the fitted model isochrones in the optical bands because of the poor data quality of the PPMXL catalogue. Different mean proper motions are found for the fainter stars of the cluster and field regions, whereas similar values are obtained for the radial zones of the cluster. The symmetrical distribution of fainter stars of the core are found aro...
Energy Technology Data Exchange (ETDEWEB)
Cui, Xiaohui [ORNL]; Potok, Thomas E [ORNL]
2006-01-01
The Flocking model, first proposed by Craig Reynolds, is one of the first bio-inspired computational collective behavior models and has many popular applications, such as animation. Our earlier research resulted in a flock clustering algorithm that achieves better performance than the K-means or Ant clustering algorithms for data clustering. This algorithm generates a clustering of a given data set by embedding the high-dimensional data items on a two-dimensional grid for efficient retrieval and visualization of the clustering result. In this paper, we propose a bio-inspired clustering model, the Multiple Species Flocking clustering model (MSF), and present a distributed multi-agent MSF approach for document clustering.
New judging model of fuzzy cluster optimal dividing based on rough sets theory
Institute of Scientific and Technical Information of China (English)
Wang Yun; Liu Qinghong; Mu Yong; Shi Kaiquan
2007-01-01
To address the problem of judging which of several fuzzy dividing matrices in the fuzzy dividing space is optimal, where each matrix is determined by a different choice of cluster samples from the whole sample space, two algorithms are proposed on the basis of the data analysis methods of rough set theory: an information system discretization algorithm (algorithm 1) and a sample representativeness judging algorithm (algorithm 2). Based on the farthest-distance principle, algorithm 1 transforms continuous data into a discrete form that can be processed by rough set theory. Taking the approximation precision as a criterion, algorithm 2 chooses the sample space with good representativeness. Hence, the clustering sample set used to induce and compute the optimal dividing matrix can be obtained. Several theorems are proposed to provide a rigorous theoretical foundation for executing the algorithm model. An applied example based on the new algorithm model is given, and its result verifies the feasibility of the model.
Artim-Esen, Bahar; Çene, Erhan; Şahinkaya, Yasemin; Ertan, Semra; Pehlivan, Özlem; Kamali, Sevil; Gül, Ahmet; Öcal, Lale; Aral, Orhan; Inanç, Murat
2014-07-01
Associations between autoantibodies and clinical features have been described in systemic lupus erythematosus (SLE). Herein, we aimed to define autoantibody clusters and their clinical correlations in a large cohort of patients with SLE. We analyzed 852 patients with SLE who attended our clinic. Seven autoantibodies were selected for cluster analysis: anti-DNA, anti-Sm, anti-RNP, anticardiolipin (aCL) immunoglobulin (Ig)G or IgM, lupus anticoagulant (LAC), anti-Ro, and anti-La. Two-step clustering and Kaplan-Meier survival analyses were used. Five clusters were identified: a cluster of patients with only anti-dsDNA antibodies, a cluster with anti-Sm and anti-RNP, a cluster with aCL IgG/M and LAC, and a cluster with anti-Ro and anti-La antibodies. The fifth cluster consisted of patients who did not belong to any of the clusters formed by the antibodies chosen for cluster analysis. The Sm/RNP cluster had a significantly higher incidence of pulmonary hypertension and Raynaud phenomenon. The dsDNA cluster had the highest incidence of renal involvement. In the aCL/LAC cluster, there were significantly more patients with neuropsychiatric involvement, antiphospholipid syndrome, autoimmune hemolytic anemia, and thrombocytopenia. According to the Systemic Lupus International Collaborating Clinics damage index, the highest frequency of damage was in the aCL/LAC cluster. Comparison of 10- and 20-year survival showed reduced survival in the aCL/LAC cluster. This study supports the existence of autoantibody clusters with distinct clinical features in SLE and shows that forming clinical subsets according to autoantibody clusters may be useful in predicting the outcome of the disease. Autoantibody clusters in SLE may differ according to the clinical setting or population.
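The Kaplan-Meier analysis used to compare survival between clusters rests on the product-limit estimator $\hat S(t) = \prod_{t_i \le t} (1 - d_i/n_i)$, where $d_i$ is the number of events at time $t_i$ and $n_i$ the number of patients still at risk just before $t_i$. A minimal stdlib Python sketch follows; the input format (follow-up times plus an event indicator, 1 = event, 0 = censored) and the toy data are assumptions for illustration, not data from the study.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times[i]: follow-up time of subject i
    events[i]: 1 if the event (e.g. death) was observed, 0 if censored
    Returns (t, S(t)) at each time where at least one event occurs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing time t; subjects censored at t are,
        # by the usual convention, still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed
    return steps


# Hypothetical follow-up times (years) for one cluster.
print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

Computing one such curve per cluster and comparing them at fixed horizons (e.g. 10 and 20 years) gives the kind of comparison described above; formal comparison would normally add a test such as the log-rank test.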
Structure and substructure analysis of DAFT/FADA galaxy clusters in the [0.4-0.9] redshift range
Guennou, L; Durret, F; Neto, G B Lima; Ulmer, M P; Clowe, D; LeBrun, V; Martinet, N; Allam, S; Annis, J; Basa, S; Benoist, C; Biviano, A; Cappi, A; Cypriano, E S; Gavazzi, R; Halliday, C; Ilbert, O; Jullo, E; Just, D; Limousin, M; Márquez, I; Mazure, A; Murphy, K J; Plana, H; Rostagni, F; Russeil, D; Schirmer, M; Slezak, E; Tucker, D; Zaritsky, D; Ziegler, B
2013-01-01
We analyse the structures of all the clusters in the DAFT/FADA survey for which XMM-Newton data and/or a sufficient number of galaxy redshifts in the cluster range are available, with the aim of detecting substructures and evidence for merging events. These properties are discussed in the framework of standard cold dark matter cosmology. XMM-Newton data were available for 32 clusters, for which we derive the X-ray luminosity, and a global X-ray temperature for 25 of them. For 23 clusters we were able to fit the X-ray emissivity with a beta-model and subtract it to detect substructures in the X-ray gas. A dynamical analysis based on the SG method was applied to the clusters having at least 15 spectroscopic galaxy redshifts in the cluster range: 18 X-ray clusters and 11 clusters with no X-ray data. Only major substructures will be detected. Ten substructures were detected both in X-rays and by the SG method. Most of the substructures detected both in X-rays and with the SG method are probably at their first cluster per...
Phenotypes Determined by Cluster Analysis in Moderate to Severe Bronchial Asthma.
Youroukova, Vania M; Dimitrova, Denitsa G; Valerieva, Anna D; Lesichkova, Spaska S; Velikova, Tsvetelina V; Ivanova-Todorova, Ekaterina I; Tumangelova-Yuzeir, Kalina D
2017-06-01
Bronchial asthma is a heterogeneous disease that includes various subtypes, which may share similar clinical characteristics but probably have different pathological mechanisms. The aim of this study was to identify phenotypes using cluster analysis in moderate to severe bronchial asthma and to compare clinical, physiological, immunological and inflammatory data between the clusters. Forty adult patients with moderate to severe bronchial asthma, not in exacerbation, were included. All underwent clinical assessment, anthropometric measurements, skin prick testing, standard spirometry and measurement of the fraction of exhaled nitric oxide. Blood eosinophil count, serum total IgE and periostin levels were determined. A two-step cluster approach, hierarchical clustering and k-means analysis were used to identify the clusters. We identified four clusters: Cluster 1 (n=14), late-onset, non-atopic asthma with impaired lung function; Cluster 2 (n=13), late-onset, atopic asthma; Cluster 3 (n=6), late-onset, aspirin-sensitive, eosinophilic asthma; and Cluster 4 (n=7), early-onset, atopic asthma. Our study is the first in Bulgaria in which cluster analysis is applied to asthmatic patients. The variables with the greatest discriminating power in our study were: age of asthma onset, duration of disease, atopy, smoking, blood eosinophils, nonsteroidal anti-inflammatory drug hypersensitivity, baseline FEV1/FVC and symptom severity. Our results support the concept of heterogeneity of bronchial asthma and demonstrate that cluster analysis can be a useful tool for phenotyping the disease and for a personalized approach to the treatment of patients.
A Variational Level Set Model Combined with FCMS for Image Clustering Segmentation
Directory of Open Access Journals (Sweden)
Liming Tang
2014-01-01
The fuzzy C-means clustering algorithm with spatial constraint (FCMS) is effective for image segmentation. However, it lacks essential smoothness constraints on the cluster boundaries and is not robust enough to noise. Samson et al. proposed a variational level set model for image clustering segmentation, which obtains smooth cluster boundaries and closed cluster regions owing to the level set scheme. However, it is very sensitive to noise, since it is actually a hard C-means clustering model. In this paper, building on Samson's work, we propose a new variational level set model combined with FCMS for image clustering segmentation. Compared with FCMS clustering, the proposed model obtains smooth cluster boundaries and closed cluster regions owing to the level set scheme. In addition, a block-based energy is incorporated into the energy functional, which makes the proposed model more robust to noise than both FCMS clustering and Samson's model. Experiments on synthetic and real images are performed to assess the performance of the proposed model. Compared with some classical image segmentation models, the proposed model performs better on images contaminated by different noise levels.
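For readers unfamiliar with the fuzzy C-means component, a minimal sketch of plain FCM on 1-D data (for images, pixel intensities) is shown below. It alternates the standard center update $c_i = \sum_j u_{ji}^m x_j / \sum_j u_{ji}^m$ and membership update $u_{ji} = 1 / \sum_k (d_{ji}/d_{jk})^{2/(m-1)}$ with fuzzifier $m$. It deliberately omits the spatial constraint of FCMS as well as the level set term and block-based energy of the proposed model; the function name, parameters and toy intensities are hypothetical.

```python
import random
from typing import List, Tuple


def fuzzy_c_means(xs: List[float], c: int, m: float = 2.0, iters: int = 200,
                  seed: int = 0, eps: float = 1e-12) -> Tuple[List[float], List[List[float]]]:
    """Plain fuzzy C-means on scalar data (no spatial term).

    Returns (centers, u) where u[j][i] is the membership of point j in cluster i.
    """
    rng = random.Random(seed)
    # Random initial memberships; each point's memberships sum to 1.
    u = []
    for _ in xs:
        w = [rng.random() for _ in range(c)]
        total = sum(w)
        u.append([wi / total for wi in w])

    centers = [0.0] * c
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Center update: mean of the data weighted by u^m.
        for i in range(c):
            weights = [u[j][i] ** m for j in range(len(xs))]
            centers[i] = sum(wt * x for wt, x in zip(weights, xs)) / sum(weights)
        # Membership update from the distances to every center.
        for j, x in enumerate(xs):
            d = [abs(x - ck) + eps for ck in centers]  # eps avoids division by zero
            u[j] = [1.0 / sum((d[i] / d[k]) ** p for k in range(c)) for i in range(c)]
    return centers, u


intensities = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # hypothetical 1-D pixel intensities
centers, u = fuzzy_c_means(intensities, c=2)
print(sorted(round(ck, 2) for ck in centers))
```

Because each pixel is treated independently here, isolated noisy pixels receive memberships purely from their own intensity; the spatial constraint in FCMS and the smoothing terms in the model above exist precisely to counter that.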
IPC two-color analysis of x ray galaxy clusters
White, Raymond E., III
1990-01-01
The mass distributions of several clusters of galaxies were determined using X-ray surface brightness data from the Einstein Observatory Imaging Proportional Counter (IPC). Determining cluster mass distributions is important for constraining the nature of the dark matter that dominates the mass of galaxies, galaxy clusters, and the Universe. Galaxy clusters are permeated with hot gas in hydrostatic equilibrium with the gravitational potentials of the clusters. Cluster mass distributions can be determined from X-ray observations of cluster gas by using the equation of hydrostatic equilibrium and knowledge of the density and temperature structure of the gas. The X-ray surface brightness at a given projected distance from the cluster center is the volume X-ray emissivity integrated along the line of sight through the cluster.
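The hydrostatic mass estimate described above follows from a short derivation, shown here under the standard assumptions of spherical symmetry and an ideal-gas intracluster medium (these assumptions are not stated in the abstract itself). With gas density $\rho_g$, pressure $P$, temperature $T$, mean molecular weight $\mu$, proton mass $m_p$, projected radius $R$ and emissivity $\epsilon(r)$:

$$ \frac{1}{\rho_g}\frac{dP}{dr} = -\frac{G M(<r)}{r^2}, \qquad P = \frac{\rho_g k T}{\mu m_p} $$

$$ M(<r) = -\frac{k T(r)\, r}{G \mu m_p}\left(\frac{d\ln \rho_g}{d\ln r} + \frac{d\ln T}{d\ln r}\right) $$

$$ S_X(R) \propto \int_{-\infty}^{\infty} \epsilon\!\left(\sqrt{R^2 + l^2}\right) dl $$

The last relation is why the measured surface brightness must be deprojected to recover $\epsilon(r)$, and from it $\rho_g(r)$, before the mass profile can be computed.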
Kassomenos, P.; Vardoulakis, S.; Borge, R.; Lumbreras, J.; Papaloukas, C.; Karakitsios, S.
2010-10-01
In this study, we used and compared three different statistical clustering methods: a hierarchical method, a non-hierarchical method (K-means) and an artificial neural network technique (self-organizing maps, SOM). These classification methods were applied to a 4-year dataset of 5-day kinematic back trajectories of air masses arriving in Athens, Greece, at 12:00 UTC at three different heights above the ground. The atmospheric back trajectories were simulated with version 4.7 of the HYSPLIT model of the National Oceanic and Atmospheric Administration (NOAA). The meteorological data used to compute the trajectories were obtained from the NOAA reanalysis database. The three clustering methods were compared through statistical indices. All three methods were found to depend on the arrival height of the trajectories, but the degree of dependence differs substantially. For fast-moving trajectories, hierarchical clustering showed the highest dependence on arrival height, followed by SOM. K-means was found to be the clustering technique least dependent on arrival height. The air quality management applications of these results in relation to PM10 concentrations recorded in Athens, Greece, are also discussed. PM10 concentrations during certain clusters were found to differ significantly (at the 95% confidence level), indicating that these clusters appear to be associated with long-range transport of particulates. This study can improve the interpretation of modelled atmospheric trajectories, leading to a more reliable analysis of synoptic weather circulation patterns and their impacts on urban air quality.