Intuitionistic fuzzy hierarchical clustering algorithms
Institute of Scientific and Technical Information of China (English)
Xu Zeshui
2009-01-01
An intuitionistic fuzzy set (IFS) is a set of 2-tuple arguments, each of which is characterized by a membership degree and a nonmembership degree. The generalized form of the IFS is the interval-valued intuitionistic fuzzy set (IVIFS), whose components are intervals rather than exact numbers. IFSs and IVIFSs have been found to be very useful for describing vagueness and uncertainty. However, little attention seems to have been paid to the clustering analysis of IFSs and IVIFSs. An intuitionistic fuzzy hierarchical algorithm is introduced for clustering IFSs. It is based on the traditional hierarchical clustering procedure, the intuitionistic fuzzy aggregation operator, and the basic distance measures between IFSs: the Hamming distance, the normalized Hamming distance, the weighted Hamming distance, the Euclidean distance, the normalized Euclidean distance, and the weighted Euclidean distance. Subsequently, the algorithm is extended to cluster IVIFSs. Finally, the algorithm and its extended form are applied to the classification of building materials and enterprises, respectively.
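As a rough illustration of the distance measures named in this abstract, the sketch below computes the Hamming and normalized Hamming distances between two IFSs. It assumes the commonly used form that also includes the hesitancy degree pi = 1 - mu - nu; the abstract does not say which form the paper adopts, so that choice and all function names here are assumptions.

```python
from typing import List, Tuple

# An IFS over n elements, stored as (membership, nonmembership) pairs.
IFS = List[Tuple[float, float]]


def hesitancy(mu: float, nu: float) -> float:
    """Hesitancy degree pi = 1 - mu - nu (assumed definition)."""
    return 1.0 - mu - nu


def hamming(a: IFS, b: IFS) -> float:
    """Hamming distance between two IFSs over the same elements."""
    if len(a) != len(b):
        raise ValueError("both IFSs must be defined over the same elements")
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        total += abs(mu_a - mu_b)
        total += abs(nu_a - nu_b)
        total += abs(hesitancy(mu_a, nu_a) - hesitancy(mu_b, nu_b))
    return 0.5 * total


def normalized_hamming(a: IFS, b: IFS) -> float:
    """Hamming distance divided by the number of elements, so it lies in [0, 1]."""
    return hamming(a, b) / len(a)
```

A weighted variant would replace the division by n with per-element weights that sum to 1.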
Hesitant fuzzy agglomerative hierarchical clustering algorithms
Zhang, Xiaolu; Xu, Zeshui
2015-02-01
Recently, hesitant fuzzy sets (HFSs) have been studied by many researchers as a powerful tool to describe and deal with uncertain data, but relatively few studies focus on the clustering analysis of HFSs. In this paper, we propose a novel hesitant fuzzy agglomerative hierarchical clustering algorithm for HFSs. In the first stage, the algorithm treats each of the given HFSs as its own cluster, and then compares each pair of HFSs using the weighted Hamming distance or the weighted Euclidean distance. The two clusters with the smallest distance are merged. The procedure is repeated until the desired number of clusters is reached. Moreover, we extend the algorithm to cluster interval-valued hesitant fuzzy sets, and finally illustrate the effectiveness of our clustering algorithms with experimental results.
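The agglomerative procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration rather than the authors' algorithm. It assumes that every hesitant fuzzy element has already been sorted and padded to a common length, and it represents a cluster by the element-wise mean of its members, whereas the paper may use a hesitant fuzzy aggregation operator. All names are hypothetical.

```python
from typing import List

HFS = List[List[float]]  # one list of membership values per attribute


def weighted_hamming(a: HFS, b: HFS, weights: List[float]) -> float:
    """Weighted Hamming distance between two HFSs with equal-length elements."""
    total = 0.0
    for w, ha, hb in zip(weights, a, b):
        total += w * sum(abs(x - y) for x, y in zip(ha, hb)) / len(ha)
    return total


def center(members: List[HFS]) -> HFS:
    """Element-wise mean of the members (an assumed cluster representative)."""
    n = len(members)
    first = members[0]
    return [
        [sum(m[i][j] for m in members) / n for j in range(len(first[i]))]
        for i in range(len(first))
    ]


def agglomerate(data: List[HFS], weights: List[float], k: int) -> List[List[int]]:
    """Merge the closest pair of clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[i] for i in range(len(data))]  # each HFS starts as its own cluster
    while len(clusters) > k:
        centers = [center([data[i] for i in c]) for c in clusters]
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = weighted_hamming(centers[p], centers[q], weights)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters
```

Swapping `weighted_hamming` for a weighted Euclidean distance would give the other variant mentioned in the abstract.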
Performance Analysis of Hierarchical Clustering Algorithm
Directory of Open Access Journals (Sweden)
K.Ranjini
2011-07-01
Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. This paper explains the implementation of agglomerative and divisive clustering algorithms applied to various types of data. Details of the victims of the 2004 tsunami in Thailand were taken as the test data. Visual programming is used for the implementation, and the running times of the algorithms using different linkages (agglomerative) on different types of data are taken for analysis.
A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis
Directory of Open Access Journals (Sweden)
Shaoning Li
2017-01-01
In the fields of geographic information systems (GIS) and remote sensing (RS), the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited by their computational complexity, noise resistance and robustness. Furthermore, traditional methods focus more on the adjacent spatial context, which makes them hard to apply to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters”) if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the termination condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.
Hierarchical trie packet classification algorithm based on expectation-maximization clustering
Bi, Xia-an; Zhao, Junxia
2017-01-01
With the growth of computer network bandwidth, packet classification algorithms that can deal with large-scale rule sets are urgently needed. Among the existing algorithms, those based on the hierarchical trie have become an important branch of packet classification research because of their wide practical use. Although the hierarchical trie saves considerable storage space, it has several shortcomings, such as backtracking and empty nodes. This paper proposes a new packet classification algorithm, the Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper formalizes the packet classification problem by mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses the expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, thereby forming diversified clusters. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper conducts simulation experiments and real-environment experiments to compare the performance of our algorithm with that of other typical algorithms, and analyzes the results. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of inefficient trie updates, which greatly improves the performance of the algorithm. PMID:28704476
Neutrosophic Hierarchical Clustering Algoritms
Directory of Open Access Journals (Sweden)
Rıdvan Şahin
2014-03-01
An interval neutrosophic set (INS) is a generalization of the interval valued intuitionistic fuzzy set (IVIFS), in which the membership and non-membership values of elements are fuzzy ranges, while a single valued neutrosophic set (SVNS) is regarded as an extension of the intuitionistic fuzzy set (IFS). In this paper, we extend the hierarchical clustering techniques proposed for IFSs and IVIFSs to SVNSs and INSs respectively. Based on the traditional hierarchical clustering procedure, the single valued neutrosophic aggregation operator, and the basic distance measures between SVNSs, we define a single valued neutrosophic hierarchical clustering algorithm for clustering SVNSs. Then we extend the algorithm to classify interval neutrosophic data. Finally, we present some numerical examples in order to show the effectiveness and availability of the developed clustering algorithms.
Clustering dynamic textures with the hierarchical em algorithm for modeling video.
Mumtaz, Adeel; Coviello, Emanuele; Lanckriet, Gert R G; Chan, Antoni B
2013-07-01
Dynamic texture (DT) is a probabilistic generative model, defined over space and time, that represents a video as the output of a linear dynamical system (LDS). The DT model has been applied to a wide variety of computer vision problems, such as motion segmentation, motion classification, and video registration. In this paper, we derive a new algorithm for clustering DT models that is based on the hierarchical EM algorithm. The proposed clustering algorithm is capable of both clustering DTs and learning novel DT cluster centers that are representative of the cluster members in a manner that is consistent with the underlying generative probabilistic model of the DT. We also derive an efficient recursive algorithm for sensitivity analysis of the discrete-time Kalman smoothing filter, which is used as the basis for computing expectations in the E-step of the HEM algorithm. Finally, we demonstrate the efficacy of the clustering algorithm on several applications in motion analysis, including hierarchical motion clustering, semantic motion annotation, and learning bag-of-systems (BoS) codebooks for dynamic texture recognition.
Novel density-based and hierarchical density-based clustering algorithms for uncertain data.
Zhang, Xianchao; Liu, Han; Zhang, Xiaotong
2017-09-01
Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing
Directory of Open Access Journals (Sweden)
J Anuradha
2014-05-01
Attention Deficit Hyperactive Disorder (ADHD) is a disruptive neurobehavioral disorder characterized by abnormal behavioral patterns in attention, perusing activity, acting impulsively and combined types. It is predominant among school-going children, and it is tricky to differentiate between an active child and a child with ADHD. Misdiagnosis and undiagnosed cases are very common. Behavior patterns are identified by mentors in the academic environment who lack the skills to screen those children. Hence an unsupervised learning algorithm can cluster the behavioral patterns of children at school for the diagnosis of ADHD. In this paper, we propose a hierarchical clustering algorithm to partition the dataset based on attribute dependency (HCAD). HCAD forms clusters of data based on the highly dependent attributes and their equivalence relation. It is capable of handling large volumes of data with reasonably faster clustering than most of the existing algorithms. It can work on both labeled and unlabelled data sets. Experimental results reveal that this algorithm has higher accuracy than other algorithms. HCAD achieves 97% cluster purity in diagnosing ADHD. An empirical analysis of the application of HCAD to different data sets from the UCI repository is provided.
CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection
Ao, Sio-Iong
More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a proportion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertaking for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In the chapter, hierarchical clustering algorithms have been proposed for efficient tag SNP selection.
Directory of Open Access Journals (Sweden)
A. Meenakshi
2016-08-01
Resource allocation is the task of assigning available resources to different uses. In the context of an entire economy, resources can be assigned by different means, such as markets or central planning. Cloud computing has become a new-age technology with huge potential in enterprises and markets. Clouds make it possible to access applications and associated data from anywhere. The fundamental motive of resource allocation is to allot the available resources in the most effective manner. In the initial phase, a representative resource usage distribution for a group of nodes with identical resource usage patterns is evaluated as a resource bundle, which can easily be employed to locate a group of nodes fulfilling a standard criterion. In this document, a clustering-based resource aggregation method, the Improved Hierarchal Agglomerative Clustering Algorithm (IHAC), is introduced to obtain a compact representation of a set of identically behaving nodes for scalability. In the subsequent phase, concerned with the energetic resource allocation procedure, a hybrid optimization technique is introduced. The technique is devised for scheduling functions to cloud resources and considers both financial and evaluation expenses. The efficiency of the resource allocation system is assessed by means of several parameters, such as reliability, reusability and certain other metrics. The optimal path choice is the result of the hybrid optimization approach. The new technique allocates the available resources based on the optimal path.
Energy Efficient Backoff Hierarchical Clustering Algorithms for Multi-Hop Wireless Sensor Networks
Institute of Scientific and Technical Information of China (English)
Jun Wang; Yong-Tao Cao; Jun-Yuan Xie; Shi-Fu Chen
2011-01-01
Compared with flat routing protocols, clustering is a fundamental performance improvement technique in wireless sensor networks, which can increase network scalability and lifetime. In this paper, we integrate the multi-hop technique with a backoff-based clustering algorithm to organize sensors. By using an adaptive backoff strategy, the algorithm not only achieves load balance among sensor nodes, but also achieves a fairly uniform cluster head distribution across the network. Simulation results also demonstrate that our algorithm is more energy-efficient than classical ones. Our algorithm is also easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.
Chang, Seongmin; Baek, Sungmin; Kim, Ki-Ook; Cho, Maenghyo
2015-06-01
A system identification method has been proposed to validate finite element models of complex structures using measured modal data. The finite element method is used for the system identification as well as the structural analysis. In perturbation methods, the perturbed system is expressed as a combination of the baseline structure and the related perturbations. The changes in dynamic responses are applied to determine the structural modifications so that equilibrium is satisfied in the perturbed system. In practical applications, the dynamic measurements are carried out on a limited number of accessible nodes and associated degrees of freedom. The equilibrium equation is, in principle, expressed in terms of the measured (master, primary) and unmeasured (slave, secondary) degrees of freedom. Only the specified degrees of freedom are included in the equation formulation for identification, and the unspecified degrees of freedom are eliminated through the iterative improved reduction scheme. A large number of system parameters are included as unknown variables in the system identification of large-scale structures. An identification problem with a large number of system parameters requires a large amount of computation time and resources. In the present study, a hierarchical clustering algorithm is applied to reduce the number of system parameters effectively. Numerical examples demonstrate that the proposed method greatly improves the accuracy and efficiency of the inverse problem of identification.
Directory of Open Access Journals (Sweden)
M. Khoobiyan
2017-04-01
Manufacturing flexibility is a multidimensional concept, and manufacturing companies act differently in using these dimensions. The purpose of this study is to investigate a taxonomy and identify dominant groups of manufacturing flexibility. Dimensions of manufacturing flexibility are extracted by content analysis of the literature and expert judgements. Manufacturing flexibility was measured using a questionnaire developed to survey managers of manufacturing companies. The sample size was set at 379. To identify dominant groups of flexibility based on the dimensions determined, Hierarchical Cluster Analysis (HCA), Imperialist Competitive Algorithms (ICAs) and Support Vector Machines (SVMs) were used and compared by cluster validity indices. The best algorithm for clustering was SVMs with three clusters, designated as leading delivery-based flexibility, frugal flexibility and sufficient plan-based flexibility.
Directory of Open Access Journals (Sweden)
Ashim Kumar Ghosh
2011-12-01
Wireless sensor nodes are used in most embedded computing applications. A multihop cluster hierarchy has been presented for large wireless sensor networks (WSNs) that can provide scalable routing, data aggregation, and querying. The energy consumption rate for sensors in a WSN varies greatly based on the protocols the sensors use for communications. In this paper we present a cluster-based routing algorithm. One of our main goals is to design an energy-efficient routing protocol, and we try to solve the usual problems of WSNs. The efficiency of WSNs depends on the distance between a node and the base station and on the amount of data to be transferred, and the performance of clustering is greatly influenced by the selection of cluster heads, which are in charge of creating clusters and controlling member nodes. This algorithm makes the best use of nodes with a low number of cluster heads, known as super nodes. We divide the full region into four equal zones, and the centre area of the region is used to select the super node. Each zone is considered separately, and a zone may or may not be divided further, depending on the density of nodes in that zone and the capability of the super node. This algorithm forms multilayer communication. The number of layers depends on the network's current load and statistics. Our algorithm is easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.
Genetic Algorithm for Hierarchical Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Sajid Hussain
2007-09-01
Large scale wireless sensor networks (WSNs) can be used for various pervasive and ubiquitous applications such as security, health-care, industry automation, agriculture, environment and habitat monitoring. As hierarchical clusters can reduce the energy consumption requirements for WSNs, we investigate intelligent techniques for cluster formation and management. A genetic algorithm (GA) is used to create energy efficient clusters for data dissemination in wireless sensor networks. The simulation results show that the proposed intelligent hierarchical clustering technique can extend the network lifetime for different network deployment environments.
Hierarchical clustering for graph visualization
Clémençon, Stéphan; Rossi, Fabrice; Tran, Viet Chi
2012-01-01
This paper describes a graph visualization methodology based on hierarchical maximal modularity clustering, with interactive and significant coarsening and refining possibilities. An application of this method to HIV epidemic analysis in Cuba is outlined.
Convex Clustering: An Attractive Alternative to Hierarchical Clustering
Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth
2015-01-01
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340
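For orientation, the convex clustering objective is usually written in the following form. The notation is an assumption, since the abstract does not give it: x_i are the data points, u_i their cluster centers, w_ij nonnegative weights, and gamma a regularization strength.

```latex
\min_{u_1,\dots,u_n} \;
\frac{1}{2} \sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2
\;+\; \gamma \sum_{i<j} w_{ij} \, \lVert u_i - u_j \rVert
```

At gamma = 0 each center sits on its own data point. As gamma grows the penalty pulls centers together until they coalesce, and the sequence of merges traces out the solution path mentioned in the abstract.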
Hierarchical Formation of Galactic Clusters
Elmegreen, B G
2006-01-01
Young stellar groupings and clusters have hierarchical patterns ranging from flocculent spiral arms and star complexes on the largest scale to OB associations, OB subgroups, small loose groups, clusters and cluster subclumps on the smallest scales. There is no obvious transition in morphology at the cluster boundary, suggesting that clusters are only the inner parts of the hierarchy where stars have had enough time to mix. The power-law cluster mass function follows from this hierarchical structure: n(M_cl) ∝ M_cl^-b for b~2. This value of b is independently required by the observation that the summed IMFs from many clusters in a galaxy equals approximately the IMF of each cluster.
Robust Pseudo-Hierarchical Support Vector Clustering
DEFF Research Database (Denmark)
Hansen, Michael Sass; Sjöstrand, Karl; Olafsdóttir, Hildur
2007-01-01
Support vector clustering (SVC) has proven an efficient algorithm for clustering of noisy and high-dimensional data sets, with applications within many fields of research. An inherent problem, however, has been setting the parameters of the SVC algorithm. Using the recent emergence of a method for calculating the entire regularization path of the support vector domain description, we propose a fast method for robust pseudo-hierarchical support vector clustering (HSVC). The method is demonstrated to work well on generated data, as well as for detecting ischemic segments from multidimensional myocardial...
A New Metrics for Hierarchical Clustering
Institute of Scientific and Technical Information of China (English)
YANG Guangwen; SHI Shuming; WANG Dingxing
2003-01-01
Hierarchical clustering is a popular method of performing unsupervised learning. Some metric must be used to determine the similarity between pairs of clusters in hierarchical clustering. Traditional similarity metrics either can deal only with simple shapes (i.e. spherical shapes) or are very sensitive to outliers (the chaining effect). The main contribution of this paper is to propose some potential-based similarity metrics (APES and AMAPES) between clusters in hierarchical clustering, inspired by the concepts of the electric potential and the gravitational potential in electromagnetics and astronomy. The main features of these metrics are as follows: first, they have strong antijamming capability; second, they are capable of finding clusters of different shapes such as spherical, spiral, chain, circle, sigmoid, U shape or other complex irregular shapes; third, existing algorithms and research results for classical metrics can be adopted to deal with these new potential-based metrics with little or no modification. Experiments showed that the new metrics are superior to traditional ones. Different potential functions are compared, and the sensitivity to parameters is also analyzed in this paper.
Hierarchical Clustering and Active Galaxies
Hatziminaoglou, E; Manrique, A
2000-01-01
The growth of Super Massive Black Holes and the parallel development of activity in galactic nuclei are implemented in an analytic code of hierarchical clustering. The evolution of the luminosity function of quasars and AGN will be computed with special attention paid to the connection between quasars and Seyfert galaxies. One of the major interests of the model is the parallel study of quasar formation and evolution and the History of Star Formation.
Fast, Linear Time Hierarchical Clustering using the Baire Metric
Contreras, Pedro
2011-01-01
The Baire metric induces an ultrametric on a dataset and is of linear computational complexity, contrasted with the standard quadratic time agglomerative hierarchical clustering algorithm. In this work we evaluate empirically this new approach to hierarchical clustering. We compare hierarchical clustering based on the Baire metric with (i) agglomerative hierarchical clustering, in terms of algorithm properties; (ii) generalized ultrametrics, in terms of definition; and (iii) fast clustering through k-means partitioning, in terms of quality of results. For the latter, we carry out an in depth astronomical study. We apply the Baire distance to spectrometric and photometric redshifts from the Sloan Digital Sky Survey using, in this work, about half a million astronomical objects. We want to know how well the (more costly to determine) spectrometric redshifts can predict the (more easily obtained) photometric redshifts, i.e. we seek to regress the spectrometric on the photometric redshifts, and we use clusterwi...
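To make the linear-time claim concrete, here is a minimal sketch of a longest-common-prefix (Baire) distance and of the single-pass bucketing it permits. It assumes values are supplied as fixed-precision digit strings and that the distance is 2 to the power of minus the shared prefix length. The base, the input encoding and the function names are illustrative assumptions, not details taken from the abstract.

```python
from collections import defaultdict
from typing import Dict, List


def common_prefix_len(x: str, y: str) -> int:
    """Number of leading digits that x and y share."""
    n = 0
    for cx, cy in zip(x, y):
        if cx != cy:
            break
        n += 1
    return n


def baire_distance(x: str, y: str) -> float:
    """2 ** -k, where k is the shared prefix length (1.0 if the first digits differ)."""
    return 2.0 ** -common_prefix_len(x, y)


def prefix_buckets(values: List[str], k: int) -> Dict[str, List[str]]:
    """Group values by their first k digits in a single pass over the data."""
    buckets = defaultdict(list)
    for v in values:
        buckets[v[:k]].append(v)
    return dict(buckets)
```

Calling `prefix_buckets` for k = 1, 2, 3, ... yields nested partitions, that is, one level of a hierarchy per digit of precision, without computing any pairwise distances.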
Hierarchical matrices algorithms and analysis
Hackbusch, Wolfgang
2015-01-01
This self-contained monograph presents matrix algorithms and their analysis. The new technique enables not only the solution of linear systems but also the approximation of matrix functions, e.g., the matrix exponential. Other applications include the solution of matrix equations, e.g., the Lyapunov or Riccati equation. The required mathematical background can be found in the appendix. The numerical treatment of fully populated large-scale matrices is usually rather costly. However, the technique of hierarchical matrices makes it possible to store matrices and to perform matrix operations approximately with almost linear cost and a controllable degree of approximation error. For important classes of matrices, the computational cost increases only logarithmically with the approximation error. The operations provided include the matrix inversion and LU decomposition. Since large-scale linear algebra problems are standard in scientific computing, the subject of hierarchical matrices is of interest to scientists ...
Galaxy formation through hierarchical clustering
White, Simon D. M.; Frenk, Carlos S.
1991-01-01
Analytic methods for studying the formation of galaxies by gas condensation within massive dark halos are presented. The present scheme applies to cosmogonies where structure grows through hierarchical clustering of a mixture of gas and dissipationless dark matter. The simplest models consistent with the current understanding of N-body work on dissipationless clustering, and that of numerical and analytic work on gas evolution and cooling are adopted. Standard models for the evolution of the stellar population are also employed, and new models for the way star formation heats and enriches the surrounding gas are constructed. Detailed results are presented for a cold dark matter universe with Omega = 1 and H(0) = 50 km/s/Mpc, but the present methods are applicable to other models. The present luminosity functions contain significantly more faint galaxies than are observed.
Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.
Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K
2013-03-01
Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.
MultiDendrograms: Variable-Group Agglomerative Hierarchical Clustering
Gomez, Sergio; Montiel, Justo; Torres, David
2012-01-01
MultiDendrograms is a Java-written application that computes agglomerative hierarchical clusterings of data. Starting from a distances (or weights) matrix, MultiDendrograms is able to calculate its dendrograms using the most common agglomerative hierarchical clustering methods. The application implements a variable-group algorithm that solves the non-uniqueness problem found in the standard pair-group algorithm. This problem arises when two or more minimum distances between different clusters are equal during the agglomerative process, because then different output clusterings are possible depending on the criterion used to break ties between distances. MultiDendrograms solves this problem implementing a variable-group algorithm that groups more than two clusters at the same time when ties occur.
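The tie-handling idea described in this abstract can be illustrated with a short sketch. It covers only the grouping step: find every pair of clusters at the current minimum distance and merge all clusters linked by such ties at once. It is not the MultiDendrograms implementation. The dictionary-based distance input, the tolerance parameter and the function name are assumptions.

```python
from typing import Dict, List, Tuple


def tied_groups(dist: Dict[Tuple[int, int], float], n: int,
                tol: float = 1e-12) -> List[List[int]]:
    """Return the groups of clusters joined by minimum-distance ties.

    dist maps each pair (i, j) with i < j to the distance between clusters
    i and j; n is the number of clusters.
    """
    dmin = min(dist.values())
    parent = list(range(n))  # union-find forest over cluster indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair whose distance equals the minimum (within tolerance).
    for (i, j), d in dist.items():
        if d - dmin <= tol:
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Only groups with two or more clusters are merged at this step.
    return [g for g in groups.values() if len(g) > 1]
```

With no ties this returns a single pair, which matches the ordinary pair-group step. With ties it can return a group of three or more clusters, so the result no longer depends on the order in which tied pairs are visited.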
Partitional clustering algorithms
2015-01-01
This book summarizes the state-of-the-art in partitional clustering. Clustering, the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis. Primary goals of clustering include gaining insight into, classifying, and compressing data. Clustering has a long and rich history that spans a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science. As a result, numerous clustering algorithms have been proposed since the early 1950s. Among these algorithms, partitional (nonhierarchical) ones have found many applications, especially in engineering and computer science. This book provides coverage of consensus clustering, constrained clustering, large scale and/or high dimensional clustering, cluster validity, cluster visualization, and applications of clustering. Examines clustering as it applies to large and/or high-dimensional data sets commonly encountered in reali...
Hierarchical Approach in Clustering to Euclidean Traveling Salesman Problem
Fajar, Abdulah; Herman, Nanna Suryana; Abu, Nur Azman; Shahib, Sahrin
There has been growing interest in studying combinatorial optimization problems through clustering strategies, with special emphasis on the traveling salesman problem (TSP). The TSP arises naturally as a subproblem in many transportation, manufacturing and logistics applications, and it has attracted much attention from mathematicians and computer scientists. A clustering approach decomposes the TSP into subgraphs that form clusters, which reduces the problem to smaller subproblems. The impact of a hierarchical approach is investigated in order to produce a better clustering strategy that fits the Euclidean TSP. The clustering strategy for the Euclidean TSP consists of two main steps: clustering and tour construction. The significance of this research is that the clustering approach yields solutions with an error of less than 10% compared with the best known solutions (TSPLIB), and that the hierarchical clustering algorithm is improved to fit this Euclidean TSP solution method.
Institute of Scientific and Technical Information of China (English)
苏腾飞; 孟俊敏; 张晰
2013-01-01
Image segmentation is a crucial stage in SAR oil spill detection. However, common image segmentation algorithms can hardly achieve satisfactory results because of speckle noise in SAR images, which seriously affects the accuracy of oil spill detection. For this reason, an image segmentation algorithm based on hierarchical agglomerative clustering (HAC) is developed for oil spill SAR images. The method takes advantage of multi-scale segmentation to effectively preserve the shape of oil spill patches and to reduce the formation of small fragmented patches. A SAR oil spill image segmentation experiment was conducted using Envisat ASAR images of the Gulf of Mexico acquired in 2010, and the algorithm was compared with Canny edge detection, OTSU threshold segmentation, FCM segmentation and level set segmentation. The results show that HAC can effectively reduce the formation of small fragmented patches, which helps to improve the accuracy of SAR oil spill detection.
Global Considerations in Hierarchical Clustering Reveal Meaningful Patterns in Data
Varshavsky, Roy; Horn, David; Linial, Michal
2008-01-01
Background A hierarchy, characterized by tree-like relationships, is a natural method of organizing data in various domains. When considering an unsupervised machine learning routine, such as clustering, a bottom-up hierarchical (BU, agglomerative) algorithm is used as a default and is often the only method applied. Methodology/Principal Findings We show that hierarchical clustering methods that involve global considerations, such as top-down (TD, divisive) or glocal (global-local) algorithms, are better suited to reveal meaningful patterns in the data. This is demonstrated by testing the correspondence between the results of several algorithms (TD, glocal and BU) and the correct annotations provided by experts. The correspondence was tested in multiple domains including gene expression experiments, stock trade records and functional protein families. The performance of each of the algorithms is evaluated by statistical criteria that are assigned to clusters (nodes of the hierarchy tree) based on expert-labeled data. Whereas TD algorithms perform better on global patterns, BU algorithms perform well and are advantageous when finer granularity of the data is sought. In addition, a novel TD algorithm that is based on genuine density of the data points is presented and is shown to outperform other divisive and agglomerative methods. Application of the algorithm to more than 500 protein sequences belonging to ion-channels illustrates the potential of the method for inferring overlooked functional annotations. ClustTree, a graphical Matlab toolbox for applying various hierarchical clustering algorithms and testing their quality, is made available. Conclusions Although currently rarely used, global approaches, in particular TD or glocal algorithms, should be considered in the exploratory process of clustering. In general, applying unsupervised clustering methods can leverage the quality of manually created mappings of protein families. As demonstrated, it can also provide
Parallel Wolff Cluster Algorithms
Bae, S.; Ko, S. H.; Coddington, P. D.
The Wolff single-cluster algorithm is the most efficient method known for Monte Carlo simulation of many spin models. Due to the irregular size, shape and position of the Wolff clusters, this method does not easily lend itself to efficient parallel implementation, so that simulations using this method have thus far been confined to workstations and vector machines. Here we present two parallel implementations of this algorithm, and show that one gives fairly good performance on a MIMD parallel computer.
Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods
Directory of Open Access Journals (Sweden)
Zweig Katharina A
2009-10-01
Background Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data on which no full similarity matrix is defined and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets but until now no level selection algorithm has been published for this method. Results In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of cohesive clusters. We present results of our method on two data sets, a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values according to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection. Conclusion Using our new cluster selection method together with the method by Palla et al. provides a new interesting clustering mechanism that allows the computation of overlapping clusters, which is especially valuable for biological and
Image Segmentation by Hierarchical Spatial and Color Spaces Clustering
Institute of Scientific and Technical Information of China (English)
YU Wei
2005-01-01
Image segmentation, as a basic building block for many high-level image analysis problems, has attracted much research attention over the years. Existing approaches, however, mainly focus on clustering analysis of single-channel information, i.e., in either color or spatial space, which may lead to unsatisfactory segmentation performance. Considering the spatial and color spaces jointly, this paper proposes a new hierarchical image segmentation algorithm, which alternately clusters the image regions in color and spatial spaces in a fine-to-coarse manner. Without losing perceptual consistency, the proposed algorithm achieves the segmentation result using only a very small number of colors according to user specification.
Angelic Hierarchical Planning: Optimal and Online Algorithms
2008-12-06
restrict our attention to plans in I∗(Act, s0). Definition 2. ( Parr and Russell , 1998) A plan ah∗ is hierarchically optimal iff ah∗ = argmina∈I∗(Act,s0):T...Murdock, Dan Wu, and Fusun Yaman. SHOP2: An HTN planning system. JAIR, 20:379–404, 2003. Ronald Parr and Stuart Russell . Reinforcement Learning with...Angelic Hierarchical Planning: Optimal and Online Algorithms Bhaskara Marthi Stuart J. Russell Jason Wolfe Electrical Engineering and Computer
Hierarchical clustering techniques for image database organization and summarization
Vellaikal, Asha; Kuo, C.-C. Jay
1998-10-01
This paper investigates clustering techniques as a method of organizing image databases to support popular visual management functions such as searching, browsing and navigation. Different types of hierarchical agglomerative clustering techniques are studied as a method of organizing feature space as well as summarizing image groups by the selection of a few appropriate representatives. Retrieval performance using both single- and multiple-level hierarchies is evaluated experimentally, and the algorithms show an interesting relationship between the top k correct retrievals and the number of comparisons required. Some arguments are given to support the use of such cluster-based techniques for managing distributed image databases.
Assembling hierarchical cluster solids with atomic precision.
Turkiewicz, Ari; Paley, Daniel W; Besara, Tiglet; Elbaz, Giselle; Pinkard, Andrew; Siegrist, Theo; Roy, Xavier
2014-11-12
Hierarchical solids created from the binary assembly of cobalt chalcogenide and iron oxide molecular clusters are reported. Six different molecular clusters based on the octahedral Co6E8 (E = Se or Te) and the expanded cubane Fe8O4 units are used as superatomic building blocks to construct these crystals. The formation of the solid is driven by the transfer of charge between complementary electron-donating and electron-accepting clusters in solution that crystallize as binary ionic compounds. The hierarchical structures are investigated by single-crystal X-ray diffraction, providing atomic and superatomic resolution. We report two different superstructures: a superatomic relative of the CsCl lattice type and an unusual packing arrangement based on the double-hexagonal close-packed lattice. Within these superstructures, we demonstrate various compositions and orientations of the clusters.
Managing Clustered Data Using Hierarchical Linear Modeling
Warne, Russell T.; Li, Yan; McKyer, E. Lisako J.; Condie, Rachel; Diep, Cassandra S.; Murano, Peter S.
2012-01-01
Researchers in nutrition research often use cluster or multistage sampling to gather participants for their studies. These sampling methods often produce violations of the assumption of data independence that most traditional statistics share. Hierarchical linear modeling is a statistical method that can overcome violations of the independence…
An agglomerative hierarchical approach to visualization in Bayesian clustering problems.
Dawson, K J; Belkhir, K
2009-07-01
Clustering problems (including the clustering of individuals into outcrossing populations, hybrid generations, full-sib families and selfing lines) have recently received much attention in population genetics. In these clustering problems, the parameter of interest is a partition of the set of sampled individuals--the sample partition. In a fully Bayesian approach to clustering problems of this type, our knowledge about the sample partition is represented by a probability distribution on the space of possible sample partitions. As the number of possible partitions grows very rapidly with the sample size, we cannot visualize this probability distribution in its entirety, unless the sample is very small. As a solution to this visualization problem, we recommend using an agglomerative hierarchical clustering algorithm, which we call the exact linkage algorithm. This algorithm is a special case of the maximin clustering algorithm that we introduced previously. The exact linkage algorithm is now implemented in our software package PartitionView. The exact linkage algorithm takes the posterior co-assignment probabilities as input and yields as output a rooted binary tree, or more generally, a forest of such trees. Each node of this forest defines a set of individuals, and the node height is the posterior co-assignment probability of this set. This provides a useful visual representation of the uncertainty associated with the assignment of individuals to categories. It is also a useful starting point for a more detailed exploration of the posterior distribution in terms of the co-assignment probabilities.
Data clustering theory, algorithms, and applications
Gan, Guojun; Wu, Jianhong
2007-01-01
Cluster analysis is an unsupervised process that divides a set of objects into homogeneous groups. This book starts with basic information on cluster analysis, including the classification of data and the corresponding similarity measures, followed by the presentation of over 50 clustering algorithms in groups according to some specific baseline methodologies such as hierarchical, center-based, and search-based methods. As a result, readers and users can easily identify an appropriate algorithm for their applications and compare novel ideas with existing results. The book also provides examples of clustering applications to illustrate the advantages and shortcomings of different clustering architectures and algorithms. Application areas include pattern recognition, artificial intelligence, information technology, image processing, biology, psychology, and marketing. Readers also learn how to perform cluster analysis with the C/C++ and MATLAB® programming languages.
Hierarchical Control for Multiple DC Microgrids Clusters
DEFF Research Database (Denmark)
Shafiee, Qobad; Dragicevic, Tomislav; Vasquez, Juan Carlos;
2014-01-01
This paper presents a distributed hierarchical control framework to ensure reliable operation of dc Microgrid (MG) clusters. In this hierarchy, primary control is used to regulate the common bus voltage inside each MG locally. An adaptive droop method is proposed for this level which determines....... Another distributed policy is employed then to regulate the power flow among the MGs according to their local SOCs. The proposed distributed controllers on each MG communicate with only the neighbor MGs through a communication infrastructure. Finally, the small signal model is expanded for dc MG clusters...
A Framework for Hierarchical Clustering Based Indexing in Search Engines
Directory of Open Access Journals (Sweden)
Parul Gupta
2011-01-01
Granting efficient and fast access to the index is a key issue for the performance of Web Search Engines (WSEs). In order to enhance memory utilization and favor fast query resolution, WSEs use Inverted File (IF) indexes that consist of an array of posting lists, where each posting list is associated with a term and contains the term as well as the identifiers of the documents containing the term. Since the document identifiers are stored in sorted order, they can be stored as the differences between successive documents so as to reduce the size of the index. This paper describes a clustering algorithm that aims at partitioning the set of documents into ordered clusters so that the documents within the same cluster are similar and are assigned closer document identifiers. Thus the average value of the differences between successive documents is minimized and storage space is saved. The paper further presents an extension of this clustering algorithm to hierarchical clustering, in which similar clusters are combined to form a mega cluster and similar mega clusters are then combined to form a super cluster. Thus the paper describes different levels of clustering, which optimize the search process by directing the search along a specific path from higher levels of clustering to lower levels, i.e. from super clusters to mega clusters, then to clusters and finally to the individual documents, so that the user gets the best possible matching results in the minimum possible time.
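The storage argument above, that closer document identifiers give smaller differences, can be made concrete with a short sketch. It is not the paper's algorithm: it only shows plain gap (delta) encoding of one posting list, and the identifiers in `scattered` and `clustered` are made-up values chosen for illustration.

```python
def gaps(doc_ids):
    """Delta-encode a posting list: first ID, then differences between successive sorted IDs."""
    ids = sorted(doc_ids)
    return [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]


def decode(gap_list):
    """Rebuild the sorted document IDs from their gap encoding."""
    ids, total = [], 0
    for g in gap_list:
        total += g
        ids.append(total)
    return ids


def mean_gap(doc_ids):
    """Average difference between successive document IDs (the first ID is excluded)."""
    g = gaps(doc_ids)[1:]
    return sum(g) / len(g)


# Hypothetical posting list for one term, before and after similar documents
# are renumbered so that they receive neighbouring identifiers.
scattered = [3, 250, 512, 901]
clustered = [40, 41, 42, 43]

print(gaps(scattered), mean_gap(scattered))
print(gaps(clustered), mean_gap(clustered))
```

Smaller gaps matter because the gaps are usually stored with a variable-length integer code, so a list of small differences takes fewer bits than a list of large ones.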
A dynamic hierarchical clustering method for trajectory-based unusual video event detection.
Jiang, Fan; Wu, Ying; Katsaggelos, Aggelos K
2009-04-01
The proposed unusual video event detection method is based on unsupervised clustering of object trajectories, which are modeled by hidden Markov models (HMM). The novelty of the method includes a dynamic hierarchical process incorporated in the trajectory clustering algorithm to prevent model overfitting and a 2-depth greedy search strategy for efficient clustering.
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
PERFORMANCE OF SELECTED AGGLOMERATIVE HIERARCHICAL CLUSTERING METHODS
Directory of Open Access Journals (Sweden)
Nusa Erman
2015-01-01
The broad variety of agglomerative hierarchical clustering methods raises the problem of how to choose the most appropriate method for given data. It is well known that some methods outperform others if the analysed data have a specific structure. In the presented study we observed the behaviour of the centroid method, the median method (Gower median method), and the average method (unweighted pair-group method with arithmetic mean, UPGMA; average linkage between groups). We compared them with the most commonly used methods of hierarchical clustering: the minimum (single linkage clustering), the maximum (complete linkage clustering), the Ward, and the McQuitty (groups method average, weighted pair-group method using arithmetic averages, WPGMA) methods. We applied the comparison to spherical, ellipsoid, umbrella-like, “core-and-sphere”, ring-like and intertwined three-dimensional data structures. To generate the data and execute the analysis, we used R statistical software. Results show that all seven methods are successful in finding compact, ball-shaped or ellipsoid structures when they are sufficiently separated. Conversely, all methods except the minimum perform poorly on non-homogeneous, irregular and elongated ones. Especially challenging is a circular double helix structure; it is correctly revealed only by the minimum method. We can also confirm previously published results of other simulation studies, which usually favour the average method (besides the Ward method) in cases when data are assumed to be fairly compact and well separated.
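The contrast between the minimum (single linkage) method and the others on elongated structures can be reproduced on a toy example. The sketch below is not the study's R code or data: it is a naive Python agglomeration written for illustration, the function name `agglomerate` is hypothetical, and the two parallel chains are an assumed stand-in for an elongated structure.

```python
import math


def agglomerate(points, k, linkage):
    """Naive agglomerative clustering down to k clusters.

    linkage is min (single linkage) or max (complete linkage), applied to all
    pairwise point distances between two clusters.
    """
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        return linkage(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(clusters[a], clusters[b])
                if best is None or d < best[0]:  # first minimum found wins ties
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Two parallel elongated chains: spacing 1.0 along each chain, 1.5 between chains.
chain_a = [(float(x), 0.0) for x in range(6)]  # point indices 0..5
chain_b = [(float(x), 1.5) for x in range(6)]  # point indices 6..11
points = chain_a + chain_b

single = agglomerate(points, 2, min)
complete = agglomerate(points, 2, max)
print("single  :", single)
print("complete:", complete)
```

Single linkage follows each chain link by link and returns the two chains, whereas complete linkage penalises long clusters and instead cuts across both chains, which is the kind of failure on elongated data that the abstract reports.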
A fast quad-tree based two dimensional hierarchical clustering.
Rajadurai, Priscilla; Sankaranarayanan, Swamynathan
2012-01-01
Recently, microarray technologies have become a robust technique in the area of genomics. An important step in the analysis of gene expression data is the identification of groups of genes disclosing analogous expression patterns. Cluster analysis partitions a given dataset into groups based on specified features. Euclidean distance is a widely used similarity measure for gene expression data that considers the amount of change in gene expression. However, the huge number of genes and the intricacy of biological networks have greatly increased the challenges of comprehending and interpreting the resulting groups of data, increasing processing time. The proposed technique focuses on a quad-tree (QT) based fast 2-dimensional hierarchical clustering algorithm to perform clustering. The construction of the closest pair data structure at each level is an important time factor, which determines the processing time of clustering. The proposed model reduces the processing time and improves analysis of gene expression data.
Multi-mode clustering model for hierarchical wireless sensor networks
Hu, Xiangdong; Li, Yongfu; Xu, Huifen
2017-03-01
The topology management, i.e., clusters maintenance, of wireless sensor networks (WSNs) is still a challenge due to its numerous nodes, diverse application scenarios and limited resources as well as complex dynamics. To address this issue, a multi-mode clustering model (M2 CM) is proposed to maintain the clusters for hierarchical WSNs in this study. In particular, unlike the traditional time-trigger model based on the whole-network and periodic style, the M2 CM is proposed based on the local and event-trigger operations. In addition, an adaptive local maintenance algorithm is designed for the broken clusters in the WSNs using the spatial-temporal demand changes accordingly. Numerical experiments are performed using the NS2 network simulation platform. Results validate the effectiveness of the proposed model with respect to the network maintenance costs, node energy consumption and transmitted data as well as the network lifetime.
Constructing storyboards based on hierarchical clustering analysis
Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu
2005-07-01
There is a growing need for quick previews of video content in order to improve the accessibility of video archives and to reduce network traffic. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoiding repetition of computationally intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.
Technique for fast and efficient hierarchical clustering
Stork, Christopher
2013-10-08
A fast and efficient technique for hierarchical clustering of samples in a dataset includes compressing the dataset to reduce a number of variables within each of the samples of the dataset. A nearest neighbor matrix is generated to identify nearest neighbor pairs between the samples based on differences between the variables of the samples. The samples are arranged into a hierarchy that groups the samples based on the nearest neighbor matrix. The hierarchy is rendered to a display to graphically illustrate similarities or differences between the samples.
Magnetic susceptibilities of cluster-hierarchical models
McKay, Susan R.; Berker, A. Nihat
1984-02-01
The exact magnetic susceptibilities of hierarchical models are calculated near and away from criticality, in both the ordered and disordered phases. The mechanism and phenomenology are discussed for models with susceptibilities that are physically sensible, e.g., nondivergent away from criticality. Such models are found based upon the Niemeijer-van Leeuwen cluster renormalization. A recursion-matrix method is presented for the renormalization-group evaluation of response functions. Diagonalization of this matrix at fixed points provides simple criteria for well-behaved densities and response functions.
Recovery Rate of Clustering Algorithms
Li, Fajie; Klette, Reinhard; Wada, T; Huang, F; Lin, S
2009-01-01
This article provides a simple and general way for defining the recovery rate of clustering algorithms using a given family of old clusters for evaluating the performance of the algorithm when calculating a family of new clusters. Under the assumption of dealing with simulated data (i.e., known old
PROPOSED A HETEROGENEOUS CLUSTERING ALGORITHM TO IMPROVE QOS IN WSN
Directory of Open Access Journals (Sweden)
Mehran Mokhtari
2016-07-01
This article presents a LEACH-extended, hierarchical, 3-level, clustered, heterogeneous and dynamic algorithm. In the suggested protocol (LEH3LA), auction-based selection of a cluster head and an alternative cluster head node addresses processing delay, member selection, cost and energy consumption, the number of messages sent and received inside the clusters, and cluster-head selection in large sensor networks. The algorithm uses a hierarchical heterogeneous network (3 levels), collective intelligence and intra-cluster interaction for communication. It also addresses the problems of sending data in multi-BS mobile networks, expanding inter-cluster networks, overlapping clusters, the creation of orphan nodes, dynamically changing cluster boundaries, and the use of backbone networks and cloud sensors. Using a sleep/wake scheduling algorithm or a TDMA schedule, the alternative cluster head node provides redundancy and fault tolerance. Local processing in cluster head and alternative cluster head nodes, together with multi-hop intra-cluster and inter-cluster communication, increases processing speed and the speed of sending data within and between clusters. Network overhead decreases and load balancing among cluster heads improves. Data encapsulation by cluster head nodes decreases energy consumption while sending data. In addition, improving quality of service (QoS) in CBRP, LEACH and 802.15.4 and decreasing energy consumption in sensors, cluster heads and alternative cluster head nodes increase the lifetime of sensor networks.
Energy Aware Clustering Algorithms for Wireless Sensor Networks
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire network is the main consideration in their design. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and compare them based on metrics such as cluster distribution, cluster load balancing, cluster head (CH) selection strategy, CH role rotation, node mobility, cluster overlap, intra-cluster communication, reliability, security and location awareness.
Institute of Scientific and Technical Information of China (English)
康茜; 李德玉; 王素格; 冀庆斌
2015-01-01
Community identification is a basic task of social network analysis, and community structure detection is a key problem of community identification. Each node in the community structure is regarded as a signal source. A hierarchical clustering community detection algorithm is proposed to address signal loss during signal transmission. The algorithm uses degree centrality to measure the probability that a node receives a signal, which quantifies the missing values in the signal reception process. After signal transmission, the topology of the network is transformed into geometric relationships among vectors. On this basis, a hierarchical clustering algorithm is used to find the community structure. To validate the proposed SMHC algorithm, it is compared with the SHC, CNM, GN and Similar algorithms on three real networks: Zachary Club, American Football and Netscience. The experimental results indicate that the SMHC algorithm improves the accuracy of community detection to a certain extent.
Efficient scalable algorithms for hierarchically semiseparable matrices
Energy Technology Data Exchange (ETDEWEB)
Wang, Shen; Xia, Jianlin; Situ, Yingchong; Hoop, Maarten V. de
2011-09-14
Hierarchically semiseparable (HSS) matrix algorithms are emerging techniques for constructing superfast direct solvers for both dense and sparse linear systems. Here, we develop a set of novel parallel algorithms for the key HSS operations that are used for solving large linear systems. These include the parallel rank-revealing QR factorization, the HSS constructions with hierarchical compression, the ULV HSS factorization, and the HSS solutions. The HSS tree based parallelism is fully exploited at the coarse level. The BLACS and ScaLAPACK libraries are used to facilitate the parallel dense kernel operations at the fine-grained level. We have applied our new parallel HSS-embedded multifrontal solver to the anisotropic Helmholtz equations for seismic imaging, and were able to solve a linear system with 6.4 billion unknowns using 4096 processors in about 20 minutes. The classical multifrontal solver simply failed due to its high memory demand. To our knowledge, this is the first successful demonstration of employing HSS algorithms to solve truly large-scale real-world problems. Our parallel strategies can be easily adapted to the parallelization of other rank structured methods.
Hand Tracking based on Hierarchical Clustering of Range Data
Cespi, Roberto; Lindner, Marvin
2011-01-01
Fast and robust hand segmentation and tracking is an essential basis for gesture recognition and thus an important component for contact-less human-computer interaction (HCI). Hand gesture recognition based on 2D video data has been intensively investigated. However, in practical scenarios purely intensity based approaches suffer from uncontrollable environmental conditions like cluttered background colors. In this paper we present a real-time hand segmentation and tracking algorithm using Time-of-Flight (ToF) range cameras and intensity data. The intensity and range information is fused into one pixel value, representing its combined intensity-depth homogeneity. The scene is hierarchically clustered using a GPU based parallel merging algorithm, allowing a robust identification of both hands even for inhomogeneous backgrounds. After the detection, both hands are tracked on the CPU. Our tracking algorithm can cope with the situation that one hand is temporarily covered by the other hand.
Functional Clustering Algorithm for High-Dimensional Proteomics Data
Directory of Open Access Journals (Sweden)
Halima Bensmail
2005-01-01
Clustering proteomics data is a challenging problem for any traditional clustering algorithm. Usually, the number of samples is much smaller than the number of protein peaks. A clustering algorithm that does not take into consideration the number of features or variables (here, the number of peaks) is needed. An innovative hierarchical clustering algorithm may be a good approach. We propose here a new dissimilarity measure for hierarchical clustering combined with functional data analysis. We present a specific application of functional data analysis (FDA) to a high-throughput proteomics study. The high performance of the proposed algorithm is compared with that of two popular dissimilarity measures in the clustering of normal and human T-cell leukemia virus type 1 (HTLV-1)-infected patient samples.
A Survey of Grid Based Clustering Algorithms
Directory of Open Access Journals (Sweden)
MR ILANGO
2010-08-01
Cluster analysis, an automatic process for finding similar objects in a database, is a fundamental operation in data mining. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering techniques have been discussed extensively in Similarity Search, Segmentation, Statistics, Machine Learning, Trend Analysis, Pattern Recognition and Classification [1]. Clustering methods can be classified into (i) partitioning methods, (ii) hierarchical methods, (iii) density-based methods, (iv) grid-based methods and (v) model-based methods. Grid-based methods quantize the object space into a finite number of cells (hyper-rectangles) and then perform the required operations on the quantized space. The main advantage of grid-based methods is their fast processing time, which depends on the number of cells in each dimension of the quantized space. In this research paper, we present some of the grid-based methods, such as CLIQUE (CLustering In QUEst) [2], STING (STatistical INformation Grid) [3], MAFIA (Merging of Adaptive Intervals Approach to Spatial Data Mining) [4], Wave Cluster [5] and O-CLUSTER (Orthogonal partitioning CLUSTERing) [6], as a survey, and also compare their effectiveness in clustering data objects. We also present some of the latest developments in grid-based methods, such as the Axis Shifted Grid Clustering Algorithm [7] and Adaptive Mesh Refinement [Wei-Keng Liao et al.] [8], which improve the processing time of objects.
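The quantization step described in this abstract can be illustrated with a minimal sketch. It is not CLIQUE, STING or any other surveyed method; it is a generic illustration that assumes a fixed cell size and a simple density threshold. The names `grid_cells`, `dense_cells`, `connected_dense_cells`, `cell_size` and `min_points` are hypothetical.

```python
from collections import Counter
from math import floor

def grid_cells(points, cell_size):
    """Map each point to the integer index of the grid cell that contains it."""
    return [tuple(floor(c / cell_size) for c in p) for p in points]

def dense_cells(points, cell_size, min_points):
    """Return the set of cells holding at least min_points points (assumed threshold)."""
    counts = Counter(grid_cells(points, cell_size))
    return {cell for cell, n in counts.items() if n >= min_points}

def connected_dense_cells(cells):
    """Group dense cells that share a face into clusters using a flood fill."""
    remaining = set(cells)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        cluster = set(stack)
        while stack:
            cell = stack.pop()
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in remaining:
                        remaining.remove(nb)
                        cluster.add(nb)
                        stack.append(nb)
        clusters.append(cluster)
    return clusters

# Toy 2-D data: two dense regions and one isolated point.
points = [(0.1, 0.2), (0.4, 0.3), (0.6, 0.9), (1.2, 0.4), (1.3, 0.1),
          (5.1, 5.2), (5.3, 5.8), (5.9, 5.5), (9.0, 0.5)]
dense = dense_cells(points, cell_size=1.0, min_points=2)
clusters = connected_dense_cells(dense)
```

After the points are binned, all further work touches only the occupied cells, which is why the cost of such methods is tied to the number of cells rather than the number of objects.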
A Hierarchical Clustering Methodology for the Estimation of Toxicity
A Quantitative Structure Activity Relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural sim...
Hierarchical Cluster Assembly in Globally Collapsing Clouds
Vazquez-Semadeni, Enrique; Colin, Pedro
2016-01-01
We discuss the mechanism of cluster formation in a numerical simulation of a molecular cloud (MC) undergoing global hierarchical collapse (GHC). The global nature of the collapse implies that the SFR increases over time. The hierarchical nature of the collapse consists of small-scale collapses within larger-scale ones. The large-scale collapses culminate a few Myr later than the small-scale ones and consist of filamentary flows that accrete onto massive central clumps. The small-scale collapses form clumps that are embedded in the filaments and falling onto the large-scale collapse centers. The stars formed in the early, small-scale collapses share the infall motion of their parent clumps. Thus, the filaments feed both gaseous and stellar material to the massive central clump. This leads to the presence of a few older stars in a region where new protostars are forming, and also to a self-similar structure, in which each unit is composed of smaller-scale sub-units that approach each other and may merge. Becaus...
Hierarchical clustering using correlation metric and spatial continuity constraint
Stork, Christopher L.; Brewer, Luke N.
2012-10-02
Large data sets are analyzed by hierarchical clustering using correlation as a similarity measure. This provides results that are superior to those obtained using a Euclidean distance similarity measure. A spatial continuity constraint may be applied in hierarchical clustering analysis of images.
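The difference between the two similarity measures named above can be seen on a toy example. The sketch below is illustrative only and is not taken from the cited work; the profiles `low`, `high` and `flat` are made-up data, and the correlation distance is assumed to be one minus the Pearson correlation.

```python
from math import sqrt

def euclidean(a, b):
    """Ordinary Euclidean distance between two equal-length profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation; close to 0 for identically shaped profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / sqrt(va * vb)

low = [1.0, 2.0, 3.0, 2.0]
high = [10.0, 20.0, 30.0, 20.0]  # same shape as `low`, ten times the amplitude
flat = [2.0, 2.0, 1.0, 2.0]      # close to `low` in magnitude, different shape

# Correlation groups `low` with `high`; Euclidean distance groups `low` with `flat`.
corr_prefers_shape = correlation_distance(low, high) < correlation_distance(low, flat)
eucl_prefers_magnitude = euclidean(low, flat) < euclidean(low, high)
```

A hierarchical clustering built on the correlation distance would therefore merge profiles by shape regardless of overall intensity, which is one plausible reason it can behave differently from a Euclidean measure.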
Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm
Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian
2017-03-01
DNA is one of the carriers of genetic information in living organisms. Encoding, sequencing, and clustering DNA sequences have become key routine tasks in molecular biology, in particular in bioinformatics applications. There are two types of clustering: hierarchical clustering and partitioning clustering. In this paper, we combine the two types, i.e. K-Means (partitioning clustering) and DIANA (hierarchical clustering), and therefore call the approach hybrid clustering. Hybrid clustering using the parallel K-Means algorithm and the DIANA algorithm is applied to cluster DNA sequences of Human Papillomavirus (HPV). The clustering process starts with collecting DNA sequences of HPV from NCBI (National Centre for Biotechnology Information), followed by feature extraction from the DNA sequences. The extracted features are stored in matrix form; this matrix is then normalized using Min-Max normalization, and the genetic distance is calculated using the Euclidean distance. The hybrid clustering is then applied using implementations of the parallel K-Means algorithm and the DIANA algorithm. The aim of using hybrid clustering is to obtain better clustering results. To validate the resulting clusters and obtain the optimum number of clusters, we use the Davies-Bouldin Index (DBI). In this study, parallel K-Means clustering groups the data into 5 clusters with a minimal DBI value of 0.8741, and hybrid clustering groups the data into 13 sub-clusters with minimal DBI values of 0.8216, 0.6845, 0.3331, 0.1994 and 0.3952. The DBI values of hybrid clustering are lower than the DBI value of parallel K-Means clustering alone, which is performed at the first stage. This means that hybrid clustering gives a better result for clustering DNA sequences of HPV than parallel K-Means clustering alone.
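Two of the steps named in this abstract, Min-Max normalization and validation with the Davies-Bouldin index, can be sketched as follows. This is a minimal illustration on made-up feature rows, not the authors' pipeline; the function names and the `features` matrix are hypothetical, and the index is computed in its common form with Euclidean centroid distances.

```python
from math import sqrt

def min_max_normalize(rows):
    """Rescale every column of a row-major matrix to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # constant column: avoid /0
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(rows, labels):
    """Mean over clusters of the worst ratio (s_i + s_j) / d_ij; lower is better."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    centroids = {k: [sum(c) / len(g) for c in zip(*g)] for k, g in groups.items()}
    scatter = {k: sum(euclidean(r, centroids[k]) for r in g) / len(g)
               for k, g in groups.items()}
    keys = list(groups)
    worst = [max((scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
                 for j in keys if j != i) for i in keys]
    return sum(worst) / len(keys)

# Made-up feature matrix with two columns on very different scales.
features = [[1.0, 200.0], [2.0, 220.0], [1.5, 210.0],
            [8.0, 900.0], [9.0, 950.0], [8.5, 920.0]]
scaled = min_max_normalize(features)
good = davies_bouldin(scaled, [0, 0, 0, 1, 1, 1])  # compact, well-separated labels
bad = davies_bouldin(scaled, [0, 1, 0, 1, 0, 1])   # labels that mix the two groups
```

A lower index indicates tighter, better-separated clusters, which is the sense in which the abstract compares the values obtained for the two clustering schemes.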
APPECT: An Approximate Backbone-Based Clustering Algorithm for Tags
DEFF Research Database (Denmark)
Zong, Yu; Xu, Guandong; Jin, Pin
2011-01-01
algorithm for Tags (APPECT). The main steps of APPECT are: (1) we execute the K-means algorithm on a tag similarity matrix for M times and collect a set of tag clustering results Z={C1,C2,…,Cm}; (2) we form the approximate backbone of Z by executing a greedy search; (3) we fix the approximate backbone...... resulting from the severe difficulty of ambiguity, redundancy and less semantic nature of tags. Clustering method is a useful tool to address the aforementioned difficulties. Most of the researches on tag clustering are directly using traditional clustering algorithms such as K-means or Hierarchical...
Data clustering algorithms and applications
Aggarwal, Charu C
2013-01-01
Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as fea
Exploiting Homogeneity of Density in Incremental Hierarchical Clustering
Directory of Open Access Journals (Sweden)
Dwi H. Widiyantoro
2006-11-01
Hierarchical clustering is an important tool in many applications. As it involves a large data set that proliferates over time, reclustering the data set periodically is not an efficient process. Therefore, the ability to incorporate a new data set incrementally into an existing hierarchy is increasingly in demand. This article describes Homogen, a system that employs a new algorithm for generating a hierarchy of concepts and clusters incrementally from a stream of observations. The system aims to construct a hierarchy that satisfies the homogeneity and the monotonicity properties. Working in a bottom-up fashion, a new observation is placed in the hierarchy and a sequence of hierarchy restructuring processes is performed only in regions that have been affected by the presence of the new observation. Additionally, it combines multiple restructuring techniques that address different restructuring objectives to get a synergistic effect. The system has been tested on a variety of domains including structured and unstructured data sets. The experimental results reveal that the system is able to construct a concept hierarchy that is consistent regardless of the input data order and whose quality is comparable to the quality of those produced by non-incremental clustering algorithms.
Kernel Generalized Noise Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
WU Xiao-hong; ZHOU Jian-jiang
2007-01-01
To deal with the nonlinearly separable problem, the generalized noise clustering (GNC) algorithm is extended to a kernel generalized noise clustering (KGNC) model. Unlike the fuzzy c-means (FCM) model and the GNC model, which are based on Euclidean distance, the presented model is based on a kernel-induced distance obtained with the kernel method. With the kernel method, the input data are nonlinearly and implicitly mapped into a high-dimensional feature space, where the nonlinear pattern appears linear and the GNC algorithm is performed. It is unnecessary to calculate in the high-dimensional feature space because the kernel function can do it just in the input space. The effectiveness of the proposed algorithm is verified by experiments on three data sets. It is concluded that the KGNC algorithm has better clustering accuracy than FCM and GNC in clustering data sets containing noisy data.
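The remark that the kernel function does the work in the input space can be made concrete with the standard identity ||phi(x) - phi(y)||^2 = K(x,x) - 2K(x,y) + K(y,y). The sketch below assumes a Gaussian kernel, which the abstract does not specify; `gaussian_kernel`, `kernel_distance_sq` and `sigma` are hypothetical names chosen for illustration.

```python
from math import exp

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; an assumed choice, not named in the abstract."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sq / (2.0 * sigma ** 2))

def kernel_distance_sq(x, y, kernel=gaussian_kernel):
    """Squared feature-space distance computed without ever forming phi(x)."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)

a, b, far = (0.0, 0.0), (1.0, 0.0), (50.0, 0.0)
d_ab = kernel_distance_sq(a, b)      # moderate distance for nearby points
d_far = kernel_distance_sq(a, far)   # saturates near 2 for very distant points
```

With this kernel the induced distance is bounded above by 2, so a far-away noisy point cannot dominate a distance-based objective the way it can under the plain Euclidean distance.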
Cluster Synchronization Algorithms
Xia, Weiguo; Cao, Ming
2010-01-01
This paper presents two approaches to achieving cluster synchronization in dynamical multi-agent systems. In contrast to the widely studied synchronization behavior, where all the coupled agents converge to the same value asymptotically, in the cluster synchronization problem studied in this paper,
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
Arampatzis, Giorgos; Katsoulakis, Markos A.; Plechac, Petr; Taufer, Michela; Xu, Lifan
2011-01-01
We present a mathematical framework for constructing and analyzing parallel algorithms for lattice Kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. The algorithms can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processin...
Extended Fuzzy Clustering Algorithms
U. Kaymak (Uzay); M. Setnes
2000-01-01
Fuzzy clustering is a widely applied method for obtaining fuzzy models from data. It has been applied successfully in various fields including finance and marketing. Despite the successful applications, there are a number of issues that must be dealt with in practical applications of fuz
Constructing Product Ontologies with an Improved Conceptual Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
曹大军; 徐良贤
2002-01-01
In a distributed eMarketplace, recommended product ontologies are required for trading between buyers and sellers. Conceptual clustering can be employed to build dynamic recommended product ontologies. Traditional methods of conceptual clustering (e.g. COBWEB or Cluster/2) do not take heterogeneous attributes of a concept into account. Moreover, the result of these methods is clusters rather than recommended concepts. A center recommendation clustering algorithm is provided. According to the values of heterogeneous attributes, recommended product names can be selected from the clusters produced by this algorithm. This algorithm can also create the hierarchical relations between product names. The definitions of product names given by all participants are collected in a distributed eMarketplace. Recommended product ontologies are built. These ontologies include relations and definitions of product names, which come from different participants in the distributed eMarketplace. Finally, a case is given to illustrate this method. The result shows that this method is feasible.
Hierarchical Clustering and the Concept of Space Distortion.
Hubert, Lawrence; Schultz, James
An empirical assessment of the space distortion properties of two prototypic hierarchical clustering procedures is given in terms of an occupancy model developed from combinatorics. Using one simple example, the single-link and complete-link clustering strategies now in common use in the behavioral sciences are empirically shown to be space…
Tilton, James C.; Plaza, Antonio J. (Editor); Chang, Chein-I. (Editor)
2008-01-01
The hierarchical image segmentation algorithm (referred to as HSEG) is a hybrid of hierarchical step-wise optimization (HSWO) and constrained spectral clustering that produces a hierarchical set of image segmentations. HSWO is an iterative approach to region growing segmentation in which the optimal image segmentation is found at N(sub R) regions, given a segmentation at N(sub R+1) regions. HSEG's addition of constrained spectral clustering makes it a computationally intensive algorithm for all but the smallest of images. To counteract this, a computationally efficient recursive approximation of HSEG (called RHSEG) has been devised. Further improvements in processing speed are obtained through a parallel implementation of RHSEG. This chapter describes this parallel implementation and demonstrates its computational efficiency on a Landsat Thematic Mapper test scene.
The Hierarchical Distribution of Young Stellar Clusters in Nearby Galaxies
Grasha, Kathryn; Calzetti, Daniela
2017-01-01
We investigate the spatial distributions of young stellar clusters in six nearby galaxies to trace the large scale hierarchical star-forming structures. The six galaxies are drawn from the Legacy ExtraGalactic UV Survey (LEGUS). We quantify the strength of the clustering among stellar clusters as a function of spatial scale and age to establish the survival timescale of the substructures. We separate the clusters into different classes, compact (bound) clusters and associations (unbound), and compare the clustering among them. We find that younger star clusters are more strongly clustered over small spatial scales and that the clustering disappears rapidly for ages as young as a few tens of Myr, consistent with clusters slowly losing the fractal dimension inherited at birth from their natal molecular clouds.
A Novel Cluster Head Selection Algorithm Based on Fuzzy Clustering and Particle Swarm Optimization.
Ni, Qingjian; Pan, Qianqian; Du, Huimin; Cao, Cen; Zhai, Yuqing
2017-01-01
An important objective of wireless sensor networks is to prolong the network life cycle, and topology control is of great significance for extending the network life cycle. Based on previous work, for cluster head selection in hierarchical topology control, we propose a solution based on fuzzy clustering preprocessing and particle swarm optimization. More specifically, a fuzzy clustering algorithm is first used for the initial clustering of sensor nodes according to geographical locations, where a sensor node belongs to a cluster with a determined probability, and the number of initial clusters is analyzed and discussed. Furthermore, the fitness function is designed considering both the energy consumption and distance factors of the wireless sensor network. Finally, the cluster head nodes in the hierarchical topology are determined based on the improved particle swarm optimization. Experimental results show that, compared with traditional methods, the proposed method achieves the purpose of reducing the mortality rate of nodes and extending the network life cycle.
A Framework for Analyzing Software Quality using Hierarchical Clustering
Directory of Open Access Journals (Sweden)
Arashdeep Kaur
2011-02-01
Fault proneness data available in the early software life cycle from previous releases or similar projects will aid in improving software quality estimations. Various techniques have been proposed in the literature, including statistical methods, machine learning methods, neural network techniques and clustering techniques, for the prediction of faulty and non-faulty modules in a project. In this study, a hierarchical clustering algorithm is trained and tested with lifecycle data collected from the NASA projects CM1, PC1 and JM1 as predictive models. These predictive models contain requirement metrics and static code metrics. We have combined the requirement metric model with the static code metric model to get a fusion metric model. Further, we have investigated which of the three prediction models is the best on the basis of fault detection. The basic hypothesis of software quality estimation is that automatic quality prediction models enable verification experts to concentrate their attention and resources on problem areas of the system under development. The proposed approach has been implemented in MATLAB 7.4. The results show that when all the prediction techniques are evaluated, the best prediction model is found to be the fusion metric model. This proposed model is also compared with other quality models available in the literature and is found to be efficient for predicting faulty modules.
Hierarchical Clustering Given Confidence Intervals of Metric Distances
Huang, Weiyu
2016-01-01
This paper considers metric spaces where distances between a pair of nodes are represented by distance intervals. The goal is to study methods for the determination of hierarchical clusters, i.e., a family of nested partitions indexed by a resolution parameter, induced from the given distance intervals of the metric spaces. Our construction of hierarchical clustering methods is based on defining admissible methods to be those methods that abide by the axioms of value - nodes in a metric space with two nodes are clustered together at the convex combination of the distance bounds between them - and transformation - when both distance bounds are reduced, the output may become more clustered but not less. Two admissible methods are constructed and are shown to provide universal upper and lower bounds in the space of admissible methods. Practical implications are explored by clustering moving points via snapshots and by clustering networks representing brain structural connectivity using the lower and upper bounds...
Hierarchical modeling of cluster size in wildlife surveys
Royle, J. Andrew
2008-01-01
Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between detectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).
Update Legal Documents Using Hierarchical Ranking Models and Word Clustering
Pham, Minh Quang Nhat; Nguyen, Minh Le; Shimazu, Akira
2010-01-01
Our research addresses the task of updating legal documents when new information emerges. In this paper, we employ a hierarchical ranking model for the task of updating legal documents. Word clustering features are incorporated into the ranking models to exploit semantic relations between words. Experimental results on legal data built from the United States Code show that the hierarchical ranking model with word clustering outperforms baseline methods using Vector Space Model, and word cluster-based ...
NHRPA: a novel hierarchical routing protocol algorithm for wireless sensor networks
Institute of Scientific and Technical Information of China (English)
CHENG Hong-bing; YANG Geng; HU Su-jun
2008-01-01
Considering severe resources constraints and security threat of wireless sensor networks (WSN), the article proposed a novel hierarchical routing protocol algorithm. The proposed routing protocol algorithm can adopt suitable routing technology for the nodes according to the distance of nodes to the base station, density of nodes distribution, and residual energy of nodes. Comparing the proposed routing protocol algorithm with simple direction diffusion routing technology, cluster-based routing mechanisms, and simple hierarchical routing protocol algorithm through comprehensive analysis and simulation in terms of the energy usage, packet latency, and security in the presence of node compromise attacks, the results show that the proposed routing protocol algorithm is more efficient for wireless sensor networks.
Hierarchical Overlapping Clustering of Network Data Using Cut Metrics
Gama, Fernando; Ribeiro, Alejandro
2016-01-01
A novel method to obtain hierarchical and overlapping clusters from network data (i.e., a set of nodes endowed with pairwise dissimilarities) is presented. The introduced method is hierarchical in the sense that it outputs a nested collection of groupings of the node set depending on the resolution or degree of similarity desired, and it is overlapping since it allows nodes to belong to more than one group. Our construction is rooted in the facts that a hierarchical (non-overlapping) clustering of a network can be equivalently represented by a finite ultrametric space and that a convex combination of ultrametrics results in a cut metric. By applying a hierarchical (non-overlapping) clustering method to multiple dithered versions of a given network and then convexly combining the resulting ultrametrics, we obtain a cut metric associated with the network of interest. We then show how to extract a hierarchical overlapping clustering structure from the aforementioned cut metric. Furthermore, the so-called overlappi...
Parallel algorithms and cluster computing
Hoffmann, Karl Heinz
2007-01-01
This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.
Determination of atomic cluster structure with cluster fusion algorithm
DEFF Research Database (Denmark)
Obolensky, Oleg I.; Solov'yov, Ilia; Solov'yov, Andrey V.
2005-01-01
We report an efficient scheme of global optimization, called cluster fusion algorithm, which has proved its reliability and high efficiency in determination of the structure of various atomic clusters.
Particle identification using clustering algorithms
Wirth, R; Löher, B; Savran, D; Silva, J; Pol, H Álvarez; Gil, D Cortina; Pietras, B; Bloch, T; Kröll, T; Nácher, E; Perea, Á; Tengblad, O; Bendel, M; Dierigl, M; Gernhäuser, R; Bleis, T Le; Winkel, M
2013-01-01
A method that uses fuzzy clustering algorithms to achieve particle identification based on pulse shape analysis is presented. The fuzzy c-means clustering algorithm is used to compute mean (principal) pulse shapes induced by different particle species in an automatic and unsupervised fashion from a mixed set of data. A discrimination amplitude is proposed using these principal pulse shapes to identify the originating particle species of a detector pulse. Since this method does not make any assumptions about the specific features of the pulse shapes, it is very generic and suitable for multiple types of detectors. The method is applied to discriminate between photon- and proton-induced signals in CsI(Tl) scintillator detectors and the results are compared to the well-known integration method.
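The role of fuzzy c-means in extracting mean pulse shapes can be sketched on toy data. This is a generic fuzzy c-means with the usual membership and center updates, not the authors' implementation; the pulse samples, the fuzzifier `m = 2`, the fixed iteration count and the initial centers are all assumptions made for illustration, and the discrimination amplitude mentioned in the abstract is not reproduced here.

```python
def memberships_for(samples, centers, m=2.0):
    """Membership u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)) for every sample."""
    power = 2.0 / (m - 1.0)
    result = []
    for x in samples:
        # Clamp distances so a sample sitting exactly on a center is handled.
        dists = [max(sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, 1e-12)
                 for c in centers]
        result.append([1.0 / sum((di / dj) ** power for dj in dists)
                       for di in dists])
    return result

def fuzzy_c_means(samples, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    dim = len(samples[0])
    for _ in range(iterations):
        u = memberships_for(samples, centers, m)
        centers = [[sum(row[i] ** m * x[d] for row, x in zip(u, samples)) /
                    sum(row[i] ** m for row in u) for d in range(dim)]
                   for i in range(len(centers))]
    return centers, memberships_for(samples, centers, m)

# Made-up normalized pulses: three fast-decaying and three slow-decaying shapes.
pulses = [[1.0, 0.50, 0.20, 0.10], [1.0, 0.45, 0.25, 0.10], [1.0, 0.55, 0.20, 0.05],
          [1.0, 0.90, 0.70, 0.50], [1.0, 0.85, 0.75, 0.55], [1.0, 0.90, 0.65, 0.50]]
mean_shapes, u = fuzzy_c_means(pulses, centers=[pulses[0], pulses[3]])
labels = [row.index(max(row)) for row in u]
```

The returned `mean_shapes` play the part of the principal pulse shapes, and each sample's membership row indicates how strongly it resembles each of them.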
Bayesian hierarchical clustering for studying cancer gene expression data with unknown statistics.
Directory of Open Access Journals (Sweden)
Korsuk Sirinukunwattana
Clustering analysis is an important tool in studying gene expression data. The Bayesian hierarchical clustering (BHC) algorithm can automatically infer the number of clusters and uses Bayesian model selection to improve clustering quality. In this paper, we present an extension of the BHC algorithm. Our Gaussian BHC (GBHC) algorithm represents data as a mixture of Gaussian distributions. It uses a normal-gamma distribution as a conjugate prior on the mean and precision of each of the Gaussian components. We tested GBHC on 11 cancer and 3 synthetic datasets. The results on cancer datasets show that in sample clustering, GBHC on average produces a clustering partition that is more concordant with the ground truth than those obtained from other commonly used algorithms. Furthermore, GBHC frequently infers a number of clusters that is close to the ground truth. In gene clustering, GBHC also produces a clustering partition that is more biologically plausible than several other state-of-the-art methods. This suggests GBHC as an alternative tool for studying gene expression data. The implementation of GBHC is available at https://sites.google.com/site/gaussianbhc/
Properties of hierarchically forming star clusters
Maschberger, Th; Bonnell, I A; Kroupa, P
2010-01-01
We undertake a systematic analysis of the early (< 0.5 Myr) evolution of clustering and the stellar initial mass function in turbulent fragmentation simulations. These large scale simulations for the first time offer the opportunity for a statistical analysis of IMF variations and correlations between stellar properties and cluster richness. The typical evolutionary scenario involves star formation in small-n clusters which then progressively merge; the first stars to form are seeds of massive stars and achieve a headstart in mass acquisition. These massive seeds end up in the cores of clusters and a large fraction of new stars of lower mass is formed in the outer parts of the clusters. The resulting clusters are therefore mass segregated at an age of 0.5 Myr, although the signature of mass segregation is weakened during mergers. We find that the resulting IMF has a smaller exponent (alpha=1.8-2.2) than the Salpeter value (alpha=2.35). The IMFs in subclusters are truncated at masses only somewhat larger th...
An Improved Weighted Clustering Algorithm in MANET
Institute of Scientific and Technical Information of China (English)
WANG Jin; XU Li; ZHENG Bao-yu
2004-01-01
The original clustering algorithms in Mobile Ad hoc Network (MANET) are first analyzed in this paper. Based on this analysis, an Improved Weighted Clustering Algorithm (IWCA) is proposed. Then, the principle and steps of our algorithm are explained in detail, and a comparison is made between the original algorithms and our improved method in terms of average cluster number, topology stability, clusterhead load balance and network lifetime. The experimental results show that our improved algorithm has the best performance on average.
Kernel method-based fuzzy clustering algorithm
Institute of Scientific and Technical Information of China (English)
Wu Zhongdong; Gao Xinbo; Xie Weixin; Yu Jianping
2005-01-01
The fuzzy C-means clustering algorithm (FCM) is extended to the fuzzy kernel C-means clustering algorithm (FKCM) to effectively perform cluster analysis on diverse structures, such as non-hyperspherical data, data with noise, data with a mixture of heterogeneous cluster prototypes, asymmetric data, etc. Based on the Mercer kernel, the FKCM clustering algorithm is derived from the FCM algorithm combined with the kernel method. The results of experiments with synthetic and real data show that the FKCM clustering algorithm is universal and, in contrast to the FCM algorithm, can effectively analyze datasets with varied structures in an unsupervised manner. It can be expected that kernel-based clustering will be one of the important research directions of fuzzy clustering analysis.
Hierarchical clusters of phytoplankton variables in dammed water bodies
Silva, Eliana Costa e.; Lopes, Isabel Cristina; Correia, Aldina; Gonçalves, A. Manuela
2017-06-01
In this paper a dataset containing biological variables of the water column of several Portuguese reservoirs is analyzed. Hierarchical cluster analysis is used to obtain clusters of phytoplankton variables of the phylum Cyanophyta, with the objective of validating the classification of Portuguese reservoirs previously presented in [1], which were divided into three clusters: (1) Interior Tagus and Aguieira; (2) Douro; and (3) Other rivers. Now three new clusters of Cyanophyta variables were found. Kruskal-Wallis and Mann-Whitney tests are used to compare the newly obtained Cyanophyta clusters and the previous reservoir clusters, in order to validate the classification of the water quality of reservoirs. The amount of Cyanophyta algae present in the reservoirs from the three clusters is significantly different, which validates the previous classification.
A new cluster algorithm for graphs
Dongen, S. van
1998-01-01
A new cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The graphs may be both weighted (with nonnegative weight) and directed. Let G be such a graph. The MCL algorithm simulates flow in G by first identifying G in a canonical way with
Multilayer Traffic Network Optimized by Multiobjective Genetic Clustering Algorithm
Wen, Feng; Gen, Mitsuo; Yu, Xinjie
This paper introduces a multilayer traffic network model and a traffic network clustering method for solving the route selection problem (RSP) in car navigation systems (CNS). The purpose of the proposed method is to reduce the computation time of route selection substantially, with an acceptable loss of accuracy, by preprocessing the large traffic network into a new network form. The proposed approach preprocesses the traffic network further than the traditional hierarchical network method does, by applying a clustering method. The traffic network clustering considers two criteria. We specify a genetic clustering algorithm for traffic network clustering and use NSGA-II for calculating the multiple objective Pareto optimal set. The proposed method can overcome the size limitations when solving route selection in CNS. Solutions provided by the proposed algorithm are compared with the optimal solutions to analyze and quantify the loss of accuracy.
Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm
DEFF Research Database (Denmark)
Grotkjær, Thomas; Winther, Ole; Regenberg, Birgitte
2006-01-01
Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods...... analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset....... The method is flexible and it is possible to find consensus clusters from different clustering algorithms. Thus, the algorithm can be used as a framework to test in a quantitative manner the homogeneity of different clustering algorithms. We compare the method with a number of state-of-the-art clustering...
Hierarchical Clustering of Large Databases and Classification of Antibiotics at High Noise Levels
Directory of Open Access Journals (Sweden)
Alexander V. Yarkov
2008-12-01
A new algorithm for divisive hierarchical clustering of chemical compounds based on 2D structural fragments is suggested. The algorithm is deterministic: given a random ordering of the input, it will always give the same clustering, and it can process a database of up to 2 million records on a standard PC. The algorithm was used for classification of 1,183 antibiotics mixed with 999,994 random chemical structures. The similarity threshold at which the best separation of active and inactive compounds took place was estimated as 0.6. At this threshold, 85.7% of the antibiotics were successfully classified, with 0.4% of inaccurate compounds. An .sdf file was created with the probe molecules for clustering of external databases.
A supplier selection using a hybrid grey based hierarchical clustering and artificial bee colony
Directory of Open Access Journals (Sweden)
Farshad Faezy Razi
2014-06-01
Selection of one or a combination of the most suitable potential providers, together with the outsourcing problem, is among the most important strategies in logistics and supply chain management. In this paper, the selection of an optimal combination of suppliers in inventory and supply chain management is studied and analyzed via a multiple attribute decision making approach, data mining, and evolutionary optimization algorithms. For supplier selection in a supply chain, hierarchical clustering first clusters the suppliers according to the studied indexes. Then, according to its cluster, each supplier is evaluated through Grey Relational Analysis. Then the combination of the suppliers' Pareto optimal rank and costs is obtained using the Artificial Bee Colony meta-heuristic algorithm. A case study is conducted to better describe the new algorithm for selecting multiple sources of supply.
A Scalable Clustering Algorithm in Dense Mobile Sensor Networks
Directory of Open Access Journals (Sweden)
Jianbo Li
2011-03-01
Clustering offers a kind of hierarchical organization that provides scalability and a basic performance guarantee by partitioning the network into disjoint groups of nodes. In this paper a scalable and energy-efficient clustering algorithm is proposed for dense mobile sensor network scenarios. In the initial cluster formation phase, the proposed scheme features a simple execution process with polynomial time complexity, and eliminates the "frozen time" requirement by introducing some GPS-capable mobile nodes to act as cluster heads. In the following cluster maintenance stage, the maintenance of clusters is asynchronous and event-driven so as to thoroughly eliminate the "ripple effect" brought by node mobility. As a result, local changes in a cluster need not be seen and updated by the entire network, which greatly reduces communication overheads and makes the scheme well suited to high-mobility environments. Extensive simulations have been conducted, and the simulation results reveal that the proposed algorithm incurs much less clustering overhead and maintains a much more stable cluster structure, as compared to HCC (High Connectivity Clustering algorithm)
Eriksson, Brian; Singh, Aarti; Nowak, Robert
2011-01-01
Hierarchical clustering based on pairwise similarities is a common tool used in a broad range of scientific applications. However, in many problems it may be expensive to obtain or compute similarities between the items to be clustered. This paper investigates the hierarchical clustering of N items based on a small subset of pairwise similarities, significantly less than the complete set of N(N-1)/2 similarities. First, we show that if the intracluster similarities exceed intercluster similarities, then it is possible to correctly determine the hierarchical clustering from as few as 3N log N similarities. We demonstrate this order of magnitude savings in the number of pairwise similarities necessitates sequentially selecting which similarities to obtain in an adaptive fashion, rather than picking them at random. We then propose an active clustering method that is robust to a limited fraction of anomalous similarities, and show how even in the presence of these noisy similarity values we can resolve the hierar...
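To give a feel for the savings the abstract describes, the short sketch below compares the complete set of N(N-1)/2 pairwise similarities with the 3N log N figure. The base of the logarithm is not stated in the abstract, so the natural logarithm used here is an assumption, and `similarity_counts` is a hypothetical helper name.

```python
import math


def similarity_counts(n):
    """Return (number of all pairs, 3 * n * log n) for n items.

    The natural logarithm is an assumption; the abstract does not give the base.
    """
    full = n * (n - 1) // 2          # every unordered pair of items
    adaptive = 3 * n * math.log(n)   # count quoted for adaptive selection
    return full, adaptive


full, adaptive = similarity_counts(1000)
print(full, round(adaptive))
```

For N = 1000 this compares 499,500 pairs against roughly 20,700 adaptively chosen similarities under the natural-log assumption.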
Hierarchical Cluster Analysis – Various Approaches to Data Preparation
Directory of Open Access Journals (Sweden)
Z. Pacáková
2013-09-01
The article deals with two approaches to data preparation that avoid multicollinearity. The aim of the article is to find similarities in the e-communication level of EU states using hierarchical cluster analysis. The original set of fourteen indicators was first reduced on the basis of correlation analysis; in the case of high correlation, the indicator with higher variability was included in further analysis. Secondly, the data were transformed using principal component analysis, since the principal components are poorly correlated. For further analysis, five principal components explaining about 92% of the variance were selected. Hierarchical cluster analysis was performed on both the reduced data set and the principal component scores. Both times three clusters were assumed following the Pseudo t-Squared and Pseudo F statistics, but the final clusters were not identical. An important characteristic for comparing the two results was the proportion of variance accounted for by the clusters, which was about ten percent higher for the principal component scores (57.8% compared to 47%). Therefore it can be stated that, when principal component scores with a high enough explained proportion (about 92% in our analysis) are used as input variables for cluster analysis, the loss of information is lower than with data reduction on the basis of correlation analysis.
Concept Association and Hierarchical Hamming Clustering Model in Text Classification
Institute of Scientific and Technical Information of China (English)
Su Gui-yang; Li Jian-hua; Ma Ying-hua; Li Sheng-hong; Yin Zhong-hang
2004-01-01
We propose two models in this paper. The concept association model is put forward to obtain the co-occurrence relationships among keywords in the documents, and the hierarchical Hamming clustering model is used to reduce the dimensionality of the category feature vector space, which addresses the extremely high dimensionality of the documents' feature space. The experimental results indicate that the concept association model can obtain the co-occurrence relations among keywords in the documents, which effectively improves the recall of the classification system. The hierarchical Hamming clustering model can reduce the dimensionality of the category feature vector efficiently; the size of the vector space is only about 10% of the original dimensionality.
Frequent Pattern Mining Algorithms for Data Clustering
DEFF Research Database (Denmark)
Zimek, Arthur; Assent, Ira; Vreeken, Jilles
2014-01-01
Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many frequent pattern mining related influences. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say that frequent pattern mining was at the cradle of subspace clustering; yet, it quickly developed into an independent research field. In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data. In particular, we discuss several example algorithms for subspace clustering or projected clustering as well as point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining.
Non-hierarchical clustering methods on factorial subspaces
Tortora, Cristina
2011-01-01
Cluster analysis (CA) aims at finding homogeneous groups of individuals, where homogeneous refers to individuals that present similar characteristics. Many CA techniques already exist; among the non-hierarchical ones, the best known, thanks to its simplicity and computational properties, is the k-means method. However, the method is unstable when the number of variables is large and when variables are correlated. This problem has led to the development of two-step methods, which perform a linear tra...
Recursive Hierarchical Image Segmentation by Region Growing and Constrained Spectral Clustering
Tilton, James C.
2002-01-01
This paper describes an algorithm for hierarchical image segmentation (referred to as HSEG) and its recursive formulation (referred to as RHSEG). The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. In the main, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing. In addition, HSEG optionally interjects between HSWO region growing iterations merges between spatially non-adjacent regions (i.e., spectrally based merging or clustering) constrained by a threshold derived from the previous HSWO region growing iteration. While the addition of constrained spectral clustering improves the segmentation results, especially for larger images, it also significantly increases HSEG's computational requirements. To counteract this, a computationally efficient recursive, divide-and-conquer implementation of HSEG (RHSEG) has been devised and is described herein. Included in this description is special code that is required to avoid processing artifacts caused by RHSEG's recursive subdivision of the image data. Implementations for single processor and for multiple processor computer systems are described. Results with Landsat TM data are included comparing HSEG with classic region growing. Finally, an application to image information mining and knowledge discovery is discussed.
Introduction to Cluster Monte Carlo Algorithms
Luijten, E.
This chapter provides an introduction to cluster Monte Carlo algorithms for classical statistical-mechanical systems. A brief review of the conventional Metropolis algorithm is given, followed by a detailed discussion of the lattice cluster algorithm developed by Swendsen and Wang and the single-cluster variant introduced by Wolff. For continuum systems, the geometric cluster algorithm of Dress and Krauth is described. It is shown how their geometric approach can be generalized to incorporate particle interactions beyond hardcore repulsions, thus forging a connection between the lattice and continuum approaches. Several illustrative examples are discussed.
Extending stability through hierarchical clusters in Echo State Networks
Directory of Open Access Journals (Sweden)
Sarah Jarvis
2010-07-01
Echo State Networks (ESN) are reservoir networks that satisfy well-established criteria for stability when constructed as feedforward networks. Recent evidence suggests that stability criteria are altered in the presence of reservoir substructures, such as clusters. Understanding how the reservoir architecture affects stability is thus important for the appropriate design of any ESN. To quantitatively determine the influence of the most relevant network parameters, we analysed the impact of reservoir substructures on stability in hierarchically clustered ESNs (HESN), as they allow a smooth transition from highly structured to increasingly homogeneous reservoirs. Previous studies used the largest eigenvalue of the reservoir connectivity matrix (spectral radius) as a predictor for stable network dynamics. Here, we evaluate the impact of clusters, hierarchy and intercluster connectivity on the predictive power of the spectral radius for stability. Both hierarchy and low relative cluster sizes extend the range of spectral radius values leading to stable networks, while increasing intercluster connectivity decreased the maximal spectral radius.
A Novel Research on Rough Clustering Algorithm
Directory of Open Access Journals (Sweden)
Tao Qu
2014-01-01
The aim of this study is to address the issue that traditional clustering algorithms are subject to the influence of the data space distribution; to this end, a novel clustering algorithm that combines rough set theory with conventional clustering is employed. The proposed rough clustering algorithm takes the condition attributes and decision attributes displayed in the information table as the consistency principle, and it uses the data supercube and information entropy to realize data attribute shortcutting and discretization. On this basis, by applying the assemble feature vector addition principle, a single scan of the information table suffices to cluster the data objects. Experiments reveal that the proposed algorithm is efficient and feasible.
Xu, Lizhen; Paterson, Andrew D; Xu, Wei
2017-04-01
Motivated by the multivariate nature of microbiome data with hierarchical taxonomic clusters, counts that are often skewed and zero inflated, and repeated measures, we propose a Bayesian latent variable methodology to jointly model multiple operational taxonomic units within a single taxonomic cluster. This novel method can incorporate both negative binomial and zero-inflated negative binomial responses, and can account for serial and familial correlations. We develop a Markov chain Monte Carlo algorithm that is built on a data augmentation scheme using Pólya-Gamma random variables. Hierarchical centering and parameter expansion techniques are also used to improve the convergence of the Markov chain. We evaluate the performance of our proposed method through extensive simulations. We also apply our method to a human microbiome study.
Impact of hierarchical memory systems on linear algebra algorithm design
Energy Technology Data Exchange (ETDEWEB)
Gallivan, K.; Jalby, W.; Meier, U.; Sameh, A.H.
1988-01-01
Linear algebra algorithms based on the BLAS or extended BLAS do not achieve high performance on multivector processors with a hierarchical memory system because of a lack of data locality. For such machines, block linear algebra algorithms must be implemented in terms of matrix-matrix primitives (BLAS3). Designing efficient linear algebra algorithms for these architectures requires analysis of the behavior of the matrix-matrix primitives and the resulting block algorithms as a function of certain system parameters. The analysis must identify the limits of performance improvement possible via blocking and any contradictory trends that require trade-off consideration. The authors propose a methodology that facilitates such an analysis and use it to analyze the performance of the BLAS3 primitives used in block methods. A similar analysis of the block size-performance relationship is also performed at the algorithm level for block versions of the LU decomposition and the Gram-Schmidt orthogonalization procedures.
Optimising steel production schedules via a hierarchical genetic algorithm
Directory of Open Access Journals (Sweden)
Worapradya, Kiatkajohn
2014-08-01
This paper presents an effective scheduling approach for a steel-making continuous casting (SCC) plant. The main contribution of this paper is the formulation of a new optimisation model that more closely represents real-world situations, and a hierarchical genetic algorithm (HGA) tailored particularly for searching for an optimal SCC schedule. The optimisation model is developed by integrating two main planning phases of traditional scheduling: (1) planning the cast sequence, and (2) scheduling of steel-making and timing of all jobs. A novel procedure is given for genetic algorithm (GA) chromosome coding that maps Gantt chart and hierarchical chromosomes. The performance of the proposed methodology is illustrated and compared with a two-phase traditional scheduling and a standard GA toolbox. Both qualitative and quantitative performance measures are investigated.
Bouaziz, Matthieu; Paccard, Caroline; Guedj, Mickael; Ambroise, Christophe
2012-01-01
Inferring the structure of populations has many applications for genetic research. In addition to providing information for evolutionary studies, it can be used to account for the bias induced by population stratification in association studies. To this end, many algorithms have been proposed to cluster individuals into genetically homogeneous sub-populations. The parametric algorithms, such as Structure, are very popular but their underlying complexity and their high computational cost led to the development of faster parametric alternatives such as Admixture. Alternatives to these methods are the non-parametric approaches. Among this category, AWclust has proven efficient but fails to properly identify population structure for complex datasets. We present in this article a new clustering algorithm called Spectral Hierarchical clustering for the Inference of Population Structure (SHIPS), based on a divisive hierarchical clustering strategy, allowing a progressive investigation of population structure. This method takes genetic data as input to cluster individuals into homogeneous sub-populations and with the use of the gap statistic estimates the optimal number of such sub-populations. SHIPS was applied to a set of simulated discrete and admixed datasets and to real SNP datasets, that are data from the HapMap and Pan-Asian SNP consortium. The programs Structure, Admixture, AWclust and PCAclust were also investigated in a comparison study. SHIPS and the parametric approach Structure were the most accurate when applied to simulated datasets both in terms of individual assignments and estimation of the correct number of clusters. The analysis of the results on the real datasets highlighted that the clusterings of SHIPS were the more consistent with the population labels or those produced by the Admixture program. The performances of SHIPS when applied to SNP data, along with its relatively low computational cost and its ease of use make this method a promising
Bekki, Kenji
2017-01-01
Most old globular clusters (GCs) in the Galaxy are observed to have internal chemical abundance spreads in light elements. We discuss a new GC formation scenario based on hierarchical star formation within fractal molecular clouds. In the new scenario, a cluster of bound and unbound star clusters ('star cluster complex', SCC) that have a power-law cluster mass function with a slope (β) of 2 is first formed from a massive gas clump developed in a dwarf galaxy. Such cluster complexes and β = 2 are observed and expected from hierarchical star formation. The most massive star cluster ('main cluster'), which is the progenitor of a GC, can accrete gas ejected from asymptotic giant branch (AGB) stars initially in the cluster and other low-mass clusters before the clusters are tidally stripped or destroyed to become field stars in the dwarf. The SCC is initially embedded in a giant gas hole created by numerous supernovae of the SCC so that cold gas outside the hole can be accreted onto the main cluster later. New stars formed from the accreted gas have chemical abundances that are different from those of the original SCC. Using hydrodynamical simulations of GC formation based on this scenario, we show that the main cluster with an initial mass as large as [2-5] × 10^5 M⊙ can accrete more than 10^5 M⊙ of gas from AGB stars of the SCC. We suggest that merging of hierarchical star cluster complexes can play key roles in stellar halo formation around GCs and self-enrichment processes in the early phase of GC formation.
Intuitionistic Fuzzy Possibilistic C Means Clustering Algorithms
Directory of Open Access Journals (Sweden)
Arindam Chaudhuri
2015-01-01
Intuitionistic fuzzy sets (IFSs) provide a mathematical framework based on fuzzy sets to describe vagueness in data. They find interesting and promising applications in different domains. Here, we develop an intuitionistic fuzzy possibilistic C means (IFPCM) algorithm to cluster IFSs by hybridizing concepts of FPCM, IFSs, and distance measures. IFPCM resolves inherent problems encountered with information regarding membership values of objects to each cluster by generalizing membership and nonmembership with hesitancy degree. The algorithm is extended for clustering interval valued intuitionistic fuzzy sets (IVIFSs), leading to interval valued intuitionistic fuzzy possibilistic C means (IVIFPCM). The clustering algorithm has membership and nonmembership degrees as intervals. Information regarding membership and typicality degrees of samples to all clusters is given by the algorithm. The experiments are performed on both real and simulated datasets. The algorithm generates valuable information and produces overlapped clusters with different membership degrees. It takes into account the inherent uncertainty in information captured by IFSs. Some advantages of the algorithms are simplicity, flexibility, and low computational complexity. The algorithm is evaluated through cluster validity measures. The clustering accuracy of the algorithm is investigated on classification datasets with labeled patterns. The algorithm maintains appreciable performance compared to other methods in terms of pureness ratio.
Hierarchically Clustered Star Formation in the Magellanic Clouds
Gouliermis, Dimitrios A; Ossenkopf, Volker; Klessen, Ralf S; Dolphin, Andrew E
2012-01-01
We present a cluster analysis of the bright main-sequence and faint pre-main-sequence stellar populations of a field ~ 90 x 90 pc centered on the HII region NGC 346/N66 in the Small Magellanic Cloud, from imaging with HST/ACS. We extend our earlier analysis on the stellar cluster population in the region to characterize the structuring behavior of young stars in the region as a whole with the use of stellar density maps interpreted through techniques designed for the study of the ISM structuring. In particular, we demonstrate with Cartwright & Whitworth's Q parameter, dendrograms, and the Delta-variance wavelet transform technique that the young stellar populations in the region NGC 346/N66 are hierarchically clustered, in agreement with other regions in the Magellanic Clouds observed with HST. The origin of this hierarchy is currently under investigation.
Algorithm for Spatial Clustering with Obstacles
El-Sharkawi, Mohamed E
2009-01-01
In this paper, we propose an efficient clustering technique to solve the problem of clustering in the presence of obstacles. The proposed algorithm divides the spatial area into rectangular cells. Each cell is associated with statistical information that enables us to label the cell as dense or non-dense. We also label each cell as obstructed (i.e. intersects any obstacle) or non-obstructed. Then the algorithm finds the regions (clusters) of connected, dense, non-obstructed cells. Finally, the algorithm finds a center for each such region and returns those centers as centers of the relatively dense regions (clusters) in the spatial area.
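The steps in this abstract can be sketched as a small grid procedure. This is only a minimal illustration under stated assumptions, not the authors' implementation: the per-cell "statistical information" is taken to be a simple point count, obstacles are assumed to be axis-aligned rectangles, cells are 4-connected, and each region's center is the centroid of its points. The function name `grid_cluster` and its parameters are hypothetical.

```python
from collections import deque


def grid_cluster(points, obstacles, cell_size, min_points):
    """Cluster 2-D points on a grid, skipping cells that touch an obstacle.

    points     -- iterable of (x, y) tuples
    obstacles  -- iterable of rectangles (xmin, ymin, xmax, ymax) (assumed shape)
    cell_size  -- side length of each square cell
    min_points -- a cell with at least this many points counts as dense
    Returns a list of (center, cells) pairs, one per region.
    """
    # Bin the points into square cells.
    cells = {}
    for x, y in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append((x, y))

    def obstructed(key):
        # Overlap test between the cell rectangle and each obstacle rectangle.
        cx0, cy0 = key[0] * cell_size, key[1] * cell_size
        cx1, cy1 = cx0 + cell_size, cy0 + cell_size
        return any(cx0 < ox1 and ox0 < cx1 and cy0 < oy1 and oy0 < cy1
                   for ox0, oy0, ox1, oy1 in obstacles)

    # Keep only dense, non-obstructed cells.
    usable = {k for k, pts in cells.items()
              if len(pts) >= min_points and not obstructed(k)}

    clusters, seen = [], set()
    for start in sorted(usable):
        if start in seen:
            continue
        # Breadth-first search over 4-connected usable cells.
        region, queue = set(), deque([start])
        seen.add(start)
        while queue:
            i, j = queue.popleft()
            region.add((i, j))
            for nb in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if nb in usable and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        # Center of the region: centroid of all points it contains.
        members = [p for k in region for p in cells[k]]
        center = (sum(p[0] for p in members) / len(members),
                  sum(p[1] for p in members) / len(members))
        clusters.append((center, region))
    return clusters
```

With a thin obstacle crossing one dense cell, the cells on either side end up in separate regions even though the points themselves are close together.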
A new fusion algorithm for fuzzy clustering
Directory of Open Access Journals (Sweden)
Ivan Vidović
2014-12-01
In this paper, we have considered the merging problem of two ellipsoidal clusters in order to construct a new fusion algorithm for fuzzy clustering. We have proposed a criterion for merging two ellipsoidal clusters ∏1, ∏2 with associated main Mahalanobis circles Ej(cj, σj), where cj is the centroid and σj^2 is the Mahalanobis variance of cluster ∏j. Based on the well-known Davies-Bouldin index, we have constructed a new fusion algorithm. The criterion has been tested on several data sets, and the performance of the fusion algorithm has been demonstrated on an illustrative example.
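For readers unfamiliar with the Davies-Bouldin index mentioned above, the following is a minimal sketch of its standard form using plain Euclidean distances. It is not the paper's merging criterion, which works with Mahalanobis circles; the function name `davies_bouldin` and the input layout are assumptions made for illustration.

```python
import math


def davies_bouldin(clusters):
    """Davies-Bouldin index for at least two clusters (lower is better).

    clusters -- list of clusters, each a non-empty list of equal-length tuples
    Euclidean distance is assumed here; the paper uses Mahalanobis quantities.
    """
    def centroid(pts):
        return tuple(sum(coord) / len(pts) for coord in zip(*pts))

    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance of a cluster's points to its centroid.
    scatter = [sum(math.dist(p, cents[i]) for p in c) / len(c)
               for i, c in enumerate(clusters)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k
```

Two clusters whose index contribution is high are compact relative to their separation only weakly, which is the intuition behind using such a ratio when deciding whether to merge.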
HCsnip: An R Package for Semi-supervised Snipping of the Hierarchical Clustering Tree.
Obulkasim, Askar; van de Wiel, Mark A
2015-01-01
Hierarchical clustering (HC) is one of the most frequently used methods in computational biology in the analysis of high-dimensional genomics data. Given a data set, HC outputs a binary tree whose leaves are the data points and whose internal nodes represent clusters of various sizes. Normally, a fixed-height cut on the HC tree is chosen, and each contiguous branch of data points below that height is considered as a separate cluster. However, the fixed-height branch cut may not be ideal in situations where one expects a complicated tree structure with nested clusters. Furthermore, due to the lack of utilization of related background information in selecting the cutoff, induced clusters are often difficult to interpret. This paper describes a novel procedure that aims to automatically extract meaningful clusters from the HC tree in a semi-supervised way. The procedure is implemented in the R package HCsnip available from Bioconductor. Rather than cutting the HC tree at a fixed height, HCsnip probes the various ways of snipping, possibly at variable heights, to tease out hidden clusters ensconced deep down in the tree. The cluster extraction process utilizes, along with the data set from which the HC tree is derived, commonly available background information. Consequently, the extracted clusters are highly reproducible and robust against various sources of variation that "haunt" high-dimensional genomics data. Since the clustering process is guided by the background information, clusters are easy to interpret. Unlike existing packages, no constraint is placed on the data type on which clustering is desired. In particular, the package accepts patient follow-up data for guiding the cluster extraction process. To our knowledge, HCsnip is the first package that is able to decompose the HC tree into clusters with piecewise snipping under the guidance of patient time-to-event information. Our implementation of the semi-supervised HC tree snipping framework is generic, and can
Novel Cluster Validity Index for FCM Algorithm
Institute of Scientific and Technical Information of China (English)
Jian Yu; Cui-Xia Li
2006-01-01
How to determine an appropriate number of clusters is very important when implementing a specific clustering algorithm, like c-means, fuzzy c-means (FCM). In the literature, most cluster validity indices are originated from partition or geometrical property of the data set. In this paper, the authors developed a novel cluster validity index for FCM, based on the optimality test of FCM. Unlike the previous cluster validity indices, this novel cluster validity index is inherent in FCM itself. Comparison experiments show that the stability index can be used as cluster validity index for the fuzzy c-means.
Multiscale stochastic hierarchical image segmentation by spectral clustering
Institute of Scientific and Technical Information of China (English)
LI XiaoBin; TIAN Zheng
2007-01-01
This paper proposes a sampling-based hierarchical approach for addressing the computational demands of spectral clustering methods when applied to the problem of image segmentation. The authors first define the distance between a pixel and a cluster, and then derive a new theorem to estimate the number of samples needed for clustering. Finally, by introducing a scale parameter into the similarity function, a novel spectral clustering based image segmentation method has been developed. An important characteristic of the approach is that, in the course of image segmentation, one needs not only to tune the scale parameter to merge the small clusters or split the large clusters but also to take samples from the data set at the different scales. The multiscale and stochastic nature makes it feasible to apply the method to very large grouping problems. In addition, it allows the segmentation to be computed in time that is linear in the size of the image. The experimental results on various synthetic and real world images show the effectiveness of the approach.
An object-oriented cluster search algorithm
Energy Technology Data Exchange (ETDEWEB)
Silin, Dmitry; Patzek, Tad
2003-01-24
In this work we describe two object-oriented cluster search algorithms, which can be applied to a network of an arbitrary structure. The first algorithm calculates all connected clusters, whereas the second one finds a path with the minimal number of connections. We estimate the complexity of the algorithms and infer that the number of operations grows linearly with the size of the network.
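The two tasks named in this abstract can be illustrated with breadth-first search, which visits each node and connection once and so scales linearly with network size. This is a generic procedural sketch, not the authors' object-oriented implementation; the adjacency-dictionary representation and the function names `connected_clusters` and `shortest_path` are assumptions.

```python
from collections import deque


def connected_clusters(graph):
    """Return all connected clusters of an undirected network.

    graph -- dict mapping each node to an iterable of neighbouring nodes
    """
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        cluster, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            cluster.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(cluster)
    return clusters


def shortest_path(graph, source, target):
    """Return a path with the fewest connections, or None if unreachable."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parents to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in graph[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return None
```

A usage example: for a network stored as `{1: [2, 3], 2: [1, 4], ...}`, `connected_clusters` groups the nodes into components and `shortest_path(graph, 1, 5)` returns one path with the fewest hops between nodes 1 and 5.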
Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf
2017-09-01
Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previously occurred accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, which in most cases leads to less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents that occurred over a six-year period from 2008 to 2013 in Abu Dhabi were used throughout this study. In order to reduce the amount of variation in the data, hierarchical clustering was applied to the data set to organize it into six different forms, each with a different number of clusters (i.e., from 1 to 6 clusters). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of model in each form of the data. The results show that when testing the models using the training set, clustering prior to classification achieves 11%-16% more accuracy than without clustering, while the percentage split achieves 2%-5% more accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.
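The weighted average accuracy mentioned above can be sketched as follows. The abstract does not say how the weights are chosen, so weighting each cluster's accuracy by its number of accidents is an assumption, and `weighted_average_accuracy` is a hypothetical helper name.

```python
def weighted_average_accuracy(cluster_results):
    """Combine per-cluster accuracies into one figure.

    cluster_results -- list of (n_accidents, accuracy) pairs, one per cluster
    Weighting by cluster size is an assumption, not stated in the abstract.
    """
    total = sum(n for n, _ in cluster_results)
    return sum(n * acc for n, acc in cluster_results) / total


# Two illustrative clusters: 100 accidents at 90% and 300 accidents at 70%.
print(weighted_average_accuracy([(100, 0.9), (300, 0.7)]))
```

Weighting this way keeps a small, easy cluster from dominating the overall figure, which matters when the forms being compared have clusters of very different sizes.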
Gillis, Nicolas
2011-01-01
Nonnegative matrix factorization (NMF) is a data analysis technique used in a great variety of applications such as text mining, image processing, hyperspectral data analysis, computational biology, and clustering. In this paper, we consider two well-known algorithms designed to solve NMF problems, namely the multiplicative updates of Lee and Seung and the hierarchical alternating least squares of Cichocki et al. We propose a simple way to significantly accelerate their convergence, based on a careful analysis of the computational cost needed at each iteration. This acceleration technique can also be applied to other algorithms, which we illustrate on the projected gradient method of Lin. The efficiency of the accelerated algorithms is empirically demonstrated on image and text datasets, and compares favorably with a state-of-the-art alternating nonnegative least squares algorithm. Finally, we provide a theoretical argument based on the properties of NMF and its solutions that explains in particular the very ...
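For orientation, the baseline multiplicative updates of Lee and Seung mentioned in the abstract above can be sketched as below. This is the plain, unaccelerated rule for the Frobenius-norm objective, not the accelerated variant the paper proposes. The function names, the small constant `eps` that guards against division by zero, and the random initialization are assumptions made for this illustration.

```python
import random

def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))

def nmf_multiplicative(V, rank, iterations=200, eps=1e-9, seed=0):
    """Approximate V (m x n, nonnegative) by W (m x rank) times H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random start so that entries are not stuck at zero.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, W and H stay nonnegative throughout, and `frobenius_error` can be used to monitor convergence.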
Hierarchical data visualization using a fast rectangle-packing algorithm.
Itoh, Takayuki; Yamaguchi, Yumi; Ikehata, Yuko; Kajinaga, Yasumasa
2004-01-01
This paper presents a technique for the representation of large-scale hierarchical data which aims to provide good overviews of complete structures and the content of the data in one display space. The technique represents the data by using nested rectangles. It first packs icons or thumbnails of the lowest-level data and then generates rectangular borders that enclose the packed data. It repeats the process of generating rectangles that enclose the lower-level rectangles until the highest-level rectangles are packed. This paper presents two rectangle-packing algorithms for placing items of hierarchical data onto display spaces. The algorithms refer to Delaunay triangular meshes connecting the centers of rectangles to find gaps where rectangles can be placed. The first algorithm places rectangles where they do not overlap each other and where the extension of the layout area is minimal. The second algorithm places rectangles by referring to templates describing the ideal positions for nodes of input data. It places rectangles where they do not overlap each other and where the combination of the layout area and the distances between the positions described in the template and the actual positions is minimal. It can smoothly represent time-varying data by referring to templates that describe previous layout results. It is also suitable for semantics-based or design-based data layout by generating templates according to the semantics or design.
An extended EM algorithm for subspace clustering
Institute of Scientific and Technical Information of China (English)
Lifei CHEN; Qingshan JIANG
2008-01-01
Clustering high dimensional data has become a challenge in data mining due to the curse of dimensionality. To solve this problem, subspace clustering has been defined as an extension of traditional clustering that seeks to find clusters in subspaces spanned by different combinations of dimensions within a dataset. This paper presents a new subspace clustering algorithm that calculates the local feature weights automatically in an EM-based clustering process. In the algorithm, the features are locally weighted by using a new unsupervised weighting method, as a means to minimize a proposed clustering criterion that takes into account both the average intra-cluster compactness and the average inter-cluster separation for subspace clustering. For the purposes of capturing accurate subspace information, an additional outlier detection process is presented to identify the possible local outliers of subspace clusters, and is embedded between the E-step and M-step of the algorithm. The method has been evaluated in clustering real-world gene expression data and high dimensional artificial data with outliers, and the experimental results have shown its effectiveness.
Paraskevopoulou, Sivylla E; Wu, Di; Eftekhar, Amir; Constandinou, Timothy G
2014-09-30
This work presents a novel unsupervised algorithm for real-time adaptive clustering of neural spike data (spike sorting). The proposed Hierarchical Adaptive Means (HAM) clustering method combines centroid-based clustering with hierarchical cluster connectivity to classify incoming spikes using groups of clusters. It is described how the proposed method can adaptively track the incoming spike data without requiring any past history, iteration or training, and how it autonomously determines the number of spike classes. Its performance (classification accuracy) has been tested using multiple datasets (both simulated and recorded), achieving near-identical accuracy compared to k-means (using 10 iterations and provided with the number of spike classes). Also, its robustness when applied to different feature extraction methods has been demonstrated by achieving classification accuracies above 80% across multiple datasets. Last but crucially, its low complexity, which has been quantified through both memory and computation requirements, makes this method hugely attractive for future hardware implementation. Copyright © 2014 Elsevier B.V. All rights reserved.
Institute of Scientific and Technical Information of China (English)
YAN Haixia; ZHOU Qiang; HONG Xianlong; LI Zhuoyuan
2009-01-01
Hierarchical art was used to solve the mixed mode placement for three dimensional (3-D) integrated circuit design. The 3-D placement flow stream includes hierarchical clustering, hierarchical 3-D floorplanning, vertical via mapping, and recursive two dimensional (2-D) global/detailed placement phases. With state-of-the-art clustering and de-clustering phases, the design complexity was reduced to enhance the placement algorithm efficiency and capacity. The 3-D floorplanning phase solved the layer assignment problem and controlled the number of vertical vias. The vertical via mapping transformed the 3-D placement problem to a set of 2-D placement sub-problems, which not only simplifies the original 3-D placement problem, but also generates the vertical via assignment solution for the routing phase. The design optimizes both the wire length and the thermal load in the floorplan and placement phases to improve the performance and reliability of 3-D integrated circuits. Experiments on IBM benchmarks show that the total wire length is reduced by 15% to 35% relative to 2-D placement with two to four stacked layers, with the number of vertical vias minimized to satisfy a pre-defined upper bound constraint. The maximum temperature is reduced by 16% with two-stage optimization on four stacked layers.
Load Balancing Algorithm for Cache Cluster
Institute of Scientific and Technical Information of China (English)
刘美华; 古志民; 曹元大
2003-01-01
Based on the definition of cluster load, the request is taken as the unit of granularity for computing load and implementing load balancing in a cache cluster. First, the processing power of a cache node is studied from four aspects: network bandwidth, memory capacity, disk access rate and CPU usage. Then, the weighted load of a cache node is defined. Based on this, a load-balancing algorithm that can be applied to the cache cluster is proposed. Finally, Polygraph is used as a benchmarking tool to test the cache cluster using the load-balancing algorithm and the cache cluster using the cache array routing protocol, respectively. The results show that the load-balancing algorithm can improve the performance of the cache cluster.
Semantic Based Cluster Content Discovery in Description First Clustering Algorithm
Directory of Open Access Journals (Sweden)
MUHAMMAD WASEEM KHAN
2017-01-01
In the field of data analytics, grouping like documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have been proposed. One category of algorithms first groups the documents on the basis of similarity and then assigns meaningful labels to those groups. Description first clustering algorithms belong to the category in which the meaningful description is deduced first and then relevant documents are assigned to that description. LINGO (Label Induction Grouping Algorithm) is an algorithm of the description first clustering category which is used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing), an IR (Information Retrieval) technique, for induction of meaningful labels for clusters and VSM (Vector Space Model) for cluster content discovery. In this paper we present LINGO using LSI during both the cluster label induction and cluster content discovery phases. Finally, we compare results obtained from the said algorithm when it uses VSM and when it uses latent semantic analysis during the cluster content discovery phase.
Hadida, Jonathan; Desrosiers, Christian; Duong, Luc
2011-03-01
The segmentation of anatomical structures in Computed Tomography Angiography (CTA) is a pre-operative task useful in image guided surgery. Even though very robust and precise methods have been developed to help achieve a reliable segmentation (level sets, active contours, etc.), it remains very time consuming both in terms of manual interactions and in terms of computation time. The goal of this study is to present a fast method to find coarse anatomical structures in CTA with few parameters, based on hierarchical clustering. The algorithm is organized as follows: first, a fast non-parametric histogram clustering method is proposed to compute a piecewise constant mask. A second step then indexes all the space-connected regions in the piecewise constant mask. Finally, a hierarchical clustering is performed to build a graph representing the connections between the various regions in the piecewise constant mask. This step builds up structural knowledge about the image. Several interactive features for segmentation are presented, for instance association or disassociation of anatomical structures. A comparison with the Mean-Shift algorithm is presented.
A hierarchical approach to reducing communication in parallel graph algorithms
Harshvardhan
2015-01-01
Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this, and reduces communication performed by the algorithm without change to user-code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information of the neighboring vertices to reduce communication, both in message volume and total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce the communication costs, by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL show improved scalability and performance over the traditional level-synchronous approach, with 2.5× to 8× improvement for a variety of graph algorithms at 12,000+ cores.
Novel Hierarchical Fall Detection Algorithm Using a Multiphase Fall Model
Hsieh, Chia-Yeh; Liu, Kai-Chun; Huang, Chih-Ning; Chu, Woei-Chyn; Chan, Chia-Tai
2017-01-01
Falls are the primary cause of accidents for the elderly in the living environment. Reducing hazards in the living environment and performing exercises for training balance and muscles are the common strategies for fall prevention. However, falls cannot be avoided completely; fall detection provides an alarm that can decrease injuries or death caused by the lack of rescue. The automatic fall detection system has opportunities to provide real-time emergency alarms for improving the safety and quality of home healthcare services. Two common technical challenges are also tackled in order to provide a reliable fall detection algorithm, including variability and ambiguity. We propose a novel hierarchical fall detection algorithm involving threshold-based and knowledge-based approaches to detect a fall event. The threshold-based approach efficiently supports the detection and identification of fall events from continuous sensor data. A multiphase fall model is utilized, including free fall, impact, and rest phases for the knowledge-based approach, which identifies fall events and has the potential to deal with the aforementioned technical challenges of a fall detection system. Seven kinds of falls and seven types of daily activities arranged in an experiment are used to explore the performance of the proposed fall detection algorithm. The overall performances of the sensitivity, specificity, precision, and accuracy using a knowledge-based algorithm are 99.79%, 98.74%, 99.05% and 99.33%, respectively. The results show that the proposed novel hierarchical fall detection algorithm can cope with the variability and ambiguity of the technical challenges and fulfill the reliability, adaptability, and flexibility requirements of an automatic fall detection system with respect to the individual differences. PMID:28208694
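The four figures of merit reported in the abstract above are standard confusion-matrix quantities. A minimal sketch of how they are computed is given below; the function name `detection_metrics` is hypothetical and the counts in the usage note are made up for illustration, not taken from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp: falls correctly detected      fp: daily activities flagged as falls
    tn: activities correctly ignored  fn: falls that were missed
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of real falls detected
        "specificity": tn / (tn + fp),   # share of non-falls correctly rejected
        "precision": tp / (tp + fp),     # share of alarms that were real falls
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

For example, `detection_metrics(tp=90, fp=5, tn=95, fn=10)` uses purely illustrative counts; each denominator must be nonzero for the corresponding metric to be defined.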
Kinematic gait patterns in healthy runners: A hierarchical cluster analysis.
Phinyomark, Angkoon; Osis, Sean; Hettinga, Blayne A; Ferber, Reed
2015-11-01
Previous studies have demonstrated distinct clusters of gait patterns in both healthy and pathological groups, suggesting that different movement strategies may be represented. However, these studies have used discrete time point variables and usually focused on only one specific joint and plane of motion. Therefore, the first purpose of this study was to determine if running gait patterns for healthy subjects could be classified into homogeneous subgroups using three-dimensional kinematic data from the ankle, knee, and hip joints. The second purpose was to identify differences in joint kinematics between these groups. The third purpose was to investigate the practical implications of clustering healthy subjects by comparing these kinematics with runners experiencing patellofemoral pain (PFP). A principal component analysis (PCA) was used to reduce the dimensionality of the entire gait waveform data and then a hierarchical cluster analysis (HCA) determined group sets of similar gait patterns and homogeneous clusters. The results show two distinct running gait patterns were found with the main between-group differences occurring in frontal and sagittal plane knee angles (Pgait strategies. These results suggest care must be taken when selecting samples of subjects in order to investigate the pathomechanics of injured runners.
The Hierarchical Clustering of Tax Burden in the EU27
Directory of Open Access Journals (Sweden)
Simkova Nikola
2015-09-01
The issue of taxation has become more important because taxes make up a significant share of government revenue. There are several ways of expressing the tax burden of countries. This paper describes the traditional approach, the ratio of tax revenue to GDP, which is applied to total taxation and to capital taxation as a part of tax systems affecting investment decisions. The implicit tax rate on capital created by Eurostat also offers a possible explanation of the tax burden on capital, so its components are analysed in detail. This study uses one of the econometric methods called hierarchical clustering. The data on which the clustering is based comprises countries in the EU27 for the period 1995 – 2012. The aim of this paper is to reveal clusters of countries in the EU27 with similar tax burden or tax changes. The findings suggest that mainly newly acceding countries (2004 and 2007) are in a group of countries with a low tax burden which tried to encourage investors by favourable tax rates. On the other hand, there are mostly countries from the original EU15. Some clusters may be explained by similar historical development, geographic and demographic characteristics.
Clinical fracture risk evaluated by hierarchical agglomerative clustering
DEFF Research Database (Denmark)
Kruse, Christian; Eiken, P; Vestergaard, P
2017-01-01
profiles. INTRODUCTION: The purposes of this study were to establish and quantify patient clusters of high, average and low fracture risk using an unsupervised machine learning algorithm. METHODS: Regional and national Danish patient data on dual-energy X-ray absorptiometry (DXA) scans, medication...... containing less than 250 subjects. Clusters were identified as high, average or low fracture risk based on bone mineral density (BMD) characteristics. Cluster-based descriptive statistics and relative Z-scores for variable means were computed. RESULTS: Ten thousand seven hundred seventy-five women were...... as low fracture risk with high to very high BMD. A mean age of 60 years was the earliest that allowed for separation of high-risk clusters. DXA scan results could identify high-risk subjects with different antiresorptive treatment compliance levels based on similarities and differences in lumbar spine...
Pagnuco, Inti A.; Pastore, Juan I.; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L.
2016-04-01
It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, where significant groups of genes are defined based on some criteria. This task is usually performed by clustering algorithms, where the whole family of genes, or a subset of them, is clustered into meaningful groups based on their expression values in a set of experiments. In this work we used a methodology based on the Silhouette index as a measure of cluster quality for individual gene groups, and a combination of several variants of hierarchical clustering to generate the candidate groups, to obtain sets of co-expressed genes for two real data examples. We analyzed the quality of the best ranked groups, obtained by the algorithm, using an online bioinformatics tool that provides network information for the selected genes. Moreover, to verify the performance of the algorithm, considering the fact that it doesn’t find all possible subsets, we compared its results against a full search, to determine the number of good co-regulated sets not detected.
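The Silhouette index used in the abstract above as a quality measure for individual groups can be sketched as follows. This is the standard definition s(i) = (b - a) / max(a, b), written with Euclidean distance as an assumed metric; the function names are hypothetical, and scoring a singleton cluster as 0 is a common convention rather than something stated in the abstract. The sketch needs at least two clusters and Python 3.8 or later for `math.dist`.

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette values s(i) = (b - a) / max(a, b)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return scores

def cluster_silhouette(points, labels, target):
    """Mean silhouette of one cluster, usable to rank candidate groups."""
    scores = silhouette_scores(points, labels)
    values = [s for s, lab in zip(scores, labels) if lab == target]
    return sum(values) / len(values)
```

Values close to 1 indicate a compact, well separated group, while values near 0 or below suggest the group overlaps its neighbours.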
Self-organization and clustering algorithms
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
Non-convex polygons clustering algorithm
Directory of Open Access Journals (Sweden)
Kruglikov Alexey
2016-01-01
A clustering algorithm is proposed, to be used as a preliminary step in motion planning. It is tightly coupled to the applied problem statement, i.e. uses parameters meaningful only with respect to it. Use of geometrical properties for polygons clustering allows for a better calculation time as opposed to general-purpose algorithms. A special form of map optimized for quick motion planning is constructed as a result.
The Georgi Algorithms of Jet Clustering
Ge, Shao-Feng
2014-01-01
We reveal the direct link between the jet clustering algorithms recently proposed by Howard Georgi and parton shower kinematics, providing firm foundation from the theoretical side. The kinematics of this class of elegant algorithms is explored systematically for partons with arbitrary masses and the jet function is generalized to $J^{(n)}_\beta$ with a jet function index $n$ in order to achieve more degrees of freedom. Based on three basic requirements that, the result of jet clustering is p...
Hierarchical star cluster assembly in globally collapsing molecular clouds
Vázquez-Semadeni, Enrique; González-Samaniego, Alejandro; Colín, Pedro
2017-05-01
We discuss the mechanism of cluster formation in a numerical simulation of a molecular cloud (MC) undergoing global hierarchical collapse, focusing on how the gas motions in the parent cloud control the assembly of the cluster. The global collapse implies that the star formation rate (SFR) increases over time. The collapse is hierarchical because it consists of small-scale collapses within larger scale ones. The latter culminate a few Myr later than the first small-scale ones and consist of filamentary flows that accrete on to massive central clumps. The small-scale collapses consist of clumps that are embedded in the filaments and falling on to the large-scale collapse centres. The stars formed in the early, small-scale collapses share the infall motion of their parent clumps, so that the filaments feed both gas and stars to the massive central clump. This process leads to the presence of a few older stars in a region where new protostars are forming, and also to a self-similar structure, in which each unit is composed of smaller scale subunits that approach each other and may merge. Because the older stars formed in the filaments share the infall motion of the gas on to the central clump, they tend to have larger velocities and to be distributed over larger areas than the younger stars formed in the central clump. Finally, interpreting the initial mass function (IMF) simply as a probability distribution implies that massive stars only form once the local SFR is large enough to sample the IMF up to high masses. In combination with the increase of the SFR, this implies that massive stars tend to appear late in the evolution of the MC, and only in the central massive clumps. We discuss the correspondence of these features with observed properties of young stellar clusters, finding very good qualitative agreement.
Energy Constrained Hierarchical Task Scheduling Algorithm for Mobile Grids
Directory of Open Access Journals (Sweden)
Arjun Singh
2014-05-01
In mobile grids, scheduling the computation tasks and the communication transactions onto the target architecture is an important problem when a mobile grid environment and a pre-selected architecture are given. Even though the scheduling problem is a traditional topic, almost all previous work focuses on maximizing the performance through the scheduling process. The algorithms developed this way are not suitable for real-time embedded applications, in which the main objective is to minimize the energy consumption of the system under tight performance constraints. This paper presents an energy constrained hierarchical task scheduling algorithm for Mobile Grids to minimize the power consumption of the mobile nodes. The task is rescheduled when the mobile node moves beyond the transmission range. The performance is estimated based on the average delay and packet delivery ratio based on nodes and flows. The performance metrics are analysed using the NS-2 simulator.
A hierarchical algorithm for molecular similarity (H-FORMS).
Ramirez-Manzanares, Alonso; Peña, Joaquin; Azpiroz, Jon M; Merino, Gabriel
2015-07-15
A new hierarchical method to determine molecular similarity is introduced. The goal of this method is to detect if a pair of molecules has the same structure by estimating a rigid transformation that aligns the molecules and a correspondence function that matches their atoms. The algorithm first detects similarity based on the global spatial structure. If this analysis is not sufficient, the algorithm computes novel local structural rotation-invariant descriptors for the atom neighborhood and uses this information to match atoms. Two strategies (deterministic and stochastic) for the matching-based alignment computation are tested. As a result, atom matching based on local similarity indexes decreases the number of testing trials and significantly reduces the dimensionality of the Hungarian assignment problem. The experiments on well-known datasets show that our proposal outperforms state-of-the-art methods in terms of the required computational time and accuracy. © 2015 Wiley Periodicals, Inc.
Comparison and evaluation of network clustering algorithms applied to genetic interaction networks.
Hou, Lin; Wang, Lin; Berg, Arthur; Qian, Minping; Zhu, Yunping; Li, Fangting; Deng, Minghua
2012-01-01
The goal of network clustering algorithms is to detect dense clusters in a network, and they provide a first step towards the understanding of large scale biological networks. With numerous recent advances in biotechnologies, large-scale genetic interactions are widely available, but there is a limited understanding of which clustering algorithms may be most effective. In order to address this problem, we conducted a systematic study to compare and evaluate six clustering algorithms in analyzing genetic interaction networks, and investigated influencing factors in choosing algorithms. The algorithms considered in this comparison include hierarchical clustering, topological overlap matrix, bi-clustering, Markov clustering, Bayesian discriminant analysis based community detection, and variational Bayes approach to modularity. Both experimentally identified and synthetically constructed networks were used in this comparison. The accuracy of the algorithms is measured by the Jaccard index in comparing predicted gene modules with benchmark gene sets. The results suggest that the choice differs according to the network topology and evaluation criteria. Hierarchical clustering was shown to be best at predicting protein complexes; Bayesian discriminant analysis based community detection proved best under epistatic miniarray profile (EMAP) datasets; the variational Bayes approach to modularity was noticeably better than the other algorithms in the genome-scale networks.
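The Jaccard index named in the abstract above compares a predicted module with a benchmark set as the size of their intersection divided by the size of their union. A minimal sketch follows; the function names and the gene identifiers in the usage note are hypothetical, returning 0.0 for two empty sets is an assumed convention, and scoring a module by its best-matching benchmark set is one plausible way to apply the index rather than the procedure confirmed by the abstract.

```python
def jaccard_index(predicted, benchmark):
    """|A intersect B| / |A union B| for two collections of gene identifiers."""
    a, b = set(predicted), set(benchmark)
    union = a | b
    if not union:
        return 0.0  # assumed convention when both sets are empty
    return len(a & b) / len(union)

def best_match_score(module, benchmark_sets):
    """Highest Jaccard index of one predicted module over all benchmark sets."""
    return max(jaccard_index(module, ref) for ref in benchmark_sets)
```

For instance, `jaccard_index({"g1", "g2", "g3"}, {"g2", "g3", "g4"})` shares two genes out of four distinct ones, so it evaluates to 0.5.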
Franke, R.
2016-11-01
In many networks discovered in biology, medicine, neuroscience and other disciplines, special properties like a certain degree distribution and hierarchical cluster structure (also called communities) can be observed as general organizing principles. Detecting the cluster structure of an unknown network promises to identify functional subdivisions, hierarchy and interactions on a mesoscale. Choosing an appropriate detection algorithm is not trivial because there are multiple network, cluster and algorithmic properties to be considered. Edges can be weighted and/or directed, and clusters can overlap or build a hierarchy in several ways. Algorithms differ not only in runtime and memory requirements but also in allowed network and cluster properties. They are also based on a specific definition of what a cluster is. On the one hand, a comprehensive network creation model is needed to build a large variety of benchmark networks with different reasonable structures to compare algorithms. On the other hand, if a cluster structure is already known, it is desirable to separate effects of this structure from other network properties. This can be done with null model networks that mimic an observed cluster structure to improve statistics on other network features. A third important application is the general study of properties in networks with different cluster structures, possibly evolving over time. Currently there are good benchmark and creation models available. But what is left is a precise sandbox model to build hierarchical, overlapping and directed clusters for undirected or directed, binary or weighted complex random networks on the basis of a sophisticated blueprint. This gap shall be closed by the model CHIMERA (Cluster Hierarchy Interconnection Model for Evaluation, Research and Analysis) which will be introduced and described here for the first time.
Combined Density-based and Constraint-based Algorithm for Clustering
Institute of Scientific and Technical Information of China (English)
CHEN Tung-shou; CHEN Rong-chang; LIN Chih-chiang; CHIU Yung-hsing
2006-01-01
We propose a new clustering algorithm that helps researchers analyze data quickly and accurately. We call this algorithm Combined Density-based and Constraint-based Algorithm (CDC). CDC consists of two phases. In the first phase, CDC employs the idea of density-based clustering to split the original data into a number of fragmented clusters. At the same time, CDC cuts off the noise and outliers. In the second phase, CDC employs the concept of the K-means clustering algorithm to select a greater cluster to be the center. Then, the greater cluster merges some smaller clusters which satisfy some constraint rules. Because the merged clusters lie around the center cluster, the clustering results show high accuracy. Moreover, CDC reduces the calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with those of K-means, hierarchical clustering, and the genetic clustering algorithm (GCA) proposed in 2004. Experimental results show that CDC has better performance.
Optimal Hops-Based Adaptive Clustering Algorithm
Xuan, Xin; Chen, Jian; Zhen, Shanshan; Kuo, Yonghong
This paper proposes an optimal hops-based adaptive clustering algorithm (OHACA). The algorithm sets an energy selection threshold before the cluster forms so that the nodes with less energy are more likely to go to sleep immediately. In the setup phase, OHACA introduces an adaptive mechanism to adjust the cluster head and balance the load. The optimal distance theory is applied to discover the practical optimal routing path that minimizes the total energy for transmission. Simulation results show that OHACA prolongs the life of the network, improves the utilization rate and transmits more data because of energy balance.
Issues Challenges and Tools of Clustering Algorithms
Directory of Open Access Journals (Sweden)
Parul Agarwal
2011-05-01
Clustering is an unsupervised technique of Data Mining. It means grouping similar objects together and separating the dissimilar ones. Each object in the data set is assigned a class label in the clustering process using a distance measure. This paper captures the problems that are faced in practice when clustering algorithms are implemented. It also considers the most extensively used tools, which are readily available, and support functions that ease programming. Once algorithms have been implemented, they also need to be tested for validity. Several validation indexes exist for testing performance and accuracy, and these are also discussed here.
Lyman Alpha Emitters in the Hierarchically Clustering Galaxy Formation
Kobayashi, Masakazu A R; Nagashima, Masahiro
2007-01-01
We present a new theoretical model for the luminosity functions (LFs) of Lyman alpha (Lya) emitting galaxies in the framework of hierarchical galaxy formation. We extend a semi-analytic model of galaxy formation that reproduces a number of observations for local galaxies, without changing the original model parameters but introducing a physically-motivated modelling to describe the escape fraction of Lya photons from host galaxies (f_esc). Though a previous study using a hierarchical clustering model simply assumed a constant and universal value of f_esc, we incorporate two new effects on f_esc: extinction by interstellar dust and galaxy-scale outflow induced as a star formation feedback. It is found that the new model nicely reproduces all the observed Lya LFs of the Lya emitters (LAEs) at different redshifts in z ~ 3--6. Our model predicts that galaxies with strong outflows and f_esc ~ 1 are dominant in the observed LFs, which is consistent with available observations while the simple universal f_esc model ...
The structure of dark matter halos in hierarchical clustering theories
Subramanian, Kandaswamy; Cen, Renyue; Ostriker, Jeremiah P.
1999-01-01
During hierarchical clustering, smaller masses generally collapse earlier than larger masses and so are denser on the average. The core of a small mass halo could be dense enough to resist disruption and survive undigested, when it is incorporated into a bigger object. We explore the possibility that a nested sequence of undigested cores in the center of the halo, which have survived the hierarchical, inhomogeneous collapse to form larger and larger objects, determines the halo structure in the inner regions. For a flat universe with $P(k) \propto k^n$, scaling arguments then suggest that the core density profile is $\rho \propto r^{-\alpha}$ with $\alpha = (9+3n)/(5+n)$. But whether such behaviour obtains depends on detailed dynamics. We first examine the dynamics using a fluid approach to the self-similar collapse solutions for the dark matter phase space density, including the effect of velocity dispersions. We highlight the importance of tangential velocity dispersions to obtain density profiles shallowe...
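As a quick illustration of the scaling relation quoted in the abstract, the sketch below evaluates $\alpha = (9+3n)/(5+n)$ for a few spectral indices. The function name `core_slope` and the sample values of n are chosen here for illustration only and do not come from the paper.

```python
from fractions import Fraction


def core_slope(n):
    """Return alpha = (9 + 3n) / (5 + n) for a power spectrum P(k) ~ k^n."""
    n = Fraction(n)
    if n == -5:
        raise ValueError("alpha is undefined for n = -5")
    return (9 + 3 * n) / (5 + n)


if __name__ == "__main__":
    # Sample spectral indices (illustrative values, not taken from the paper).
    for n in (-2, -1, 0, 1):
        print(f"n = {n:+d}  ->  alpha = {float(core_slope(n)):.3f}")
```

For these sample indices the predicted inner slope steepens from 1 (n = -2) to 2 (n = 1).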
Hierarchical Compressed Sensing for Cluster Based Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Vishal Krishna Singh
2016-02-01
Data transmission consumes a significant amount of energy in large-scale wireless sensor networks (WSNs). In such an environment, reducing in-network communication and distributing the load evenly over the network can reduce the overall energy consumption and significantly extend the network lifetime. This work addresses the problem of network lifetime and uneven energy consumption in large-scale wireless sensor networks. It proposes a hierarchical compressed sensing (HCS) scheme to reduce in-network communication during the data gathering process. Correlated sensor readings are collected via a hierarchical clustering scheme. A compressed sensing (CS)-based data processing scheme is devised to transmit the data from the source to the sink. The proposed HCS is able to identify the optimal position for applying CS so as to achieve a reduced and similar number of transmissions on all nodes in the network. An activity map is generated to validate the reduced and uniformly distributed communication load of the WSN. Based on the number of transmissions per data gathering round, the bit-hop metric model is used to analyse the overall energy consumption. Simulation results validate the efficiency of the proposed method over existing CS-based approaches.
Blockspin Cluster Algorithms for Quantum Spin Systems
Wiese, U J
1992-01-01
Cluster algorithms are developed for simulating quantum spin systems like the one- and two-dimensional Heisenberg ferro- and anti-ferromagnets. The corresponding two- and three-dimensional classical spin models with four-spin couplings are mapped to blockspin models with two-blockspin interactions. Clusters of blockspins are updated collectively. The efficiency of the method is investigated in detail for one-dimensional spin chains. In most cases the new algorithms solve the slowing-down problems from which standard algorithms suffer.
Identifying Reference Objects by Hierarchical Clustering in Java Environment
Directory of Open Access Journals (Sweden)
RAHUL SAHA
2011-09-01
Recently the Java programming environment has become very popular. Java is a language designed to be portable enough to be executed on a wide range of computers, from cell phones to supercomputers. Computer programs written in Java are compiled into Java bytecode instructions that are suitable for execution by a Java Virtual Machine implementation. The Java Virtual Machine is commonly implemented in software by means of an interpreter for the Java Virtual Machine instruction set. As an object-oriented language, Java utilizes the concept of objects. Our idea is to identify candidate object references in a Java environment through hierarchical cluster analysis using the reference stack and execution stack.
A New Clustering Algorithm for Face Classification
Directory of Open Access Journals (Sweden)
Shaker K. Ali
2016-06-01
In this paper, we propose a new clustering algorithm that builds on ideas from other clustering algorithms. The algorithm first computes a distance matrix, then excludes the matrix points that have already been clustered by saving their locations (row, column), determines the minimum distance among these points to decide which group (class) they belong to, and retains the points that have not yet been clustered. The proposed algorithm is applied to an image database of human faces captured under different conditions (direction, angle, etc.). The data were collected from different sources (the ORL site and real images collected from a random sample of the population of Thi_Qar city in Iraq). The algorithm was implemented with three distance measures to calculate the minimum distance between points (Euclidean, correlation, and Minkowski distance). The efficiency of the proposed algorithm varied with the database and threshold, and exceeded 96%. Matlab (2014) was used in this work.
The reflection of hierarchical cluster analysis of co-occurrence matrices in SPSS
Zhou, Q.; Leng, F.; Leydesdorff, L.
2015-01-01
Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare the
Cluster hybrid Monte Carlo simulation algorithms
Plascak, J. A.; Ferrenberg, Alan M.; Landau, D. P.
2002-06-01
We show that addition of Metropolis single spin flips to the Wolff cluster-flipping Monte Carlo procedure leads to a dramatic increase in performance for the spin-1/2 Ising model. We also show that adding Wolff cluster flipping to the Metropolis or heat bath algorithms in systems where just cluster flipping is not immediately obvious (such as the spin-3/2 Ising model) can substantially reduce the statistical errors of the simulations. A further advantage of these methods is that systematic errors introduced by the use of imperfect random-number generation may be largely healed by hybridizing single spin flips with cluster flipping.
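The hybridization described in this abstract can be sketched as alternating one Wolff cluster flip with one sweep of Metropolis single-spin flips. The code below is a minimal illustration for the two-dimensional spin-1/2 Ising model with coupling J = 1 and periodic boundaries. The lattice size, temperature, the one-to-one mixing ratio and all function names are assumptions made here, not details taken from the paper.

```python
import math
import random


def metropolis_sweep(spins, L, beta, rng):
    """One sweep of single-spin-flip Metropolis updates (J = 1, periodic)."""
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        s = spins[i][j]
        nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
              + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        d_energy = 2 * s * nb  # energy change if this spin is flipped
        if d_energy <= 0 or rng.random() < math.exp(-beta * d_energy):
            spins[i][j] = -s


def wolff_update(spins, L, beta, rng):
    """Grow one Wolff cluster from a random seed, flip it, return its size."""
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond activation probability
    i, j = rng.randrange(L), rng.randrange(L)
    seed_spin = spins[i][j]
    spins[i][j] = -seed_spin  # flipping on visit also marks membership
    stack = [(i, j)]
    size = 1
    while stack:
        x, y = stack.pop()
        for nx, ny in (((x + 1) % L, y), ((x - 1) % L, y),
                       (x, (y + 1) % L), (x, (y - 1) % L)):
            if spins[nx][ny] == seed_spin and rng.random() < p_add:
                spins[nx][ny] = -seed_spin
                stack.append((nx, ny))
                size += 1
    return size


def hybrid_run(L=16, beta=0.44, steps=200, seed=0):
    """Alternate Wolff and Metropolis updates; record |magnetisation| per site."""
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    mags = []
    for _ in range(steps):
        wolff_update(spins, L, beta, rng)      # collective move
        metropolis_sweep(spins, L, beta, rng)  # local moves
        mags.append(abs(sum(map(sum, spins))) / (L * L))
    return spins, mags


if __name__ == "__main__":
    _, mags = hybrid_run()
    print("mean |m| over run:", sum(mags) / len(mags))
```

Each move type satisfies detailed balance on its own, so alternating them in a fixed order still samples the same Boltzmann distribution. The mixing ratio is a tunable choice.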
MST-BASED CLUSTERING TOPOLOGY CONTROL ALGORITHM FOR WIRELESS SENSOR NETWORKS
Institute of Scientific and Technical Information of China (English)
Cai Wenyu; Zhang Meiyan
2010-01-01
In this paper, we propose a novel clustering topology control algorithm named Minimum Spanning Tree (MST)-based Clustering Topology Control (MCTC) for Wireless Sensor Networks (WSNs), which uses a hybrid approach to adjust sensor nodes' transmission power in two-tiered hierarchical WSNs. The MCTC algorithm employs a one-hop Maximum Energy & Minimum Distance (MEMD) clustering algorithm to decide clustering status. Each cluster exchanges information among its own Cluster Members (CMs) locally and then delivers the information to the Cluster Head (CH). CHs then exchange information with one another and finally transmit the aggregated information to the base station. The intra-cluster topology control scheme uses an MST to decide the CMs' transmission radius; similarly, the inter-cluster topology control scheme applies an MST to decide the CHs' transmission radius. Since intra-cluster topology control is a fully distributed approach and inter-cluster topology control is a purely centralized approach performed by the base station, MCTC is a hybrid clustering topology control algorithm and can obtain a scalable topology and strong connectivity guarantees simultaneously. As a result, the network topology is reduced by MCTC, so that network energy efficiency is improved. Simulation results verify that MCTC outperforms traditional topology control schemes such as LMST, DRNG and MEMD in terms of average node degree, average node power radius and network lifetime.
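To make the MST-based radius selection concrete, the following sketch builds a Euclidean minimum spanning tree over a set of node positions with Prim's algorithm and gives each node a transmission radius equal to its longest incident MST edge. That radius rule is an assumption made here for illustration (the abstract only states that an MST decides the radius), and the function names are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns edges (i, j)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def transmission_radii(points):
    """Assumed rule: radius of a node = length of its longest incident MST edge."""
    radii = [0.0] * len(points)
    for i, j in mst_edges(points):
        d = math.dist(points[i], points[j])
        radii[i] = max(radii[i], d)
        radii[j] = max(radii[j], d)
    return radii


if __name__ == "__main__":
    nodes = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.0, 1.5)]
    print(transmission_radii(nodes))
```

With this rule every MST edge is covered by both of its endpoints, so the reduced topology stays connected while no node transmits farther than its MST neighbours require.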
Constructing a graph of connections in clustering algorithm of complex objects
Directory of Open Access Journals (Sweden)
Татьяна Шатовская
2015-05-01
The article describes the results of modifying the Chameleon algorithm. This hierarchical multi-level algorithm consists of several phases: graph construction, coarsening, separation, and recovery. In each phase, various approaches and algorithms can be used. The main aim of the work is to study the clustering quality on different data sets using combinations of algorithms at the different stages, and to improve the construction stage by optimizing the choice of k when building the k-nearest-neighbor graph.
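The graph construction phase mentioned above can be illustrated with a brute-force k-nearest-neighbor graph. This sketch only shows how k shapes the graph. It does not reproduce the article's procedure for optimizing k, and the function names are hypothetical.

```python
import math


def knn_graph(points, k):
    """Map each point index to the indices of its k nearest neighbours."""
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    graph = {}
    for i in range(n):
        # Sort all other points by Euclidean distance (ties broken by index).
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        graph[i] = [j for _, j in dists[:k]]
    return graph


def undirected_edges(graph):
    """Edge set where i and j are linked if either lists the other."""
    return {tuple(sorted((i, j))) for i, nbrs in graph.items() for j in nbrs}


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 0), (6, 0)]
    for k in (1, 2):
        print(k, sorted(undirected_edges(knn_graph(pts, k))))
```

On this toy input, k = 1 leaves two disconnected pairs, while k = 2 links them, which is why the choice of k matters for the later partitioning phases.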
A Cluster Maintenance Algorithm Based on Relative Mobility for Mobile Ad Hoc Network Management
Institute of Scientific and Technical Information of China (English)
SHENZhong; CHANGYilin; ZHANGXin
2005-01-01
The dynamic topology of mobile ad hoc networks makes network management significantly more challenging than in wireline networks. The traditional Client/Server (Manager/Agent) management paradigm does not work well in such a dynamic environment, while a hierarchical network management architecture based on clustering is more feasible. Although the movement of nodes makes the cluster structure changeable and introduces new challenges for network management, mobility is a relative concept. A node with high relative mobility is more prone to unstable behavior than a node with low relative mobility, so the relative mobility of a node can be used to predict its future behavior. This paper presents cluster availability, which provides a quantitative measurement of cluster stability. Furthermore, a cluster maintenance algorithm based on cluster availability is proposed. The simulation results show that, compared to the Minimum ID clustering algorithm, our algorithm successfully alleviates the influence of node mobility and makes network management more efficient.
Using PSO-Based Hierarchical Feature Selection Algorithm
Directory of Open Access Journals (Sweden)
Zhiwei Ji
2014-01-01
Hepatocellular carcinoma (HCC) is one of the most common malignant tumors. Clinical symptoms attributable to HCC are usually absent, so patients often miss the best therapeutic opportunities. Traditional Chinese Medicine (TCM) plays an active role in the diagnosis and treatment of HCC. In this paper, we propose a particle swarm optimization-based hierarchical feature selection (PSOHFS) model to infer potential syndromes for the diagnosis of HCC. Firstly, the hierarchical feature representation is developed as a three-layer tree. The clinical symptoms and the positive score of the patient are the leaf nodes and the root of the tree, respectively, while each syndrome feature on the middle layer is extracted from a group of symptoms. Secondly, an improved PSO-based algorithm is applied in a new reduced feature space to search for an optimal syndrome subset. Based on the result of feature selection, the causal relationships between symptoms and syndromes are inferred via Bayesian networks. In our experiment, 147 symptoms were aggregated into 27 groups and 27 syndrome features were extracted. The proposed approach discovered 24 syndromes, which clearly improved the diagnosis accuracy. Finally, the Bayesian approach was applied to represent the causal relationships at both the symptom and syndrome levels. The results show that our computational model can facilitate the clinical diagnosis of HCC.
Hierarchical Artificial Bee Colony Algorithm for RFID Network Planning Optimization
Directory of Open Access Journals (Sweden)
Lianbo Ma
2014-01-01
This paper presents a novel optimization algorithm, hierarchical artificial bee colony optimization (HABC), to tackle the radio frequency identification network planning (RNP) problem. In the proposed multilevel model, the higher-level species are aggregated from the subpopulations of the lower level. At the bottom level, each subpopulation employs the canonical ABC method to search for the part-dimensional optimum in parallel, and these partial optima are assembled into a complete solution for the upper level. At the same time, a comprehensive learning method with crossover and mutation operators is applied to enhance the global search ability between species. Experiments are conducted on a set of 10 benchmark optimization problems. The results demonstrate that the proposed HABC obtains remarkable performance on most of the chosen benchmark functions when compared to several successful swarm intelligence and evolutionary algorithms. HABC is then used to solve the real-world RNP problem on two instances of different scales. Simulation results show that the proposed algorithm is superior for solving RNP in terms of optimization accuracy and computation robustness.
Polyclonal clustering algorithm and its convergence
Institute of Scientific and Technical Information of China (English)
MA Li; JIAO Li-cheng; BAI Lin; CHEN Chang-guo
2008-01-01
Characterized by unsupervised learning, self-organization, memory, and noise resistance, the artificial immune system is a research focus in the field of intelligent information processing. Based on the basic principles of biological immunity and clonal selection, this article presents a self-adaptive polyclonal clustering algorithm. Following the core idea of the algorithm, various immune operators from the artificial immune system are employed in the clustering process; moreover, the number of clusters is adjusted in accordance with the affinity function. Introducing the recombination operator effectively enhances the diversity of individual antibodies in a generation's population, so that the search scope for solutions is enlarged and premature convergence of the algorithm is avoided. In addition, introducing the inconsistent mutation operator enhances adaptability and improves local search performance, while also accelerating the convergence of the algorithm. The article also proves the convergence of the algorithm by means of a Markov chain. Results of the data simulation experiment show that the algorithm is capable of obtaining reasonable and effective clusters.
Hierarchical Stochastic Simulation Algorithm for SBML Models of Genetic Circuits
Directory of Open Access Journals (Sweden)
Leandro eWatanabe
2014-11-01
This paper describes a hierarchical stochastic simulation algorithm which has been implemented within iBioSim, a tool used to model, analyze, and visualize genetic circuits. Many biological analysis tools flatten out hierarchy before simulation, but there are many disadvantages associated with this approach. First, the memory required to represent the model can quickly expand in the process. Second, the flattening process is computationally expensive. Finally, when modeling a dynamic cellular population within iBioSim, inlining the hierarchy of the model is inefficient since models must grow dynamically over time. This paper discusses a new approach to handle hierarchy on the fly to make the tool faster and more memory-efficient. This approach yields significant performance improvements as compared to the former flat analysis method.
Hierarchical satisfying optimal algorithm with different importance and priorities
Institute of Scientific and Technical Information of China (English)
Li Shaoyuan; Teng Changjun
2005-01-01
A hierarchical satisfying optimal algorithm incorporating different importance and preemptive priorities is formulated. With the priority structure given by the decision-maker in the constrained multi-objective multi-degree-of-freedom optimization (CMMO) problem, the commonly used quadratic programming model is converted into a two-level optimization problem solved by the tolerant lexicographic method and the varying-domain optimization method. In contrast to previous works, the proposed approach allows the decision-maker to determine a desirable achievement degree for each goal to reflect explicitly the relative importance of these goals. The resulting solutions satisfy both the preemptive priority structure and have the maximum achievement degrees in sum. The power of the proposed approach is demonstrated with an example.
Scalable Hierarchical Algorithms for stochastic PDEs and UQ
Litvinenko, Alexander
2015-01-07
H-matrices and the Fast Multipole Method (FMM) are powerful methods for approximating linear operators that arise from partial differential and integral equations, and for reducing the computational cost from quadratic or cubic to log-linear (O(n log n)), where n is the number of degrees of freedom in the discretization. The storage is reduced to log-linear as well. This hierarchical structure is a good starting point for parallel algorithms. Parallelization on shared and distributed memory systems was pioneered by Kriemann [1,2]. Since 2005, the area of parallel architectures and software has been developing very fast. Progress in GPUs and many-core systems (e.g. XeonPhi with 64 cores) motivated us to extend the work started in [1,2,7,8].
Scalable Hierarchical Algorithms for stochastic PDEs and Uncertainty Quantification
Litvinenko, Alexander
2015-01-05
H-matrices and the Fast Multipole Method (FMM) are powerful methods for approximating linear operators that arise from partial differential and integral equations, and for reducing the computational cost from quadratic or cubic to log-linear (O(n log n)), where n is the number of degrees of freedom in the discretization. The storage is reduced to log-linear as well. This hierarchical structure is a good starting point for parallel algorithms. Parallelization on shared and distributed memory systems was pioneered by R. Kriemann, 2005. Since 2005, the area of parallel architectures and software has been developing very fast. Progress in GPUs and many-core systems (e.g. XeonPhi with 64 cores) motivated us to extend the work started in [1,2,7,8].
Maximum-entropy clustering algorithm and its global convergence analysis
Institute of Scientific and Technical Information of China (English)
(no author listed)
2001-01-01
By constructing a batch of differentiable entropy functions to uniformly approximate an objective function by means of the maximum-entropy principle, a new clustering algorithm, called the maximum-entropy clustering algorithm, is proposed based on optimization theory. This algorithm is a soft generalization of the hard C-means algorithm and possesses global convergence. Its relations with other clustering algorithms are discussed.
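One common way to realize a maximum-entropy "soft" version of hard C-means is to give each point Gibbs-type memberships proportional to exp(-beta * squared distance) and to update each center as the membership-weighted mean. The sketch below follows that textbook formulation, which may differ in detail from the algorithm in the paper. The parameter `beta`, the fixed iteration count and the function names are assumptions made here.

```python
import math


def memberships_for(points, centers, beta):
    """Gibbs memberships: u[k][i] proportional to exp(-beta * ||x_k - c_i||^2)."""
    result = []
    for x in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        shift = min(d2)  # subtract the minimum for numerical stability
        w = [math.exp(-beta * (d - shift)) for d in d2]
        total = sum(w)
        result.append([wi / total for wi in w])
    return result


def max_entropy_clustering(points, centers, beta, iters=50):
    """Alternate the membership and center updates for a fixed number of steps."""
    centers = [list(c) for c in centers]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships_for(points, centers, beta)
        for i in range(len(centers)):
            mass = sum(row[i] for row in u)
            if mass > 0:  # leave a center in place if it attracts no weight
                centers[i] = [
                    sum(row[i] * x[d] for row, x in zip(u, points)) / mass
                    for d in range(dim)
                ]
    return centers, memberships_for(points, centers, beta)


if __name__ == "__main__":
    data = [(0.0,), (0.2,), (5.0,), (5.2,)]
    final_centers, final_u = max_entropy_clustering(data, [(1.0,), (4.0,)], beta=5.0)
    print(final_centers)
```

As `beta` grows, each membership row approaches a 0/1 indicator and the updates reduce to hard C-means. Small `beta` spreads every point across all clusters.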
Directory of Open Access Journals (Sweden)
Mamta Malik
2011-09-01
Cluster detection is a tool employed by GIS scientists who specialize in the field of spatial analysis. This study employed a combination of GIS, RS and a novel 3DCCOM spatial data clustering algorithm to assess the rural demographic development strategies of Sonepat block, Haryana, India. The study was undertaken in a rural and rural-based district in India to demonstrate the integration of village-level spatial and non-spatial data in a GIS environment using hierarchical clustering. Spatial clusters are examined for living standard parameters, including family members, male and female population, sex ratio, and total male and female education ratio. The paper also envisages the future development and usefulness of this community GIS spatial data clustering tool for grass-roots planning. Any data that shows geographic (spatial) variability can be subject to cluster analysis.
Odong, T.L.; Heerwaarden, van J.; Jansen, J.; Hintum, van T.J.L.; Eeuwijk, van F.A.
2011-01-01
Despite the availability of newer approaches, traditional hierarchical clustering remains very popular in genetic diversity studies in plants. However, little is known about its suitability for molecular marker data. We studied the performance of traditional hierarchical clustering techniques using
An Adaptive Clustering Algorithm for Intrusion Detection
Institute of Scientific and Technical Information of China (English)
QIU Juli
2007-01-01
In this paper, we introduce an adaptive clustering algorithm for intrusion detection based on WaveCluster, which was introduced by Gholamhosein in 1999 and has been used with success in image processing. Because of the non-stationary characteristics of network traffic, we extend and develop an adaptive WaveCluster algorithm for intrusion detection. Using the multiresolution property of wavelet transforms, we can effectively identify arbitrarily shaped clusters at different scales and degrees of detail. Moreover, applying the wavelet transform removes noise from the original feature space and allows more accurate clusters to be found. Experimental results on the KDD-99 intrusion detection dataset show the efficiency and accuracy of this algorithm. A detection rate above 96% and a false alarm rate below 3% are achieved.
Efficient Cluster Head Selection Algorithm for MANET
Directory of Open Access Journals (Sweden)
Khalid Hussain
2013-01-01
In mobile ad hoc networks (MANETs), cluster head selection is considered a major challenge. In wireless sensor networks, the LEACH protocol can be used to select cluster heads on the basis of energy, but the problem remains open in mobile ad hoc networks, especially when nodes are mobile. In this paper we propose an efficient cluster head selection algorithm (ECHSA) for selecting cluster heads efficiently in mobile ad hoc networks. We evaluate the proposed algorithm through simulation in OMNet++ as well as on a test bed; the results agree with our assumptions. For further evaluation we also compare the proposed protocol with several other protocols such as LEACH-C, and the results show an improvement.
Directory of Open Access Journals (Sweden)
Susan Worner
2013-09-01
For greater preparedness, pest risk assessors are required to prioritise long lists of pest species with potential to establish and cause significant impact in an endangered area. Such prioritization is often qualitative, subjective, and sometimes biased, relying mostly on expert and stakeholder consultation. In recent years, cluster-based analyses have been used to investigate regional pest species assemblages or pest profiles to indicate the risk of new organism establishment. Such an approach is based on the premise that the co-occurrence of well-known global invasive pest species in a region is not random, and that the pest species profile or assemblage integrates complex functional relationships that are difficult to tease apart. In other words, the assemblage can help identify and prioritise species that pose a threat in a target region. A computational intelligence method called a Kohonen self-organizing map (SOM), a type of artificial neural network, was the first clustering method applied to analyse assemblages of invasive pests. The SOM is a well-known dimension reduction and visualization method, especially useful for high-dimensional data that more conventional clustering methods may not analyse suitably. Like all clustering algorithms, the SOM can give details of clusters that identify regions with similar pest assemblages, and thus possible donor and recipient regions. More importantly, however, the SOM connection weights that result from the analysis can be used to rank the strength of association of each species within each regional assemblage. Species with high weights that are not already established in the target region are identified as high risk. However, the SOM analysis is only the first step in a process to assess risk, to be used alongside or incorporated within other measures. Here we illustrate the application of SOM analyses in a range of contexts in invasive species risk assessment, and discuss other clustering methods such as k
clusterMaker: a multi-algorithm clustering plugin for Cytoscape
2011-01-01
Background: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions: The Cytoscape plugin clusterMaker provides a number of clustering
clusterMaker: a multi-algorithm clustering plugin for Cytoscape
Directory of Open Access Journals (Sweden)
Morris John H
2011-11-01
Background: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions: The Cytoscape plugin cluster
Parallel Clustering Algorithms for Structured AMR
Energy Technology Data Exchange (ETDEWEB)
Gunney, B T; Wissink, A M; Hysom, D A
2005-10-26
We compare several different parallel implementation approaches for the clustering operations performed during adaptive gridding operations in patch-based structured adaptive mesh refinement (SAMR) applications. Specifically, we target the clustering algorithm of Berger and Rigoutsos (BR91), which is commonly used in many SAMR applications. The baseline for comparison is a simplistic parallel extension of the original algorithm that works well for up to O(10^2) processors. Our goal is a clustering algorithm for machines of up to O(10^5) processors, such as the 64K-processor IBM BlueGene/Light system. We first present an algorithm that avoids the unneeded communications of the simplistic approach to improve the clustering speed by up to an order of magnitude. We then present a new task-parallel implementation to further reduce communication wait time, adding another order of magnitude of improvement. The new algorithms also exhibit more favorable scaling behavior for our test problems. Performance is evaluated on a number of large scale parallel computer systems, including a 16K-processor BlueGene/Light system.
Analysis of Stemming Algorithm for Text Clustering
Directory of Open Access Journals (Sweden)
N.Sandhya
2011-09-01
Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In the bag-of-words representation of documents, the words that appear in documents often have many morphological variants, and in most cases these variants have similar semantic interpretations and can be considered equivalent for the purpose of clustering applications. For this reason, a number of stemming algorithms, or stemmers, have been developed, which attempt to reduce a word to its stem or root form. Thus, the key terms of a document are represented by stems rather than by the original words. In this work we study the impact of a stemming algorithm, along with four popular similarity measures (Euclidean, cosine, Pearson correlation and extended Jaccard) in conjunction with different types of vector representation (boolean, term frequency, and term frequency-inverse document frequency), on cluster quality. For clustering documents we use the partitional clustering technique K-means. Performance is measured against a human-imposed classification of the Classic data set. We conducted a number of experiments and used the entropy measure to assure statistical significance of the results. Cosine, Pearson correlation and extended Jaccard similarities emerge as the best measures to capture human categorization behavior, while the Euclidean measure performs poorly. After applying the stemming algorithm, the Euclidean measure shows little improvement.
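For reference, the four measures compared in this study can be written compactly for dense term vectors. The sketch below uses the standard textbook definitions, with extended Jaccard taken as the Tanimoto coefficient. Returning 0.0 for zero or constant vectors is a convention chosen here, not something stated in the abstract.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between the vectors; ignores vector length."""
    denom = math.sqrt(_dot(a, a) * _dot(b, b))
    return _dot(a, b) / denom if denom else 0.0


def pearson_correlation(a, b):
    """Cosine similarity of the mean-centred vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def extended_jaccard(a, b):
    """Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    ab = _dot(a, b)
    denom = _dot(a, a) + _dot(b, b) - ab
    return ab / denom if denom else 0.0


if __name__ == "__main__":
    # Toy term-frequency vectors: doc2 repeats doc1's terms twice as often.
    doc1, doc2 = [1, 2, 0], [2, 4, 0]
    print("euclidean:", euclidean_distance(doc1, doc2))
    print("cosine:", cosine_similarity(doc1, doc2))
    print("pearson:", pearson_correlation(doc1, doc2))
    print("extended jaccard:", extended_jaccard(doc1, doc2))
```

In the toy example, cosine and Pearson treat the two documents as identical in direction, whereas the Euclidean distance is non-zero because it is sensitive to document length. This is one plausible reason for the weaker Euclidean results reported above.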
High-Performance Broadcasting Algorithms on Cluster
Institute of Scientific and Technical Information of China (English)
舒继武; 魏英霞; 王鼎兴
2004-01-01
In many clusters connected by high-speed communication networks, the exact structure of the underlying communication network and the latency differences between different sending and receiving pairs may be ignored during broadcast, as in the approach adopted by the broadcasting method in MPICH, a widely used MPI implementation. However, the underlying network cluster topologies are becoming more and more complicated, and the performance of traditional broadcasting algorithms, such as MPICH's MPI_Bcast, is far from good. This paper analyzes the impact of communication latencies and the underlying topologies on the performance of broadcasting algorithms for multilevel clusters. A multilevel model was developed for broadcasting in clusters with complicated topologies, which divides the cluster topology into many levels based on the underlying topology. The multilevel model was used to develop a new broadcast algorithm, MLM broadcast-2 (MLMB-2), that adapts to a wide range of clusters. A comparison of the counterpart MPI operation MPI_Bcast with MLMB-2 shows that MLMB-2 outperforms MPI_Bcast, decreasing the broadcast running time by 60%-90%.
Cluster Algorithm Special Purpose Processor
Talapov, A. L.; Shchur, L. N.; Andreichenko, V. B.; Dotsenko, Vl. S.
We describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.
Cluster algorithm special purpose processor
Energy Technology Data Exchange (ETDEWEB)
Talapov, A.L.; Shchur, L.N.; Andreichenko, V.B.; Dotsenko, V.S. (Landau Inst. for Theoretical Physics, GSP-1 117940 Moscow V-334 (USSR))
1992-08-10
In this paper, the authors describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.
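For readers unfamiliar with the Wolff algorithm that the processor realizes in hardware, a minimal software sketch of one cluster update for the pure 2D Ising model follows. The function name, the list-of-lists lattice, periodic boundaries and coupling J = 1 are assumptions of this example and say nothing about the hardware design.

```python
import math
import random


def wolff_step(spins, size, beta, rng):
    """Grow and flip one Wolff cluster in place; return the number of spins flipped.

    spins is a size x size list of lists holding +1 or -1, with periodic
    boundaries and coupling J = 1 (both are assumptions of this sketch).
    """
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond-activation probability
    x, y = rng.randrange(size), rng.randrange(size)
    seed_spin = spins[x][y]
    spins[x][y] = -seed_spin  # flipping on entry also marks the site as visited
    stack = [(x, y)]
    flipped = 1
    while stack:
        cx, cy = stack.pop()
        neighbours = (
            ((cx + 1) % size, cy),
            ((cx - 1) % size, cy),
            (cx, (cy + 1) % size),
            (cx, (cy - 1) % size),
        )
        for nx, ny in neighbours:
            # Only aligned, not-yet-flipped neighbours can join the cluster.
            if spins[nx][ny] == seed_spin and rng.random() < p_add:
                spins[nx][ny] = -seed_spin
                stack.append((nx, ny))
                flipped += 1
    return flipped


rng = random.Random(0)
hot = [[1] * 8 for _ in range(8)]
cold = [[1] * 8 for _ in range(8)]
flipped_hot = wolff_step(hot, 8, 0.0, rng)     # beta = 0: no bond is ever activated
flipped_cold = wolff_step(cold, 8, 50.0, rng)  # very large beta: every bond activates
```

The two limiting cases bracket the behaviour: at beta = 0 only the seed spin flips, while at very large beta an aligned lattice flips as a single cluster.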
Craig, Paul; Roa-Seïler, Néna
2013-01-01
This paper describes a novel information visualization technique that combines multidimensional scaling and hierarchical clustering to support the exploratory analysis of multidimensional data. The technique displays the results of multidimensional scaling using a scatter plot where the proximity of any two items' representations is approximate to their similarity according to a Euclidean distance metric. The results of hierarchical clustering are overlaid onto this view by drawing smoothed outlines around each nested cluster. The difference in similarity between successive cluster combinations is used to colour code clusters and make stronger natural clusters more prominent in the display. When a cluster or group of items is selected, multidimensional scaling and hierarchical clustering are re-applied to a filtered subset of the data, and animation is used to smooth the transition between successive filtered views. As a case study we demonstrate the technique being used to analyse survey data relating to the appropriateness of different phrases to different emotionally charged situations.
An Improved Heuristic Ant-Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
Yunfei Chen; Yushu Liu; Jihai Zhao
2004-01-01
An improved heuristic ant-clustering algorithm (HAC) is presented in this paper. A 'memory bank' device is proposed, which can bring forth heuristic knowledge guiding ants as they move in the two-dimensional grid space. The device is tested on real and synthetic data sets. The results demonstrate that HAC is superior to the classical algorithm in misclassification error rate and runtime.
Directory of Open Access Journals (Sweden)
Татьяна Борисовна Шатовская
2015-03-01
In this work, results of a modified Chameleon algorithm are discussed. Hierarchical multilevel algorithms consist of several stages: building the graph, coarsening, partitioning, and recovering. The main aim of the article is to explore clustering quality for different data sets with different combinations of algorithms at the different stages. A further aim is to improve the construction phase by optimizing the choice of k when building the k-nearest-neighbour graph.
Genetic algorithm applied to hierarchically coupled associative memories.
Gomes, Rogério Martins; Braga, Antônio Pádua; Borges, Henrique E
2010-01-01
Inspired by the theory of neuronal group selection (TNGS), we have carried out an analysis of the capacity of convergence of a multi-level associative memory based on coupled generalized-brain-state-in-a-box (GBSB) networks through evolutionary computation. The TNGS establishes that a memory process can be described as being organized functionally in hierarchical levels where higher levels coordinate sets of functions of lower levels. According to this theory, the most basic units in the cortical area of the brain are called neuronal groups or first-level blocks of memories, and the higher-level memories are formed through selective strengthening or weakening of the synapses amongst the neuronal groups. In order to analyse this effect, we propose that the higher levels should emerge through a learning mechanism as correlations of lower-level memories. According to this proposal, this paper describes a method of acquiring the inter-group synapses based on a genetic algorithm. The results show that genetic algorithms are feasible, as they allow the emergence of complex behaviours which could potentially be excluded in other learning processes.
Directory of Open Access Journals (Sweden)
M. Safish Mary
2012-04-01
Classification of large amounts of data is a time-consuming process but crucial for analysis and decision making. Radial Basis Function (RBF) networks are widely used for classification and regression analysis. In this paper, we have studied the performance of RBF neural networks in classifying car sales based on demand, using a kernel density estimation algorithm which produces classification accuracy comparable to that provided by support vector machines. We have proposed a new instance-based data selection method in which redundant instances are removed with the help of a threshold, thus improving the time complexity with improved classification accuracy. The instance-based selection of the data set helps reduce the number of clusters formed, thereby reducing the number of centers considered for building the RBF network. Further, the efficiency of the training is improved by applying a hierarchical clustering technique to reduce the number of clusters formed at every step. The paper explains the algorithm used for classification and for conditioning the data. It also explains the complexities involved in classification of sales data for analysis and decision-making.
Munandar, T. A.; Azhari; Mushdholifah, A.; Arsyad, L.
2017-03-01
Disparities in regional development methods are commonly identified using the Klassen Typology and Location Quotient. Both methods typically use the data on the gross regional domestic product (GRDP) sectors of a particular region. The Klassen approach can identify regional disparities by classifying the GRDP sector data into four classes, namely Quadrants I, II, III, and IV. Each quadrant indicates a certain level of regional disparities based on the GRDP sector value of the said region. Meanwhile, the Location Quotient (LQ) is usually used to identify potential sectors in a particular region so as to determine which sectors are potential and which ones are not potential. LQ classifies each sector into three classes namely, the basic sector, the non-basic sector with a competitive advantage, and the non-basic sector which can only meet its own necessities. Both Klassen Typology and LQ are unable to visualize the relationship of achievements in the development clearly of each region and sector. This research aimed to develop a new approach to the identification of disparities in regional development in the form of hierarchical clustering. The method of Hierarchical Agglomerative Clustering (HAC) was employed as the basis of the hierarchical clustering model for identifying disparities in regional development. Modifications were made to HAC using the Klassen Typology and LQ. Then, HAC which had been modified using the Klassen Typology was called MHACK while HAC which had been modified using LQ was called MACLoQ. Both algorithms can be used to identify regional disparities (MHACK) and potential sectors (MACLoQ), respectively, in the form of hierarchical clusters. Based on the MHACK in 31 regencies in Central Java Province, it is identified that 3 regencies (Demak, Jepara, and Magelang City) fall into the category of developed and rapidly-growing regions, while the other 28 regencies fall into the category of developed but depressed regions. Results of the MACLo
Hierarchical clusters in families with type 2 diabetes
García-Solano, Beatriz; Gallegos-Cabriales, Esther C; Gómez-Meza, Marco V; García-Madrid, Guillermina; Flores-Merlo, Marcela; García-Solano, Mauro
2015-01-01
Families represent more than a set of individuals; a family is more than the sum of its individual members. With this classification, nurses can identify family health-illness beliefs that follow the family-as-a-unit concept and plan the inclusion of the family in type 2 diabetes treatment. The family is not considered in public policy, even though families share diet, exercise, and self-monitoring with a member who suffers from type 2 diabetes. The aim of this study was to determine whether the characteristics, functionality, routines, and family and individual health in type 2 diabetes describe the differences and similarities between families, so as to consider them as a unit. We performed an exploratory, descriptive hierarchical cluster analysis of 61 families using three instruments and a questionnaire, in addition to weight, height, body fat percentage, hemoglobin A1c, total cholesterol, triglycerides, low-density lipoprotein and high-density lipoprotein. The analysis produced three groups of families. Wilks' lambda demonstrated statistically significant differences provided by age (Λ = 0.778, F = 2.098, p = 0.010) and family health (Λ = 0.813, F = 2.650, p = 0.023). A post hoc Tukey test coincided with the three subsets. Families with type 2 diabetes have common elements that make them similar, while sharing differences that make them unique. PMID:27347419
Banerjee, Sambaran
2014-01-01
The formation of very young massive clusters or "starburst" clusters is currently one of the most widely debated topics in astronomy. The classical notion dictates that a star cluster is formed in-situ in a dense molecular gas clump followed by a substantial residual gas expulsion. On the other hand, based on the observed morphologies of many young stellar associations, a hierarchical formation scenario is alternatively suggested. A very young (age $\approx$ 1 Myr), massive ($>10^4 M_\odot$) star cluster like the Galactic NGC 3603 young cluster (HD 97950) is an appropriate testbed for distinguishing between such "monolithic" and "hierarchical" formation scenarios. A recent study by Banerjee and Kroupa (2014) demonstrates that the monolithic scenario remarkably reproduces the HD 97950 cluster. In the present work, we explore the possibility of the formation of the above cluster via hierarchical assembly of subclusters. These subclusters are initially distributed over a wide range of spatial volumes and have vari...
A Fast Algorithm for Support Vector Clustering
Institute of Scientific and Technical Information of China (English)
吕常魁; 姜澄宇; 王宁生
2004-01-01
Support Vector Clustering (SVC) is a kernel-based unsupervised learning clustering method. The main drawback of SVC is its high computational complexity in obtaining the adjacency matrix that describes the connectivity of each pair of points. Based on the proximity graph model [3], the Euclidean distance in Hilbert space is calculated using a Gaussian kernel, which is the right criterion to generate a minimum spanning tree (MST) using Kruskal's algorithm. The cost of connectivity estimation is then lowered by checking only the linkages between the edges that construct the main stem of the MST, in which the non-compatibility degree is originally defined to support the edge selection during linkage estimations. This new approach is experimentally analyzed. The results show that the revised algorithm performs better than the proximity graph model, with faster speed, optimized clustering quality and a strong ability to suppress noise, which makes SVC scalable to large data sets.
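The two building blocks named in the abstract above, a kernel-induced distance and Kruskal's algorithm, can be sketched as follows. This is an illustrative sketch only: the function names, the kernel width q and the sample points are assumptions, and the paper's main-stem selection and non-compatibility degree are not reproduced. For a Gaussian kernel K, the squared feature-space distance is K(x,x) - 2K(x,y) + K(y,y) = 2 - 2K(x,y).

```python
import math


def gaussian_kernel(x, y, q):
    """Gaussian kernel exp(-q * ||x - y||^2); q is an assumed width parameter."""
    return math.exp(-q * sum((a - b) ** 2 for a, b in zip(x, y)))


def feature_space_distance(x, y, q):
    """Distance between the kernel feature-space images of x and y."""
    # K(x, x) = K(y, y) = 1 for a Gaussian kernel, so the square is 2 - 2 K(x, y).
    return math.sqrt(max(0.0, 2.0 - 2.0 * gaussian_kernel(x, y, q)))


def kruskal_mst(points, q):
    """Return MST edges (i, j, weight) over all point pairs via Kruskal's algorithm."""
    n = len(points)
    edges = sorted(
        (feature_space_distance(points[i], points[j], q), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two different components
            parent[root_i] = root_j
            tree.append((i, j, weight))
    return tree


# Hypothetical sample: two tight pairs of points that are far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
tree = kruskal_mst(points, q=0.1)
tree_pairs = {(i, j) for i, j, _ in tree}
```

Because the kernel distance is a monotone function of the input-space distance, the tree links each tight pair first and then joins the two pairs through their closest members.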
Fuzzy Rules for Ant Based Clustering Algorithm
Directory of Open Access Journals (Sweden)
Amira Hamdi
2016-01-01
This paper provides a new intelligent technique for the semisupervised data clustering problem that combines the Ant System (AS) algorithm with the fuzzy c-means (FCM) clustering algorithm. Our proposed approach, called the F-ASClass algorithm, is a distributed algorithm inspired by the foraging behavior observed in ant colonies. The ability of ants to find the shortest path forms the basis of our proposed approach. In the first step, several colonies of cooperating entities, called artificial ants, are used to find shortest paths in a complete graph that we call the graph-data. The number of colonies used in F-ASClass is equal to the number of clusters in the dataset. In the second step, the partition matrix of the dataset found by the artificial ants is given to the fuzzy c-means technique in order to assign the unclassified objects generated in the first step. The proposed approach is tested on artificial and real datasets, and its performance is compared with those of the K-means, K-medoid, and FCM algorithms. The experimental section shows that F-ASClass performs better according to the classification error rate, accuracy, and separation index.
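The fuzzy c-means step mentioned above assigns each object a degree of membership in every cluster. A minimal sketch of the standard FCM membership formula, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)), is given below; the function name, the fuzzifier default m = 2 and the sample centres are assumptions for illustration, and how F-ASClass feeds its partition matrix into FCM is not reproduced here.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster centre."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    zeros = [d == 0.0 for d in dists]
    if any(zeros):
        # The point coincides with one or more centres: split membership among them.
        share = 1.0 / sum(zeros)
        return [share if z else 0.0 for z in zeros]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical example: a point at distance 1 from one centre and 3 from the other.
memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

With m = 2 the memberships are proportional to the inverse squared distances, so the nearer centre receives 0.9 and the farther one 0.1.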
Application of a New Fuzzy Clustering Algorithm in Intrusion Detection
Institute of Scientific and Technical Information of China (English)
(no author listed)
2008-01-01
This paper presents a new Section Set Adaptive FCM algorithm. The algorithm addresses the shortcomings of local optimality, uncertain classification, and the need to fix the number of clusters in advance. It improves on the architecture of the FCM algorithm and enhances the analysis for effective clustering. During clustering, it can adjust the number of clusters dynamically. Finally, it uses the section set method to decrease classification time. Experiments show that the algorithm can improve the dependability of clustering and the correctness of classification.
Noor Rashidah Rashid
2012-01-01
Cluster Analysis is a multivariate method in statistics. Agglomerative Hierarchical Cluster Analysis is one of the approaches in Cluster Analysis. There are two linkage methods in Agglomerative Hierarchical Cluster Analysis, namely Single Linkage and Complete Linkage. The purpose of this study is to compare Single Linkage and Complete Linkage in Agglomerative Hierarchical Cluster Analysis. The comparison of performances between these linkage methods was shown by using Kruskal-Wallis tes...
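The difference between the two linkage methods compared above can be seen in a small sketch. The naive implementation, the function name and the one-dimensional sample points are assumptions chosen for illustration; they are not the study's data or procedure.

```python
def agglomerate(points, k, linkage):
    """Naive agglomerative clustering of 1-D points down to k clusters.

    linkage is "single" (closest pair) or "complete" (farthest pair).
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    clusters = [[p] for p in points]

    def cluster_distance(a, b):
        pairwise = [abs(x - y) for x in a for y in b]
        return min(pairwise) if linkage == "single" else max(pairwise)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical points whose gaps grow steadily: 2, 2.5, 3, 3.5.
points = [0.0, 2.0, 4.5, 7.5, 11.0]
single_result = agglomerate(points, 2, "single")
complete_result = agglomerate(points, 2, "complete")
```

On these points single linkage chains the first four values together and leaves 11.0 alone, whereas complete linkage produces the more balanced split [0.0, 2.0] and [4.5, 7.5, 11.0].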
THE EVOLUTION OF BRIGHTEST CLUSTER GALAXIES IN A HIERARCHICAL UNIVERSE
Energy Technology Data Exchange (ETDEWEB)
Tonini, Chiara; Bernyk, Maksym; Croton, Darren [Centre for Astrophysics and Supercomputing, Swinburne University of Technology, Melbourne, VIC 3122 (Australia); Maraston, Claudia; Thomas, Daniel [Institute of Cosmology and Gravitation, University of Portsmouth, Portsmouth PO1 3FX (United Kingdom)
2012-11-01
We investigate the evolution of brightest cluster galaxies (BCGs) from redshift z ≈ 1.6 to z = 0. We upgrade the hierarchical semi-analytic model of Croton et al. with a new spectro-photometric model that produces realistic galaxy spectra, making use of the Maraston stellar populations and a new recipe for the dust extinction. We compare the model predictions of the K-band luminosity evolution and the J - K, V - I, and I - K color evolution with a series of data sets, including those of Collins et al. who argued that semi-analytic models based on the Millennium simulation cannot reproduce the red colors and high luminosity of BCGs at z > 1. We show instead that the model is well in range of the observed luminosity and correctly reproduces the color evolution of BCGs in the whole redshift range up to z ≈ 1.6. We argue that the success of the semi-analytic model is in large part due to the implementation of a more sophisticated spectro-photometric model. An analysis of the model BCGs shows an increase in mass by a factor of 2-3 since z ≈ 1, and star formation activity down to low redshifts. While the consensus regarding BCGs is that they are passively evolving, we argue that this conclusion is affected by the degeneracy between star formation history and stellar population models used in spectral energy distribution fitting, and by the inefficacy of toy models of passive evolution to capture the complexity of real galaxies, especially those with rich merger histories like BCGs. Following this argument, we also show that in the semi-analytic model the BCGs show a realistic mix of stellar populations, and that these stellar populations are mostly old. In addition, the age-redshift relation of the model BCGs follows that of the universe, meaning that given their merger history and star formation history, the ageing of BCGs is always dominated by the ageing of their stellar populations. In a ΛCDM universe, we define such evolution as '...
Directory of Open Access Journals (Sweden)
Kellermann Walter
2007-01-01
We address the problem of underdetermined blind source separation (BSS). While most previous approaches are designed for instantaneous mixtures, we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP) approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the ℓ1-norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP) approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions.
3D Nearest Neighbour Search Using a Clustered Hierarchical Tree Structure
Suhaibah, A.; Uznir, U.; Anton, F.; Mioc, D.; Rahman, A. A.
2016-06-01
Locating and analysing the location of new stores or outlets is one of the common issues facing retailers and franchisers. The aim is to ensure that newly opened stores are at strategic locations that attract the highest possible number of customers. Spatial information is used to manage, maintain and analyse these store locations. However, since franchising and chain-store businesses in urban areas operate within high-rise multi-level buildings, a three-dimensional (3D) method is required in order to locate and identify surrounding information, such as the level at which a franchise unit will be located, or whether a franchise unit is located at the best level for visibility. One of the analyses commonly used for retrieving surrounding information is Nearest Neighbour (NN) analysis. It uses a point location and identifies the surrounding neighbours. However, with the immense number of urban datasets, the retrieval and analysis of nearest neighbour information, and their efficiency, will become more complex and crucial. In this paper, we present a technique to retrieve nearest neighbour information in 3D space using a clustered hierarchical tree structure. Based on our findings, the proposed approach substantially improved response time compared to existing spatial access methods in databases. The query performance was tested using a dataset consisting of 500,000 point locations of buildings and franchising units. The results are presented in this paper. Another advantage of this structure is that it also offers minimal overlap and coverage among nodes, which can reduce repetitive data entry.
Parallel FFT Algorithm on Computer Clusters
Institute of Scientific and Technical Information of China (English)
(no author listed)
2005-01-01
The DFT is widely applied in signal processing and other fields. Most existing fast methods of calculation are either based on parallel computers connected by particular networks such as the butterfly network or hypercube, or based on the assumptions of instantaneous transmission, conflict-free communication, complete connection of parallel processors, and an unlimited number of usable processors. However, communication delay in the transmission system cannot be ignored. This paper works on the following aspects: instant transmission, task dispatching, and the path of information through the communication links in computer cluster systems; and the layout of the dynamic FFT algorithm under different structures of computer clusters.
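As background for the abstract above, the serial computation that a parallel FFT distributes across nodes can be sketched with the textbook radix-2 Cooley-Tukey recursion. This is a generic illustration, not the paper's cluster layout; the function name and the power-of-two length restriction are assumptions of the sketch.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0:
        raise ValueError("input must not be empty")
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle            # butterfly: upper output
        out[k + n // 2] = even[k] - twiddle   # butterfly: lower output
    return out


impulse_spectrum = fft([1, 0, 0, 0])
constant_spectrum = fft([1, 1, 1, 1])
```

The two half-size transforms are independent of each other, which is what makes the recursion a natural candidate for distribution; the butterfly step is where the halves exchange data, and on a cluster that exchange is the communication cost the abstract says cannot be ignored.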
DEFF Research Database (Denmark)
Ackerman, Margareta; Ben-David, Shai; Branzei, Simina
2012-01-01
We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights. We conduct the first extensive theoretical analysis on the influence of weighted data on standard clustering algorithms in both...... the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify...
Comparative study of several Clustering Algorithms
Directory of Open Access Journals (Sweden)
Prof. Neha Soni, Dr. Amit Ganatra
2012-12-01
Cluster analysis is a process of grouping objects, where an object can be physical, like a student, or abstract, such as the behaviour of a customer or the handwriting of a person. Cluster analysis is as old as human life and has its roots in many fields, such as statistics, machine learning, biology, and artificial intelligence. It is a form of unsupervised learning and faces many challenges, such as high dimensionality of the dataset, arbitrary cluster shapes, scalability, input parameters, domain knowledge and noisy data. A large number of clustering algorithms have been proposed to date to address these challenges. No single algorithm exists that can adequately handle all sorts of requirements. This makes it a great challenge for the user to select among the available algorithms for a specific task. The purpose of this paper is to provide a detailed analytical comparison of some of the very well-known clustering algorithms, which provides guidance for the selection of a clustering algorithm for a specific application.
An incremental clustering algorithm based on Mahalanobis distance
Aik, Lim Eng; Choon, Tan Wee
2014-12-01
The classical fuzzy c-means clustering algorithm is insufficient for clustering non-spherical or elliptically distributed datasets. The paper replaces the Euclidean distance of classical fuzzy c-means clustering with the Mahalanobis distance and, for its merits, applies the Mahalanobis distance to incremental learning. A fuzzy incremental clustering learning algorithm based on the Mahalanobis distance is proposed. Experimental results show that the algorithm not only is an effective remedy for the defect in the fuzzy c-means algorithm but also increases training accuracy.
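To make the contrast between the two distances concrete, the sketch below computes the Mahalanobis distance sqrt((x - mu)^T S^-1 (x - mu)) for two-dimensional data. The function name, the restriction to 2 x 2 covariance matrices and the sample values are assumptions made for this example; the paper's incremental algorithm is not reproduced.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from mean under a 2 x 2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2 x 2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    squared = (
        dx * (inv[0][0] * dx + inv[0][1] * dy)
        + dy * (inv[1][0] * dx + inv[1][1] * dy)
    )
    return math.sqrt(squared)


# Hypothetical elliptical cluster: variance 4 along x, variance 1 along y.
mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))
along_long_axis = mahalanobis_2d((2.0, 0.0), mean, cov)
along_short_axis = mahalanobis_2d((0.0, 2.0), mean, cov)
```

Both points lie at Euclidean distance 2 from the mean, yet the Mahalanobis distance is 1 along the stretched axis and 2 along the narrow one, which is why it suits elliptical clusters better.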
CABOSFV algorithm for high dimensional sparse data clustering
Institute of Scientific and Technical Information of China (English)
Sen Wu; Xuedong Gao
2004-01-01
An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for the high dimensional clustering of binary sparse data. This algorithm compresses the data effectively by using a tool 'Sparse Feature Vector', thus reduces the data scale enormously, and can get the clustering result with only one data scan. Both theoretical analysis and empirical tests showed that CABOSFV is of low computational complexity. The algorithm finds clusters in high dimensional large datasets efficiently and handles noise effectively.
Hierarchical Problem Solving with the Linkage Tree Genetic Algorithm
Thierens, D.; Bosman, P.A.N.; Blum, C.; Alba, E.
2013-01-01
Hierarchical problems represent an important class of nearly decomposable problems. The concept of near decomposability is central to the study of complex systems. When little a priori information is available, a black box problem solver is needed to optimize these hierarchical problems. The solver
First Cluster Algorithm Special Purpose Processor
Talapov, A. L.; Andreichenko, V. B.; Dotsenko, Vl. S.; Shchur, L. N.
We describe the architecture of the special purpose processor built to realize in hardware the Wolff cluster algorithm, which is not hampered by critical slowing down. The processor simulates two-dimensional Ising-like spin systems. With minor changes, the same very effective architecture, which can be defined as a Memory Machine, can be used to study phase transitions in a wide range of models in two or three dimensions.
Dynamic exponents for Potts model cluster algorithms
Coddington, Paul D.; Baillie, Clive F.
We have studied the Swendsen-Wang and Wolff cluster update algorithms for the Ising model in 2, 3 and 4 dimensions. The data indicate simple relations between the specific heat and the Wolff autocorrelations, and between the magnetization and the Swendsen-Wang autocorrelations. This implies that the dynamic critical exponents are related to the static exponents of the Ising model. We also investigate the possibility of similar relationships for the Q-state Potts model.
Enhanced Unequal Clustering Algorithm for Wireless Sensor Networks
Talbi, Said; Zaouche, Lotfi
2015-01-01
Clustering is considered a solution for conserving more energy during communications in wireless sensor networks. Recently, a new clustering algorithm named Unequal Clustering Algorithm (UCA) was proposed to relieve the overloaded cluster-heads located around the sink, which are burdened by traffic coming from other nodes that are far from the base station. This paper presents an Enhanced Unequal Clustering Algorithm called EUCA. This solution reduces the control traffic during a clusteri...
ITS Cluster Finding Algorithm on GPU
Changaival, Boonyarit
2014-01-01
The ITS cluster finding algorithm is one of the data reduction algorithms at ALICE. It needs to run fast because of the large amount of data read out from the detector. A variety of platforms were studied for the system design. My work is to design, implement and benchmark this algorithm on a GPU platform. The GPU is one of many platforms that promote parallel computing. A high-end GPU can contain over 2000 processing cores, compared with commodity CPUs, which have only four cores. The program is written in C with the CUDA library. The throughput (number of events per second) is used as the performance metric. With the latest implementation, the throughput was increased by a factor of 5.
An Improved Hierarchical Genetic Algorithm for Sheet Cutting Scheduling with Process Constraints
Directory of Open Access Journals (Sweden)
Yunqing Rao
2013-01-01
For the first time, an improved hierarchical genetic algorithm is proposed for the sheet cutting problem, which involves n cutting patterns for m non-identical parallel machines with process constraints, in the integrated cutting stock model. The objective of the cutting scheduling problem is to minimize the weighted completion time. A mathematical model for this problem is presented, an improved hierarchical genetic algorithm (ant colony-hierarchical genetic algorithm) is developed to obtain better solutions, and a hierarchical coding method is used based on the characteristics of the problem. Furthermore, to speed up convergence and resolve local convergence issues, an adaptive crossover probability and mutation probability are used in this algorithm. The computational results and comparison show that the presented approach is quite effective for the considered problem.
A test sheet generating algorithm based on intelligent genetic algorithm and hierarchical planning
Gu, Peipei; Niu, Zhendong; Chen, Xuting; Chen, Wei
2013-03-01
In recent years, computer-based testing has become an effective method to evaluate students' overall learning progress so that appropriate guiding strategies can be recommended. Research has been done to develop intelligent test assembling systems which can automatically generate test sheets based on given parameters of test items. A good multi-subject test sheet depends not only on the quality of the test items but also on the construction of the sheet. Effective and efficient construction of test sheets according to multiple subjects and criteria is a challenging problem. In this paper, a multi-subject test sheet generation problem is formulated and a test sheet generating approach based on intelligent genetic algorithm and hierarchical planning (GAHP) is proposed to tackle this problem. The proposed approach utilizes hierarchical planning to simplify the multi-subject testing problem and adopts a genetic algorithm to process the layered criteria, enabling the construction of good test sheets according to multiple test item requirements. Experiments are conducted and the results show that the proposed approach is capable of effectively generating multi-subject test sheets that meet specified requirements and achieve good performance.
Multi-objective hierarchical genetic algorithms for multilevel redundancy allocation optimization
Energy Technology Data Exchange (ETDEWEB)
Kumar, Ranjan [Department of Aeronautics and Astronautics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501 (Japan)], E-mail: ranjan.k@ks3.ecs.kyoto-u.ac.jp; Izui, Kazuhiro [Department of Aeronautics and Astronautics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501 (Japan)], E-mail: izui@prec.kyoto-u.ac.jp; Yoshimura, Masataka [Department of Aeronautics and Astronautics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501 (Japan)], E-mail: yoshimura@prec.kyoto-u.ac.jp; Nishiwaki, Shinji [Department of Aeronautics and Astronautics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501 (Japan)], E-mail: shinji@prec.kyoto-u.ac.jp
2009-04-15
Multilevel redundancy allocation optimization problems (MRAOPs) occur frequently when attempting to maximize the system reliability of a hierarchical system, and almost all complex engineering systems are hierarchical. Despite their practical significance, limited research has been done concerning the solving of simple MRAOPs. These problems are not only NP-hard but also involve hierarchical design variables. Genetic algorithms (GAs) have been applied in solving MRAOPs, since they are computationally efficient in solving such problems, unlike exact methods, but their application has been confined to single-objective formulations of MRAOPs. This paper proposes a multi-objective formulation of MRAOPs and a methodology for solving such problems. In this methodology, a hierarchical GA framework for multi-objective optimization is proposed by introducing hierarchical genotype encoding for design variables. In addition, we implement the proposed approach by integrating the hierarchical genotype encoding scheme with two popular multi-objective genetic algorithms (MOGAs): the strength Pareto evolutionary genetic algorithm (SPEA2) and the non-dominated sorting genetic algorithm (NSGA-II). In the provided numerical examples, the proposed multi-objective hierarchical approach is applied to solve two hierarchical MRAOPs, a 4-level and a 3-level problem. The proposed method is compared with a single-objective optimization method that uses a hierarchical genetic algorithm (HGA), also applied to solve the 3- and 4-level problems. The results show that a multi-objective hierarchical GA (MOHGA) that includes elitism and a diversity-preserving mechanism performed better than a single-objective GA that only uses elitism when solving large-scale MRAOPs. Additionally, the experimental results show that the proposed method with NSGA-II outperformed the proposed method with SPEA2 in finding useful Pareto optimal solution sets.
A Hierarchical Algorithm for Integrated Scheduling and Control With Applications to Power Systems
DEFF Research Database (Denmark)
Sokoler, Leo Emil; Dinesen, Peter Juhler; Jørgensen, John Bagterp
2016-01-01
The contribution of this paper is a hierarchical algorithm for integrated scheduling and control via model predictive control of hybrid systems. The controlled system is a linear system composed of continuous control, state, and output variables. Binary variables occur as scheduling decisions...... portfolio case study show that the hierarchical algorithm reduces the computation to solve the OCP by several orders of magnitude. The improvement in computation time is achieved without a significant increase in the overall cost of operation....
The Hierarchical Distribution of the Young Stellar Clusters in Six Local Star-forming Galaxies
Grasha, K.; Calzetti, D.; Adamo, A.; Kim, H.; Elmegreen, B. G.; Gouliermis, D. A.; Dale, D. A.; Fumagalli, M.; Grebel, E. K.; Johnson, K. E.; Kahre, L.; Kennicutt, R. C.; Messa, M.; Pellerin, A.; Ryon, J. E.; Smith, L. J.; Shabani, F.; Thilker, D.; Ubeda, L.
2017-05-01
We present a study of the hierarchical clustering of the young stellar clusters in six local (3-15 Mpc) star-forming galaxies using Hubble Space Telescope broadband WFC3/UVIS UV and optical images from the Treasury Program LEGUS (Legacy ExtraGalactic UV Survey). We identified 3685 likely clusters and associations, each visually classified by its morphology, and we use the angular two-point correlation function to study the clustering of these stellar systems. We find that the young clusters and associations are clustered with respect to each other, forming large, unbound hierarchical star-forming complexes that are in general very young. The strength of the clustering decreases with increasing age of the star clusters and stellar associations, which become more homogeneously distributed after ~40-60 Myr and on scales larger than a few hundred parsecs. In all galaxies, the associations exhibit a global behavior that is distinct from, and more strongly correlated than, that of compact clusters. Thus, populations of clusters are more evolved than associations in terms of their spatial distribution, traveling significantly from their birth site within a few tens of Myr, whereas associations show evidence of disruption occurring very quickly after their formation. The clustering of the stellar systems resembles that of a turbulent interstellar medium that drives the star formation process, correlating the components in unbound star-forming complexes in a hierarchical manner, dispersing shortly after formation, suggestive of a single, continuous mode of star formation across all galaxies.
Content Based Image Retrieval using Hierarchical and K-Means Clustering Techniques
Directory of Open Access Journals (Sweden)
V.S.V.S. Murthy
2010-03-01
In this paper we present an image retrieval system that takes an image as the input query and retrieves images based on image content. Content Based Image Retrieval is an approach for retrieving semantically relevant images from an image database based on automatically derived image features. The unique aspect of the system is the use of hierarchical and k-means clustering techniques. The proposed procedure consists of two stages: first, hierarchical clustering filters out most of the images; then k-means is applied to the remaining clustered images, so that better-matching results are retrieved.
Hearing the clusters in a graph: A distributed algorithm
Sahai, Tuhin; Banaszuk, Andrzej
2009-01-01
We propose a novel distributed algorithm to decompose graphs or cluster data. The algorithm recovers the solution obtained from spectral clustering without the need for expensive eigenvalue/eigenvector computations. We demonstrate that by solving the wave equation on the graph, every node can assign itself to a cluster by performing a local fast Fourier transform. We prove the equivalence of our algorithm to spectral clustering, derive convergence rates and demonstrate it on examples.
A High-Order CFS Algorithm for Clustering Big Data
Fanyu Bu; Zhikui Chen; Peng Li; Tong Tang; Ying Zhang
2016-01-01
With the development of the Internet of Everything, such as the Internet of Things, Internet of People, and Industrial Internet, big data is being generated. Clustering is a widely used technique for big data analytics and mining. However, most current algorithms are not effective at clustering heterogeneous data, which is prevalent in big data. In this paper, we propose a high-order CFS algorithm (HOCFS) to cluster heterogeneous data by combining the CFS clustering algorithm and the dropout deep learn...
Hierarchical Control for Multiple DC-Microgrids Clusters
DEFF Research Database (Denmark)
Shafiee, Qobad; Dragicevic, Tomislav; Vasquez, Juan Carlos
2014-01-01
DC microgrids (MGs) have gained research interest during the recent years because of many potential advantages as compared to the ac system. To ensure reliable operation of a low-voltage dc MG as well as its intelligent operation with the other DC MGs, a hierarchical control is proposed in this p...
Directory of Open Access Journals (Sweden)
Hanane FROUD
2013-11-01
The goal of document clustering algorithms is to create clusters that are internally coherent but clearly different from each other. The useful expressions in documents are often accompanied by a large amount of noise caused by unnecessary words, so it is indispensable to eliminate this noise and keep just the useful information. Keyphrase extraction systems for Arabic are a new phenomenon. A number of text mining applications can use them to improve their results. Keyphrases are defined as phrases that capture the main topics discussed in a document; they offer a brief and precise summary of document content. They can therefore be a good way to remove the existing noise from documents. In this paper, we propose a new method to solve the problem cited above, especially for documents in Arabic, which is one of the most complex languages, by using a new keyphrase extraction algorithm based on the Suffix Tree data structure (KpST). To evaluate our approach, we conduct an experimental study on Arabic document clustering using the most popular hierarchical approach: the agglomerative hierarchical algorithm with seven linkage techniques and a variety of distance functions and similarity measures. The obtained results show that our approach for extracting keyphrases improves the clustering results.
Directory of Open Access Journals (Sweden)
Odilia Yim
2015-02-01
Cluster analysis refers to a class of data reduction methods used for sorting cases, observations, or variables of a given dataset into homogeneous groups that differ from each other. The present paper focuses on hierarchical agglomerative cluster analysis, a statistical technique where groups are sequentially created by systematically merging similar clusters together, as dictated by the distance and linkage measures chosen by the researcher. Specific distance and linkage measures are reviewed, including a discussion of how these choices can influence the clustering process by comparing three common linkage measures (single linkage, complete linkage, average linkage). The tutorial guides researchers in performing a hierarchical cluster analysis using the SPSS statistical software. Through an example, we demonstrate how cluster analysis can be used to detect meaningful subgroups in a sample of bilinguals by examining various language variables.
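The effect of the linkage choice described in this abstract can be illustrated with a minimal sketch. It is written in plain Python rather than SPSS, and the function names (`agglomerate`, `cluster_distance`) and the toy one-dimensional data are assumptions made for illustration only; they do not come from the tutorial.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters (lists of point indices) under a linkage rule."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":      # closest pair of members
        return min(dists)
    if linkage == "complete":    # farthest pair of members
        return max(dists)
    if linkage == "average":     # mean over all cross-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(points, n_clusters, linkage="single"):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]], points, linkage),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical data: a chain of points with slowly growing gaps.
chain = [[0.0], [1.0], [2.1], [3.3], [4.6], [6.0]]
single_result = agglomerate(chain, 2, "single")
complete_result = agglomerate(chain, 2, "complete")
```

On this chain, single linkage keeps absorbing the nearest neighbour and isolates the last point, whereas complete linkage splits the chain into more compact groups, which is the kind of sensitivity to linkage the abstract refers to.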
Improvement and Parallelism of k-Means Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
TIAN Jinlan; ZHU Lin; ZHANG Suqin; LIU Lu
2005-01-01
The k-means clustering algorithm is one of the most commonly used algorithms for clustering analysis. The traditional k-means algorithm is, however, inefficient while working on large numbers of data sets and improving the algorithm efficiency remains a problem. This paper focuses on the efficiency issues of cluster algorithms. A refined initial cluster centers method is designed to reduce the number of iterative procedures in the algorithm. A parallel k-means algorithm is also studied for the problem of the operation limitation of a single processor machine when given huge data sets. The analytical results demonstrate that these improvements can greatly enhance the efficiency of the k-means algorithm, i.e., allow the grouping of a large number of data sets more accurately and more quickly. The analysis has theoretical and practical importance for work on the improvement and parallelism of cluster algorithms.
Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian
2017-03-01
Data clustering can be carried out by partition or hierarchical methods for many types of data, including DNA sequences. The two kinds of method can be combined by running a partition algorithm at the first level and a hierarchical algorithm at the second level, an approach called hybrid clustering. In the partition phase, popular methods such as PAM, K-means, or fuzzy c-means could be applied. In this study we selected partitioning around medoids (PAM) for the partition stage. Following the partition algorithm, in the hierarchical stage we applied the divisive analysis algorithm (DIANA) in order to obtain more specific cluster and sub-cluster structures. The number of main clusters is determined using the Davies-Bouldin Index (DBI): we choose the number of clusters that minimizes the DBI value. In this work, we cluster 1252 HPV DNA sequences from GenBank. Feature extraction is performed first, followed by normalization and genetic distance calculation using the Euclidean distance. We implemented the hybrid PAM and DIANA approach using the R open source programming tool. Using PAM in the first stage, we obtained 3 main clusters with an average DBI value of 0.979. After executing DIANA in the second stage, we obtained 4 sub-clusters for Cluster-1, 9 sub-clusters for Cluster-2, and 2 sub-clusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for the respective main clusters. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
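As a rough illustration of the selection criterion used here, the sketch below computes the standard Davies-Bouldin Index with Euclidean distances and centroids. It is plain Python rather than the R implementation the authors used, and the function names and toy points are assumptions for illustration; a lower value indicates more compact, better separated clusters.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def davies_bouldin(clusters):
    """Davies-Bouldin Index for a partition given as a list of point lists.

    Assumes at least two clusters with distinct centroids.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance of each cluster's members to its centroid.
    scatter = [
        sum(euclidean(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical data: the same four points partitioned well and badly.
good = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]
bad = [[[0, 0], [10, 0]], [[0, 2], [10, 2]]]
dbi_good = davies_bouldin(good)
dbi_bad = davies_bouldin(bad)
```

Choosing the number of clusters then amounts to evaluating this index for each candidate partition and keeping the one with the smallest value.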
Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun
2014-04-29
To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. The study included ninety-five eyes of 95 OAG patients who received medical treatment and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used significantly more antiglaucoma eye drops than cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was higher than in the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
Periorbital melasma: Hierarchical cluster analysis of clinical features in Asian patients.
Jung, Y S; Bae, J M; Kim, B J; Kang, J-S; Cho, S B
2017-03-19
Studies have shown melasma lesions to be distributed across the face in centrofacial, malar, and mandibular patterns. Meanwhile, however, melasma lesions of the periorbital area have yet to be thoroughly described. We analyzed normal and ultraviolet light-exposed photographs of patients with melasma. The periorbital melasma lesions were measured according to anatomical reference points and a hierarchical cluster analysis was performed. The periorbital melasma lesions showed clinical features of fine and homogenous melasma pigmentation, involving both the upper and lower eyelids that extended to other anatomical sites with a darker and coarser appearance. The hierarchical cluster analysis indicated that patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. Significant differences between cluster 1 and cluster 2 were found in lateral distance and inferolateral distance, but not in medial distance and superior distance. Comparing the two clusters, patients in cluster 2 were found to be significantly older and more commonly accompanied by melasma lesions of the temple and medial cheek. Our hierarchical cluster analysis of periorbital melasma lesions demonstrated that Asian patients with periorbital melasma can be categorized into two clusters according to the surface anatomy of the face. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Parallelization of Edge Detection Algorithm using MPI on Beowulf Cluster
Haron, Nazleeni; Amir, Ruzaini; Aziz, Izzatdin A.; Jung, Low Tan; Shukri, Siti Rohkmah
In this paper, we present the design of a parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using the MPI message passing library and a master/slave scheme. Every processor performs the same sequential algorithm but on a different part of the image. Experimental results obtained on a Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.
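The data decomposition described in this abstract, where every processor runs the same Sobel operator on its own part of the image, can be sketched as follows. This is an illustration only: it runs the strips one after another in plain Python instead of using MPI, and the row-strip partition, the function names, and the toy image are assumptions rather than details taken from the paper.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_rows(image, row_start, row_end):
    """Gradient magnitude for rows [row_start, row_end); border pixels stay 0.

    The worker reads one neighbouring row above and below its strip. In a real
    MPI program those halo rows would have to be sent along with the strip.
    """
    height, width = len(image), len(image[0])
    out = []
    for r in range(row_start, row_end):
        row = [0.0] * width
        if 0 < r < height - 1:
            for c in range(1, width - 1):
                gx = sum(GX[i][j] * image[r - 1 + i][c - 1 + j]
                         for i in range(3) for j in range(3))
                gy = sum(GY[i][j] * image[r - 1 + i][c - 1 + j]
                         for i in range(3) for j in range(3))
                row[c] = math.sqrt(gx * gx + gy * gy)
        out.append(row)
    return out


def split_rows(height, n_workers):
    """Split `height` rows into n_workers contiguous, nearly equal strips."""
    base, extra = divmod(height, n_workers)
    bounds, start = [], 0
    for k in range(n_workers):
        end = start + base + (1 if k < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def strip_sobel(image, n_workers):
    """Process each strip independently, then concatenate (the master's gather)."""
    result = []
    for start, end in split_rows(len(image), n_workers):
        result.extend(sobel_rows(image, start, end))  # one call per worker
    return result


# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = strip_sobel(image, 3)
```

Because each output row depends only on its three neighbouring input rows, the strips can be computed independently and the combined result matches the sequential one.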
Tsou, Chi-Hsuan; Lor, Kuo-Lung; Chang, Yeun-Chung; Chen, Chung-Ming
2015-05-14
This paper proposes a semantic segmentation algorithm that provides the spatial distribution patterns of pulmonary ground-glass nodules with solid portions in computed tomography (CT) images. The proposed segmentation algorithm, anatomy packing with hierarchical segments (APHS), performs pulmonary nodule segmentation and quantification in CT images. In particular, the APHS algorithm consists of two essential processes: hierarchical segmentation tree construction and anatomy packing. It constructs the hierarchical segmentation tree based on region attributes and local contour cues along the region boundaries. Each node of the tree corresponds to the soft boundary associated with a family of nested segmentations through different scales applied by a hierarchical segmentation operator that is used to decompose the image in a structurally coherent manner. The anatomy packing process detects and localizes individual object instances by optimizing a hierarchical conditional random field model. Ninety-two histopathologically confirmed pulmonary nodules were used to evaluate the performance of the proposed APHS algorithm. Further, a comparative study was conducted with two conventional multi-label image segmentation algorithms based on four assessment metrics: the modified Williams index, percentage statistic, overlapping ratio, and difference ratio. Under the same framework, the proposed APHS algorithm was applied to two clinical applications: multi-label segmentation of nodules with a solid portion and surrounding tissues and pulmonary nodule segmentation. The results obtained indicate that the APHS-generated boundaries are comparable to manual delineations with a modified Williams index of 1.013. Further, the resulting segmentation of the APHS algorithm is also better than that achieved by two conventional multi-label image segmentation algorithms. The proposed two-level hierarchical segmentation algorithm effectively labelled the pulmonary nodule and its surrounding
A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA
Institute of Scientific and Technical Information of China (English)
Ohn Mar San; Van-Nam Huynh; Yoshiteru Nakamori
2003-01-01
Most of the earlier work on clustering mainly focused on numeric data whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve many datasets that also consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm which is based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both numeric and categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method to update the "cluster centers" of clustering objects described by mixed numeric and categorical attributes in the clustering process so as to minimize the clustering cost function. The clustering performance of the algorithm is demonstrated with two well-known data sets, namely the credit approval and abalone databases.
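The abstract does not give the similarity measure or the center update rule, so the sketch below substitutes a common k-prototypes-style choice as an assumption: squared Euclidean distance on numeric attributes plus a weighted count of categorical mismatches, with centers updated by mean and mode. The function names and the weight `gamma` are hypothetical and are not the authors' formulation.

```python
from collections import Counter


def mixed_distance(obj, center, numeric_idx, categorical_idx, gamma=1.0):
    """Squared Euclidean part plus gamma times the number of categorical mismatches."""
    numeric_part = sum((obj[i] - center[i]) ** 2 for i in numeric_idx)
    categorical_part = sum(obj[i] != center[i] for i in categorical_idx)
    return numeric_part + gamma * categorical_part


def update_center(members, numeric_idx, categorical_idx):
    """Mean for numeric attributes, most frequent value for categorical ones."""
    center = list(members[0])
    for i in numeric_idx:
        center[i] = sum(m[i] for m in members) / len(members)
    for i in categorical_idx:
        center[i] = Counter(m[i] for m in members).most_common(1)[0][0]
    return center


# Hypothetical objects: one numeric attribute (index 0), one categorical (index 1).
members = [(1.0, "a"), (3.0, "a"), (2.0, "b")]
center = update_center(members, numeric_idx=[0], categorical_idx=[1])
d = mixed_distance((4.0, "b"), center, [0], [1], gamma=0.5)
```

With an empty `categorical_idx` the distance reduces to the squared Euclidean distance and the update to the mean, which is consistent with the abstract's remark that the method coincides with k-means on purely numeric data.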
EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES
Directory of Open Access Journals (Sweden)
D.Kerana Hanirex
2011-03-01
Nowadays, association rules play an important role. The purchase of one product when another product is purchased represents an association rule. The Apriori algorithm is the basic algorithm for mining association rules. This paper presents an efficient Partition Algorithm for Mining Frequent Itemsets (PAFI) using clustering. This algorithm finds the frequent itemsets by partitioning the database transactions into clusters. Clusters are formed based on the similarity measures between the transactions. It then finds the frequent itemsets directly from the transactions in the clusters using an improved Apriori algorithm, which further reduces the number of database scans and hence improves efficiency.
A hybrid monkey search algorithm for clustering analysis.
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
A Hybrid Monkey Search Algorithm for Clustering Analysis
Directory of Open Access Journals (Sweden)
Xin Chen
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm based on the search operator of the artificial bee colony algorithm for clustering analysis. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
The Georgi algorithms of jet clustering
Ge, Shao-Feng
2015-05-01
We reveal the direct link between the jet clustering algorithms recently proposed by Howard Georgi and parton shower kinematics, providing a firm theoretical foundation. The kinematics of this class of elegant algorithms is explored systematically for partons with arbitrary masses, and the jet function is generalized to $J_\beta^{(n)}$ with a jet function index n in order to achieve more degrees of freedom. We derive constraints on the jet function parameter β and index n, which are closely related to the cone size cutoff, from three basic requirements: the result of jet clustering is process-independent and hence logically consistent; the inclusion cone is larger for softer subjets, in keeping with the fact that a parton shower tends to emit softer partons at an earlier stage with a larger opening angle; and the cone size cannot be too large, in order to avoid mixing up neighboring jets. Finally, we discuss how jet function values can be made invariant under Lorentz boost.
Clustering With Side Information: From a Probabilistic Model to a Deterministic Algorithm
Khashabi, Daniel; Wieting, John; Liu, Jeffrey Yufei; Liang, Feng
2015-01-01
In this paper, we propose a model-based clustering method (TVClust) that robustly incorporates noisy side information as soft-constraints and aims to seek a consensus between side information and the observed data. Our method is based on a nonparametric Bayesian hierarchical model that combines the probabilistic model for the data instance and the one for the side-information. An efficient Gibbs sampling algorithm is proposed for posterior inference. Using the small-variance asymptotics of ou...
Directory of Open Access Journals (Sweden)
Yang Chunhe
2016-01-01
The hierarchical clustering method has been used for exploration of gene expression and proteomic profiles; however, little research into its application in the examination of expression of multiple cytokine/chemokine responses to stimuli has been reported. Thus, little progress has been made on how phytohemagglutinin (PHA) affects cytokine expression profiling on a large scale in the human hematological system. To investigate the characteristic expression pattern under PHA stimulation, Luminex, a multiplex bead-based suspension array, was performed. The data set collected from human peripheral blood mononuclear cells (PBMC) was analyzed using the hierarchical clustering method. It was revealed that two specific chemokines (CCL3 and CCL4) underwent significantly greater quantitative changes during induction of expression than other tested cytokines/chemokines after PHA stimulation. This result indicates that hierarchical clustering is a useful tool for detecting fine patterns during exploration of biological data, and that it can play an important role in comparative studies.
A REAL-TIME C-V CLUSTERING ALGORITHM FOR WEB-MINING
Institute of Scientific and Technical Information of China (English)
Li Haiying; Zhuang Zhenquan; Li Bin; Wan Ke
2002-01-01
In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to handle the vast action data that are dynamically collected from web sites. The algorithm uses the concept of C-V to denote characteristics; at the same time it adopts two-valued [0,1] input and a self-defined vigilance parameter to design the clustering architecture. The Vector Degree of Matching (VDM) plays a key role in the clustering algorithm, as it determines the magnitude of typical characteristics. By means of stability analysis, the classifications are confirmed to have a reliably hierarchical structure when the vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between the vigilance parameter and the classification upper limit helps mine representative classifications from net users according to the actual web resource; the administering system can then map them to the web resource space to implement intelligent configuration effectively and rapidly.
The Evolution of Galaxy Clustering in Hierarchical Models
1999-01-01
The main ingredients of recent semi-analytic models of galaxy formation are summarised. We present predictions for the galaxy clustering properties of a well specified LCDM model whose parameters are constrained by observed local galaxy properties. We present preliminary predictions for evolution of clustering that can be probed with deep pencil beam surveys.
Genetic Algorithms for Auto-Clustering in KDD
Institute of Scientific and Technical Information of China (English)
None listed
2000-01-01
In solving the clustering problem in the context of knowledge discovery in databases (KDD), traditional methods, for example the K-means algorithm and its variants, usually require users to provide the number of clusters in advance based on prior information. Unfortunately, the number of clusters is in general unknown to users, who usually lack such prior information. The clustering calculation therefore becomes tedious trial-and-error work, and the result is often not globally optimal, especially when the number of clusters is large. In this paper, a new dynamic clustering method based on genetic algorithms (GA) is proposed and applied to the auto-clustering of data entities in large databases. The algorithm can automatically cluster the data according to their similarities and find the exact number of clusters. Experimental results indicate that the method achieves global optimization through its dynamic clustering logic.
Wang, Jin; Sun, Xiangping; Nahavandi, Saeid; Kouzani, Abbas; Wu, Yuchuan; She, Mary
2014-11-01
Biomedical time series clustering that automatically groups a collection of time series according to their internal similarity is of importance for medical record management and inspection such as bio-signals archiving and retrieval. In this paper, a novel framework that automatically groups a set of unlabelled multichannel biomedical time series according to their internal structural similarity is proposed. Specifically, we treat a multichannel biomedical time series as a document and extract local segments from the time series as words. We extend a topic model, i.e., the Hierarchical probabilistic Latent Semantic Analysis (H-pLSA), which was originally developed for visual motion analysis to cluster a set of unlabelled multichannel time series. The H-pLSA models each channel of the multichannel time series using a local pLSA in the first layer. The topics learned in the local pLSA are then fed to a global pLSA in the second layer to discover the categories of multichannel time series. Experiments on a dataset extracted from multichannel Electrocardiography (ECG) signals demonstrate that the proposed method performs better than previous state-of-the-art approaches and is relatively robust to the variations of parameters including length of local segments and dictionary size. Although the experimental evaluation used the multichannel ECG signals in a biometric scenario, the proposed algorithm is a universal framework for multichannel biomedical time series clustering according to their structural similarity, which has many applications in biomedical time series management. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
A Novel Clustering Algorithm Inspired by Membrane Computing
Directory of Open Access Journals (Sweden)
Hong Peng
2015-01-01
P systems are a class of distributed parallel computing models. This paper presents a novel clustering algorithm, called the membrane clustering algorithm, which is inspired by the mechanism of a tissue-like P system with a loop structure of cells. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior to or competitive with the k-means algorithm and several evolutionary clustering algorithms recently reported in the literature.
Hierarchical Resource Allocation in Femtocell Networks using Graph Algorithms
Sadr, Sanam
2012-01-01
This paper presents a hierarchical approach to resource allocation in open-access femtocell networks. The major challenge in femtocell networks is interference management which in our system, based on the Long Term Evolution (LTE) standard, translates to which user should be allocated which physical resource block (or fraction thereof) from which femtocell access point (FAP). The globally optimal solution requires integer programming and is mathematically intractable. We propose a hierarchical three-stage solution: first, the load of each FAP is estimated considering the number of users connected to the FAP, their average channel gain and required data rates. Second, based on each FAP's load, the physical resource blocks (PRBs) are allocated to FAPs in a manner that minimizes the interference by coloring the modified interference graph. Finally, the resource allocation is performed at each FAP considering users' instantaneous channel gain. The two major advantages of this suboptimal approach are the significa...
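The second stage described in this abstract, allocating resource blocks by coloring an interference graph, can be sketched with a simple greedy coloring. This is only an illustration under stated assumptions: the abstract does not say which coloring algorithm or graph modification the authors use, so the load-ordered greedy rule, the function name, and the toy graph below are all hypothetical. Each color stands for a group of PRBs, and FAPs that interfere with each other must receive different colors.

```python
def greedy_coloring(neighbors, load):
    """Color nodes so that adjacent nodes differ, visiting heavier-loaded nodes first.

    neighbors: dict mapping each node to the set of nodes it interferes with.
    load: dict mapping each node to its estimated load (from the first stage).
    Returns a dict mapping node -> color index (0, 1, 2, ...).
    """
    # Highest load first; ties broken by node name for determinism.
    order = sorted(neighbors, key=lambda n: (-load[n], n))
    color = {}
    for node in order:
        used = {color[m] for m in neighbors[node] if m in color}
        c = 0
        while c in used:  # smallest color not used by an already colored neighbor
            c += 1
        color[node] = c
    return color


# Hypothetical interference graph among four femtocell access points.
neighbors = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
load = {"A": 3, "B": 2, "C": 5, "D": 1}
assignment = greedy_coloring(neighbors, load)
```

Greedy coloring does not guarantee the minimum number of colors, which is in keeping with the abstract's description of the overall scheme as suboptimal but tractable.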
An energy efficient clustering routing algorithm for wireless sensor networks
Institute of Scientific and Technical Information of China (English)
LI Li; DONG Shu-song; WEN Xiang-ming
2006-01-01
This article proposes an energy efficient clustering routing (EECR) algorithm for wireless sensor networks. The algorithm divides a sensor network into a few clusters and selects each cluster head based on a weight value, which leads to more uniform energy dissipation among all sensor nodes. Simulation results show that the algorithm can reduce overall energy consumption and extend the lifetime of the wireless sensor network.
Energy Technology Data Exchange (ETDEWEB)
Nash, Stephen G.
2013-11-11
The research focuses on the modeling and optimization of nanoporous materials. In systems with hierarchical structure that we consider, the physics changes as the scale of the problem is reduced and it can be important to account for physics at the fine level to obtain accurate approximations at coarser levels. For example, nanoporous materials hold promise for energy production and storage. A significant issue is the fabrication of channels within these materials to allow rapid diffusion through the material. One goal of our research is to apply optimization methods to the design of nanoporous materials. Such problems are large and challenging, with hierarchical structure that we believe can be exploited, and with a large range of important scales, down to atomistic. This requires research on large-scale optimization for systems that exhibit different physics at different scales, and the development of algorithms applicable to designing nanoporous materials for many important applications in energy production, storage, distribution, and use. Our research has two major research thrusts. The first is hierarchical modeling. We plan to develop and study hierarchical optimization models for nanoporous materials. The models have hierarchical structure, and attempt to balance the conflicting aims of model fidelity and computational tractability. In addition, we analyze the general hierarchical model, as well as the specific application models, to determine their properties, particularly those properties that are relevant to the hierarchical optimization algorithms. The second thrust was to develop, analyze, and implement a class of hierarchical optimization algorithms, and apply them to the hierarchical models we have developed. We adapted and extended the optimization-based multigrid algorithms of Lewis and Nash to the optimization models exemplified by the hierarchical optimization model. This class of multigrid algorithms has been shown to be a powerful tool for
Introduction to Clustering Algorithms and Applications
Yang, Sibei; Tao, Liangde; Gong, Bingchen
2014-01-01
Data clustering is the process of identifying natural groupings or clusters within multidimensional data based on some similarity measure. Clustering is a fundamental process in many different disciplines. Hence, researchers from different fields are actively working on the clustering problem. This paper provides an overview of the different representative clustering methods. In addition, applications of clustering in different fields are briefly introduced.
Mapping informative clusters in a hierarchical [corrected] framework of FMRI multivariate analysis.
Directory of Open Access Journals (Sweden)
Rui Xu
Pattern recognition methods, which are powerful in discriminating between multi-voxel patterns of brain activity associated with different mental states, have become increasingly popular in fMRI data analysis. However, when they are used in functional brain mapping, the location of discriminative voxels varies significantly, raising difficulties in interpreting the locus of the effect. Here we proposed a hierarchical framework of multivariate approach that maps informative clusters rather than voxels to achieve reliable functional brain mapping without compromising the discriminative power. In particular, we first searched for local homogeneous clusters that consisted of voxels with similar response profiles. Then, a multi-voxel classifier was built for each cluster to extract discriminative information from the multi-voxel patterns. Finally, through multivariate ranking, outputs from the classifiers served as a multi-cluster pattern to identify informative clusters by examining interactions among clusters. Results from both simulated and real fMRI data demonstrated that this hierarchical approach showed better performance in the robustness of functional brain mapping than traditional voxel-based multivariate methods. In addition, the mapped clusters overlapped highly for two perceptually equivalent object categories, further confirming the validity of our approach. In short, the hierarchical framework of multivariate approach is suitable for both pattern classification and brain mapping in fMRI studies.
PHC: A Fast Partition and Hierarchy-Based Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
ZHOU HaoFeng(周皓峰); YUAN QingQing(袁晴晴); CHENG ZunPing(程尊平); SHI BaiLe(施伯乐)
2003-01-01
Cluster analysis is a process to classify data in a specified data set. In this field, much attention is paid to high-efficiency clustering algorithms. In this paper, the features in the current partition-based and hierarchy-based algorithms are reviewed, and a new hierarchy-based algorithm PHC is proposed by combining advantages of both algorithms, which uses the cohesion and the closeness to amalgamate the clusters. Compared with similar algorithms, the performance of PHC is improved, and the quality of clustering is guaranteed. And both the features were proved by the theoretic and experimental analyses in the paper.
Counterexamples to convergence theorem of maximum-entropy clustering algorithm
Institute of Scientific and Technical Information of China (English)
于剑; 石洪波; 黄厚宽; 孙喜晨; 程乾生
2003-01-01
In this paper, we surveyed the development of maximum-entropy clustering algorithm, pointed out that the maximum-entropy clustering algorithm is not new in essence, and constructed two examples to show that the iterative sequence given by the maximum-entropy clustering algorithm may not converge to a local minimum of its objective function, but a saddle point. Based on these results, our paper shows that the convergence theorem of maximum-entropy clustering algorithm put forward by Kenneth Rose et al. does not hold in general cases.
An Incremental Algorithm of Text Clustering Based on Semantic Sequences
Institute of Scientific and Technical Information of China (English)
FENG Zhonghui; SHEN Junyi; BAO Junpeng
2006-01-01
This paper proposed an incremental text clustering algorithm based on semantic sequences. Using the similarity relation of semantic sequences and calculating the cover of the set of similar semantic sequences, the algorithm selects the candidate cluster with the minimum entropy overlap value as a result cluster each time. The comparison of experimental results shows that the precision of the algorithm is higher than that of other algorithms under the same conditions, and this is especially obvious on sets of long documents.
A new efficient Cluster Algorithm for the Ising Model
Nyffeler, M; Wiese, U J; Nyfeler, Matthias; Pepe, Michele; Wiese, Uwe-Jens
2005-01-01
Using D-theory we construct a new efficient cluster algorithm for the Ising model. The construction is very different from the standard Swendsen-Wang algorithm and related to worm algorithms. With the new algorithm we have measured the correlation function with high precision over a surprisingly large number of orders of magnitude.
URL Mining Using Agglomerative Clustering Algorithm
Directory of Open Access Journals (Sweden)
Chinmay R. Deshmukh
2015-02-01
The tremendous growth of the web world incorporates application of data mining techniques to the web logs. Data Mining and the World Wide Web encompass an important and active area of research. Web log mining is the analysis of web log files with web page sequences. Web mining is broadly classified as web content mining, web usage mining and web structure mining. Web usage mining is a technique to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. URL mining refers to a subclass of Web mining that helps us to investigate the details of a Uniform Resource Locator. URL mining can be advantageous in the fields of security and protection. The paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is clickthrough data: each record consists of a user's query to a search engine along with the URL which the user selected from among the candidates offered by the search engine. By viewing this dataset as a bipartite graph, with the vertices on one side corresponding to queries and on the other side to URLs, one can apply an agglomerative clustering algorithm to the graph's vertices to identify related queries and URLs.
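The bipartite-graph view described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the Jaccard similarity on clicked-URL sets, the `threshold` stopping rule, the function name `cluster_queries` and the sample records are all assumptions made for illustration, and only the query side of the graph is clustered.

```python
from itertools import combinations

def cluster_queries(clicks, threshold=0.5):
    """Greedy agglomerative clustering of queries by shared clicked URLs.

    clicks is a list of (query, url) records; similarity measure and
    threshold are illustrative assumptions, not taken from the paper.
    """
    # Bipartite adjacency: query cluster -> set of URLs clicked for it.
    urls = {}
    for query, url in clicks:
        urls.setdefault(frozenset([query]), set()).add(url)

    def jaccard(a, b):
        return len(urls[a] & urls[b]) / len(urls[a] | urls[b])

    while len(urls) > 1:
        # Find the most similar pair of current clusters.
        a, b = max(combinations(urls, 2), key=lambda ab: jaccard(*ab))
        if jaccard(a, b) < threshold:
            break
        # Merge the pair and pool their URL neighbourhoods.
        urls[a | b] = urls.pop(a) | urls.pop(b)
    return sorted(sorted(c) for c in urls)

# Hypothetical clickthrough records.
clicks = [
    ("cheap flights", "kayak.example"),
    ("cheap flights", "expedia.example"),
    ("airline tickets", "kayak.example"),
    ("airline tickets", "expedia.example"),
    ("python tutorial", "docs.example"),
]
groups = cluster_queries(clicks)
```

The same loop could be run with the roles of queries and URLs swapped to group related URLs.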
A fingerprint identification algorithm by clustering similarity
Institute of Scientific and Technical Information of China (English)
TIAN Jie; HE Yuliang; CHEN Hong; YANG Xin
2005-01-01
This paper introduces a fingerprint identification algorithm by clustering similarity with the view to overcome the dilemmas encountered in fingerprint identification. To decrease multi-spectrum noises in a fingerprint, we first use a dyadic scale space (DSS) method for image enhancement. The second step describes the relative features among minutiae by building a minutia-simplex which contains a pair of minutiae and their local associated ridge information, with its transformation-variant and invariant relative features applied for comprehensive similarity measurement and for parameter estimation respectively. The clustering method is employed to estimate the transformation space. Finally, multi-resolution technique is used to find an optimal transformation model for getting the maximal mutual information between the input and the template features. The experimental results, including the performance evaluation by the 2nd International Verification Competition in 2002 (FVC2002) over the four fingerprint databases of FVC2002, indicate that our method is promising in an automatic fingerprint identification system (AFIS).
Directory of Open Access Journals (Sweden)
Górecki J.
2017-01-01
Several successful approaches to structure determination of hierarchical Archimedean copulas (HACs) proposed in the literature rely on agglomerative clustering and Kendall's correlation coefficient. However, no theoretical proof justifying such approaches has been presented. This work fills this gap and introduces a theorem showing that, given the matrix of the pairwise Kendall correlation coefficients corresponding to a HAC, its structure can be recovered by an agglomerative clustering technique.
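As a rough illustration of the idea stated in this abstract, the sketch below runs agglomerative clustering directly on a matrix of pairwise Kendall coefficients, merging the most strongly correlated groups first. The average-linkage rule, the function name `recover_structure` and the example matrix are assumptions; the abstract does not specify the linkage, and a real HAC may contain non-binary nodes that this binary merge tree does not represent.

```python
from itertools import combinations

def recover_structure(tau):
    """Average-linkage agglomerative clustering on a symmetric Kendall tau matrix.

    Returns a nested tuple describing the merge tree (illustrative only).
    """
    n = len(tau)
    # Each cluster is (tree, member indices).
    clusters = [(i, [i]) for i in range(n)]

    def avg_tau(a, b):
        # Mean pairwise tau between the members of two clusters.
        return sum(tau[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the pair of clusters with the largest average tau.
        p, q = max(
            combinations(range(len(clusters)), 2),
            key=lambda pq: avg_tau(clusters[pq[0]][1], clusters[pq[1]][1]),
        )
        merged = ((clusters[p][0], clusters[q][0]), clusters[p][1] + clusters[q][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return clusters[0][0]

# Hypothetical tau matrix: variables 0 and 1 are nested together, as are 2 and 3.
tau = [
    [1.0, 0.7, 0.3, 0.3],
    [0.7, 1.0, 0.3, 0.3],
    [0.3, 0.3, 1.0, 0.5],
    [0.3, 0.3, 0.5, 1.0],
]
structure = recover_structure(tau)
```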
Hierarchical black hole triples in young star clusters: impact of Kozai-Lidov resonance on mergers
Kimpson, Thomas O; Mapelli, Michela; Ziosi, Brunetto M
2016-01-01
Mergers of compact object binaries are one of the most powerful sources of gravitational waves (GWs) in the frequency range of second-generation ground-based gravitational wave detectors (Advanced LIGO and Virgo). Dynamical simulations of young dense star clusters (SCs) indicate that ~27 per cent of all double compact object binaries are members of hierarchical triple systems (HTs). In this paper, we consider 570 HTs composed of three compact objects (black holes or neutron stars) that formed dynamically in N-body simulations of young dense SCs. We simulate them for a Hubble time with a new code based on the Mikkola's algorithmic regularization scheme, including the 2.5 post-Newtonian term. We find that ~88 per cent of the simulated systems develop Kozai-Lidov (KL) oscillations. KL resonance triggers the merger of the inner binary in three systems (corresponding to 0.5 per cent of the simulated HTs), by increasing the eccentricity of the inner binary. Accounting for KL oscillations leads to an increase of the...
Hierarchical black hole triples in young star clusters: impact of Kozai-Lidov resonance on mergers
Kimpson, Thomas O.; Spera, Mario; Mapelli, Michela; Ziosi, Brunetto M.
2016-12-01
Mergers of compact-object binaries are one of the most powerful sources of gravitational waves (GWs) in the frequency range of second-generation ground-based GW detectors (advanced LIGO and Virgo). Dynamical simulations of young dense star clusters (SCs) indicate that ~27 per cent of all double compact-object binaries are members of hierarchical triple systems (HTs). In this paper, we consider 570 HTs composed of three compact objects (black holes or neutron stars) that formed dynamically in N-body simulations of young dense SCs. We simulate them for a Hubble time with a new code based on the Mikkola's algorithmic regularization scheme, including the 2.5 post-Newtonian term. We find that ~88 per cent of the simulated systems develop Kozai-Lidov (KL) oscillations. KL resonance triggers the merger of the inner binary in three systems (corresponding to 0.5 per cent of the simulated HTs), by increasing the eccentricity of the inner binary. Accounting for KL oscillations leads to an increase of the total expected merger rate by ≈50 per cent. All binaries that merge because of KL oscillations were formed by dynamical exchanges (i.e. none is a primordial binary) and have chirp mass >20 M⊙. This result might be crucial to interpret the formation channel of the first recently detected GW events.
Unglert, K.; Radić, V.; Jellinek, A. M.
2016-06-01
Variations in the spectral content of volcano seismicity related to changes in volcanic activity are commonly identified manually in spectrograms. However, long time series of monitoring data at volcano observatories require tools to facilitate automated and rapid processing. Techniques such as self-organizing maps (SOM) and principal component analysis (PCA) can help to quickly and automatically identify important patterns related to impending eruptions. For the first time, we evaluate the performance of SOM and PCA on synthetic volcano seismic spectra constructed from observations during two well-studied eruptions at Kīlauea Volcano, Hawai'i, that include features observed in many volcanic settings. In particular, our objective is to test which of the techniques can best retrieve a set of three spectral patterns that we used to compose a synthetic spectrogram. We find that, without a priori knowledge of the given set of patterns, neither SOM nor PCA can directly recover the spectra. We thus test hierarchical clustering, a commonly used method, to investigate whether clustering in the space of the principal components and on the SOM, respectively, can retrieve the known patterns. Our clustering method applied to the SOM fails to detect the correct number and shape of the known input spectra. In contrast, clustering of the data reconstructed by the first three PCA modes reproduces these patterns and their occurrence in time more consistently. This result suggests that PCA in combination with hierarchical clustering is a powerful practical tool for automated identification of characteristic patterns in volcano seismic spectra. Our results indicate that, in contrast to PCA, common clustering algorithms may not be ideal to group patterns on the SOM and that it is crucial to evaluate the performance of these tools on a control dataset prior to their application to real data.
Hocking, Alex; Davey, Neil; Sun, Yi
2015-01-01
We present a novel unsupervised learning approach to automatically segment and label images in astronomical surveys. Automation of this procedure will be essential as next-generation surveys enter the petabyte scale: data volumes will exceed the capability of even large crowd-sourced analyses. We demonstrate how a growing neural gas (GNG) can be used to encode the feature space of imaging data. When coupled with a technique called hierarchical clustering, imaging data can be automatically segmented and labelled by organising nodes in the GNG. The key distinction of unsupervised learning is that these labels need not be known prior to training; rather, they are determined by the algorithm itself. Importantly, after training, a network can be presented with images it has never 'seen' before and provide consistent categorisation of features. As a proof-of-concept we demonstrate application on data from the Hubble Space Telescope Frontier Fields: images of clusters of galaxies containing a mixture of galaxy type...
Hierarchical cluster-tendency analysis of the group structure in the foreign exchange market
Wu, Xin-Ye; Zheng, Zhi-Gang
2013-08-01
A hierarchical cluster-tendency (HCT) method in analyzing the group structure of networks of the global foreign exchange (FX) market is proposed by combining the advantages of both the minimal spanning tree (MST) and the hierarchical tree (HT). Fifty currencies of the top 50 World GDP in 2010 according to World Bank's database are chosen as the underlying system. By using the HCT method, all nodes in the FX market network can be "colored" and distinguished. We reveal that the FX networks can be divided into two groups, i.e., the Asia-Pacific group and the Pan-European group. The results given by the hierarchical cluster-tendency method agree well with the formerly observed geographical aggregation behavior in the FX market. Moreover, an oil-resource aggregation phenomenon is discovered by using our method. We find that gold could be a better numeraire for the weekly-frequency FX data.
Local Community Detection Algorithm Based on Minimal Cluster
Directory of Open Access Journals (Sweden)
Yong Zhou
2016-01-01
In order to discover the structure of local community more effectively, this paper puts forward a new local community detection algorithm based on minimal cluster. Most local community detection algorithms begin from one node. The agglomeration ability of a single node must be less than that of multiple nodes, so the community extension in this paper's algorithm starts not from the initial node alone but from a node cluster that contains this initial node and whose nodes are relatively densely connected with each other. The algorithm mainly includes two phases. First it detects the minimal cluster, and then it finds the local community extended from the minimal cluster. Experimental results show that the quality of the local community detected by our algorithm is much better than that of other algorithms in both real and simulated networks.
A Load Balance Routing Algorithm Based on Uneven Clustering
Directory of Open Access Journals (Sweden)
Liang Yuan
2013-10-01
To address the problem of uneven load in clustered Wireless Sensor Networks (WSNs), a load-balancing routing algorithm based on uneven clustering is proposed, which performs uneven clustering and calculates the optimal number of clusters. The algorithm prevents the number of common nodes under any one cluster head from growing so large that the head is overloaded and dies, by distributing nodes evenly among cluster heads. It constructs an evaluation function that better reflects the residual energy distribution of nodes, together with a routing evaluation function between cluster heads. The performance of the algorithm is simulated in MATLAB. Simulation results show that the routing established by this algorithm effectively improves the network's energy balance and lengthens the life cycle of the network.
Prediction of in vitro and in vivo oestrogen receptor activity using hierarchical clustering
In this study, hierarchical clustering classification models were developed to predict in vitro and in vivo oestrogen receptor (ER) activity. Classification models were developed for binding, agonist, and antagonist in vitro ER activity and for mouse in vivo uterotrophic ER bindi...
Non-Hierarchical Clustering as a method to analyse an open-ended ...
African Journals Online (AJOL)
Apple
tests, provide instructors with tools to probe students' conceptual knowledge of various fields of science and ... quantitative non-hierarchical clustering analysis method known as k-means (Everitt, Landau, Leese & Stahl, ...... undergraduate engineering students in creating ... mathematics-formal reasoning and the contextual.
Analyzing Job Aware Scheduling Algorithm in Hadoop for Heterogeneous Cluster
Directory of Open Access Journals (Sweden)
Mayuri A Mehta
2015-12-01
A scheduling algorithm is required to efficiently manage cluster resources in a Hadoop cluster, and thereby to increase resource utilization and reduce response time. The job aware scheduling algorithm schedules non-local map tasks of jobs based on job execution time, earliest deadline first, or workload of the job. In this paper, we present the performance evaluation of the job aware scheduling algorithm using the MapReduce WordCount benchmark. The experimental results are compared with the matchmaking scheduling algorithm. The results show that the job aware scheduling algorithm reduces average waiting time and memory wastage considerably compared to the matchmaking algorithm.
Hierarchical Genetic Algorithm Approach to Determine Pulse Sequences in NMR
Ajoy, Ashok
2009-01-01
We develop a new class of genetic algorithm that computationally determines efficient pulse sequences to implement a quantum gate U in a three-qubit system. The method is shown to be quite general, and the same algorithm can be used to derive efficient sequences for a variety of target matrices. We demonstrate this by implementing the inversion-on-equality gate efficiently when the spin-spin coupling constants $J_{12}=J_{23}=J$ and $J_{13}=0$. We also propose new pulse sequences to implement the Parity gate and Fanout gate, which are about 50% more efficient than the previous best efforts. Moreover, these sequences are shown to require significantly less RF power for their implementation. The proposed algorithm introduces several new features in the conventional genetic algorithm framework. We use matrices instead of linear chains, and the columns of these matrices have a well defined hierarchy. The algorithm is a genetic algorithm coupled to a fast local optimizer, and is hence a hybrid GA. It shows fast con...
Study of the Artificial Fish Swarm Algorithm for Hybrid Clustering
Directory of Open Access Journals (Sweden)
Hongwei Zhao
2015-06-01
The basic Artificial Fish Swarm (AFS) Algorithm is a new type of heuristic swarm intelligence algorithm, but it is difficult to optimize to high precision because of the randomness of the artificial fish behavior. This paper presents an extended AFS algorithm, namely the Cooperative Artificial Fish Swarm (CAFS), which significantly improves the original AFS in solving complex optimization problems. The K-medoids clustering algorithm is used to classify data, but it is sensitive to the initial selection of the centers, which can lead to low-quality clusters. A novel hybrid clustering method based on the CAFS and K-medoids could be used for solving clustering problems. In this work, first, the CAFS algorithm is used for optimizing six widely-used benchmark functions, giving comparative results produced by AFS and CAFS; then Particle Swarm Optimization (PSO) is studied. Second, the hybrid algorithm with K-medoids and CAFS algorithms is used for data clustering on several benchmark data sets. The performance of the hybrid algorithm based on K-medoids and CAFS is compared with AFS and CAFS algorithms on a clustering problem. The simulation results show that the proposed CAFS outperforms the other two algorithms in terms of accuracy and robustness.
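The sensitivity of K-medoids to its initial centers, which motivates the hybrid method in this abstract, can be seen in a small sketch. This is a plain alternating K-medoids on one-dimensional data, not the CAFS hybrid; the function name `k_medoids`, the toy data and the two starting configurations are assumptions chosen for illustration, and the initial medoids are assumed to be distinct data points.

```python
def k_medoids(points, medoids, max_iter=100):
    """Alternating K-medoids on 1-D points, starting from the given medoids."""
    medoids = list(medoids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for p in points:
            nearest = min(medoids, key=lambda m: abs(p - m))
            clusters[nearest].append(p)
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(abs(c - p) for p in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Total distance of every point to its nearest medoid.
    cost = sum(min(abs(p - m) for m in medoids) for p in points)
    return sorted(medoids), cost

points = [0, 1, 2, 10, 11, 12, 20, 21, 22]
good = k_medoids(points, [1, 11, 21])  # one seed per natural group
bad = k_medoids(points, [0, 1, 2])     # all seeds in the same group
```

With the well-spread seeds the procedure keeps one medoid per group, while the poorly chosen seeds leave it stuck in a noticeably more expensive local optimum, which is the weakness a global search such as CAFS is meant to mitigate.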
Cluster fusion algorithm: application to Lennard-Jones clusters
DEFF Research Database (Denmark)
Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter
2008-01-01
We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...
Cluster fusion algorithm: application to Lennard-Jones clusters
DEFF Research Database (Denmark)
Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter
2006-01-01
We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...
A REAL-TIME C-V CLUSTERING ALGORITHM FOR WEB-MINING
Institute of Scientific and Technical Information of China (English)
LiHaiying; ZuangZhenquan; et al.
2002-01-01
In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to handle the vast action data that are dynamically collected from a web site. The algorithm uses the concept of C-V to denote characteristics; it also adopts two-valued [0,1] input and a self-defined vigilance parameter to design the clustering architecture. The Vector Degree of Matching (VDM) plays a key role in the clustering algorithm and determines the magnitude of typical characteristics. Using stability analysis, the classifications are confirmed to have a reliably hierarchical structure when the vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between the vigilance parameter and the classification upper limit helps mine representative classifications of net users according to the actual web resources; the administering system can then map them to the web resource space to implement intelligent configuration effectively and rapidly.
Simulated annealing spectral clustering algorithm for image segmentation
Institute of Scientific and Technical Information of China (English)
Yifang Yang; Yuping Wang
2014-01-01
The similarity measure is crucial to the performance of spectral clustering. The Gaussian kernel function based on the Euclidean distance is usually adopted as the similarity measure. However, the Euclidean distance measure cannot fully reveal the complex distribution data, and the result of spectral clustering is very sensitive to the scaling parameter. To solve these problems, a new manifold distance measure and a novel simulated annealing spectral clustering (SASC) algorithm based on the manifold distance measure are proposed. The simulated annealing based on genetic algorithm (SAGA), characterized by its rapid convergence to the global optimum, is used to cluster the sample points in the spectral mapping space. The proposed algorithm can not only reflect local and global consistency better, but also reduce the sensitivity of spectral clustering to the kernel parameter, which improves the algorithm's clustering performance. To efficiently apply the algorithm to image segmentation, the Nyström method is used to reduce the computation complexity. Experimental results show that compared with traditional clustering algorithms and those popular spectral clustering algorithms, the proposed algorithm can achieve better clustering performances on several synthetic datasets, texture images and real images.
A Flocking Based algorithm for Document Clustering Analysis
Energy Technology Data Exchange (ETDEWEB)
Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL
2006-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K-means and the Ant clustering algorithm for real document clustering.
Mercer Kernel Based Fuzzy Clustering Self-Adaptive Algorithm
Institute of Scientific and Technical Information of China (English)
李侃; 刘玉树
2004-01-01
A novel Mercer kernel based fuzzy clustering self-adaptive algorithm is presented. The Mercer kernel method is introduced to fuzzy c-means clustering. It can implicitly map the input data into a high-dimensional feature space through a nonlinear transformation. In fuzzy c-means and its variants, the number of clusters must be determined first. A self-adaptive algorithm is proposed in which the number of clusters, which is not given in advance, is obtained automatically by a validity measure function. Finally, experiments are given to show the better performance of the kernel based fuzzy c-means self-adaptive algorithm.
Hopfield-K-Means clustering algorithm: A proposal for the segmentation of electricity customers
Energy Technology Data Exchange (ETDEWEB)
Lopez, Jose J.; Aguado, Jose A.; Martin, F.; Munoz, F.; Rodriguez, A.; Ruiz, Jose E. [Department of Electrical Engineering, University of Malaga, C/ Dr. Ortiz Ramos, sn., Escuela de Ingenierias, 29071 Malaga (Spain)
2011-02-15
Customer classification aims at providing electric utilities with a volume of information to enable them to establish different types of tariffs. Several methods have been used to segment electricity customers, including, among others, the hierarchical clustering, Modified Follow the Leader and K-Means methods. These, however, entail problems with the pre-allocation of the number of clusters (Follow the Leader), randomness of the solution (K-Means) and improvement of the solution obtained (hierarchical algorithm). Another segmentation method used is Hopfield's autonomous recurrent neural network, although the solution obtained is only guaranteed to be a local minimum. In this paper, we present the Hopfield-K-Means algorithm in order to overcome these limitations. This approach eliminates the randomness of the initial solution provided by K-Means based algorithms and it moves closer to the global optimum. The proposed algorithm is also compared against other customer segmentation and characterization techniques, on the basis of relative validation indexes. Finally, the results obtained by this algorithm with a set of 230 electricity customers (residential, industrial and administrative) are presented.
Android Malware Classification Using K-Means Clustering Algorithm
Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah
2017-08-01
Malware is designed to gain access to or damage a computer system without the user's notice. Attackers also exploit malware to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, Virus Total and Malgenome, were selected to demonstrate the K-Means clustering algorithm. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each dataset: Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the built clusters. The proposed K-Means clustering algorithm shows promising results with high accuracy when tested using the Random Forest algorithm.
Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis
Directory of Open Access Journals (Sweden)
S. Muthurajkumar
2014-05-01
In this paper, we propose a hybrid clustering-based classification algorithm, based on a mean approach, to mine ordered sequences (paths) from weblog data in order to perform social network analysis. In the system proposed in this work for social pattern analysis, the sequences of human activities are typically analyzed by switching behaviors, which are likely to produce overlapping clusters. In this system, a robust Modified Boosting algorithm is proposed for the hybrid clustering-based classification to cluster the data. This work helps connect the aggregated features from the network data with the traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm, and that it provides better classification accuracy when tested with a Weblog dataset. In addition, the algorithm improves predictive performance, especially for multiclass datasets, which can increase accuracy.
DEFF Research Database (Denmark)
Ussery, David; Bohlin, Jon; Skjerve, Eystein
2009-01-01
Recently there has been an explosion in the availability of bacterial genomic sequences, making possible now an analysis of genomic signatures across more than 800 different bacterial chromosomes, from a wide variety of environments. Using genomic signatures, we pair-wise compared 867...... different genomic DNA sequences, taken from chromosomes and plasmids more than 100,000 base-pairs in length. Hierarchical clustering was performed on the outcome of the comparisons before a multinomial regression model was fitted. The regression model included the cluster groups as the response variable...... AT content. Small improvements to the regression model, although significant, were also obtained by factors such as sequence size, habitat, growth temperature, selective pressure measured as oligonucleotide usage variance, and oxygen requirement. The statistics obtained using hierarchical clustering...
Signatures of Hierarchical Clustering in Dark Matter Detection Experiments
Stiff, D; Frieman, Joshua A
2001-01-01
In the cold dark matter model of structure formation, galaxies are assembled hierarchically from mergers and the accretion of subclumps. This process is expected to leave residual substructure in the Galactic dark halo, including partially disrupted clumps and their associated tidal debris. We develop a model for such halo substructure and study its implications for dark matter (WIMP and axion) detection experiments. We combine the Press-Schechter model for the distribution of halo subclump masses with N-body simulations of the evolution and disruption of individual clumps as they orbit through the evolving Galaxy to derive the probability that the Earth is passing through a subclump or stream of a given density. Our results suggest that it is likely that the local complement of dark matter particles includes a 1-5% contribution from a single clump. The implications for dark matter detection experiments are significant, since the disrupted clump is composed of a `cold' flow of high-velocity particles. We desc...
A new hybrid imperialist competitive algorithm on data clustering
Indian Academy of Sciences (India)
Taher Niknam; Elahe Taherian Fard; Shervin Ehrampoosh; Alireza Rousta
2011-06-01
Clustering is a process for partitioning datasets. This technique is very useful for finding optimum solutions. K-means is one of the simplest and most famous methods and is based on the square error criterion. This algorithm depends on initial states and converges to local optima. Some recent research shows that the K-means algorithm has been successfully applied to combinatorial optimization problems for clustering. In this paper, we propose a novel algorithm that combines two clustering algorithms: K-means and the Modified Imperialist Competitive Algorithm. It is named hybrid K-MICA. In addition, we use a method called modified expectation maximization (EM) to determine the number of clusters. The experimental results show that the new method achieves better results than ACO, PSO, Simulated Annealing (SA), Genetic Algorithm (GA), Tabu Search (TS), Honey Bee Mating Optimization (HBMO) and K-means.
Extension of K-Modes Algorithm for Generating Clusters Automatically
Directory of Open Access Journals (Sweden)
Anupama Chadha
2016-03-01
K-Modes is an eminent algorithm for clustering data sets with categorical attributes. This algorithm is famous for its simplicity and speed. K-Modes is an extension of the K-Means algorithm for categorical data. Because K-Modes is used for categorical data, the ‘Simple Matching Dissimilarity’ measure is used instead of Euclidean distance, and the ‘Modes’ of clusters are used instead of ‘Means’. However, one major limitation of this algorithm is its dependency on prior input of the number of clusters K, and sometimes it becomes practically impossible to correctly estimate the optimum number of clusters in advance. In this paper we propose an algorithm which overcomes this limitation while maintaining the simplicity of the K-Modes algorithm.
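The two substitutions that K-Modes makes relative to K-Means (matching dissimilarity for Euclidean distance, modes for means) can be illustrated with a minimal sketch. This is plain K-Modes with K fixed by the caller, not the automatic-K extension the abstract proposes; the function names and the choice to pass initial modes explicitly are assumptions made for illustration.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching dissimilarity: number of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Attribute-wise mode of a non-empty list of categorical records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(records, initial_modes, max_iter=20):
    """Plain K-Modes with caller-supplied initial modes (K is fixed in advance)."""
    modes = [tuple(m) for m in initial_modes]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each record goes to the mode it differs from least.
        labels = [
            min(range(len(modes)), key=lambda k: matching_dissimilarity(r, modes[k]))
            for r in records
        ]
        # Update step: recompute each mode; an empty cluster keeps its old mode.
        new_modes = []
        for k, old in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == k]
            new_modes.append(cluster_mode(members) if members else old)
        if new_modes == modes:
            break  # converged: no mode changed
        modes = new_modes
    return labels, modes
```

Because the number of initial modes fixes K, the result depends on that prior choice, which is the limitation the abstract sets out to remove.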
Resource Allocation in Public Cluster with Extended Optimization Algorithm
Akbar, Z.; Handoko, L. T.
2007-01-01
We introduce an optimization algorithm for resource allocation in the LIPI Public Cluster to optimize its usage according to incoming requests from users. The tool is an extended and modified genetic algorithm developed to match the specific nature of a public cluster. We present a detailed analysis of the optimization, and compare the results with the exact calculation. We show that it would be very useful and could realize an automatic decision making system for public clusters.
An ACO Algorithm for Effective Cluster Head Selection
Sampath, Amritha; Thampi, Sabu M; 10.4304/jait.2.1.50-56
2011-01-01
This paper presents an effective algorithm for selecting cluster heads in mobile ad hoc networks (MANETs) using ant colony optimization. A cluster in an ad hoc network consists of a cluster head and cluster members which are one hop away from the cluster head. The cluster head allocates resources to its cluster members. Clustering in a MANET is done to reduce the communication overhead and thereby increase network performance. A MANET can have many clusters in it. This paper presents an algorithm which combines the four main clustering schemes: ID-based clustering, connectivity-based clustering, probability-based clustering and the weighted approach. An ant colony optimization based approach is used to minimize the number of clusters in the MANET. This can also be considered as a minimum dominating set problem in graph theory. The algorithm considers various parameters such as the number of nodes, the transmission range etc. Experimental results show that the proposed algorithm is an effective methodology for finding out t...
Effective Hierarchical Routing Algorithm for Large-scale Wireless Mobile Networks
Directory of Open Access Journals (Sweden)
Guofeng Yan
2014-02-01
The growing interest in wireless mobile network techniques has resulted in many routing protocol proposals. The unpredictable motion and unreliable behavior of mobile nodes is one of the key issues in wireless mobile networks. A virtual mobile node (VMN) consists of robust virtual nodes that are both predictable and reliable. Based on VMNs, in this paper we present a hierarchical routing algorithm, EHRA-WAVE, for large-scale wireless mobile networks. By using mobile WAVE technology, a routing path can be found rapidly between VMNs without accurate topology information. We introduce the routing algorithm and the implementation issues of the proposed EHRA-WAVE routing algorithm. Finally, we evaluate the performance of EHRA-WAVE through experiments, and compare the performance on VMN failure and message delivery ratio using hierarchical and non-hierarchical routing methods. However, due to the large amount of WAVE flooding, EHRA-WAVE results in too large a load, which would impede the application of the EHRA-WAVE algorithm. Therefore, further work on the routing protocol focuses on minimizing the number of WAVEs using hierarchical structures in large-scale wireless mobile networks.
Bottom-up GGM algorithm for constructing multiple layered hierarchical gene regulatory networks
Multilayered hierarchical gene regulatory networks (ML-hGRNs) are very important for understanding genetics regulation of biological pathways. However, there are currently no computational algorithms available for directly building ML-hGRNs that regulate biological pathways. A bottom-up graphic Gaus...
Squeezer: An Efficient Algorithm for Clustering Categorical Data
Institute of Scientific and Technical Information of China (English)
何增有; 徐晓飞; 邓胜春
2002-01-01
This paper presents a new efficient algorithm for clustering categorical data, Squeezer, which can produce high quality clustering results and at the same time achieve good scalability. The Squeezer algorithm reads each tuple t in sequence, either assigning t to an existing cluster (initially none), or creating t as a new cluster, which is determined by the similarities between t and clusters. Due to its characteristics, the proposed algorithm is extremely suitable for clustering data streams, where given a sequence of points, the objective is to maintain consistently good clustering of the sequence so far, using a small amount of memory and time. Outliers can also be handled efficiently and directly in Squeezer. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.
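The one-pass structure described above (assign each tuple to an existing cluster or open a new one) can be sketched as follows. The abstract does not define the similarity function, so the one used here is an assumption: for each attribute, the fraction of cluster members that share the tuple's value, summed over attributes. The `threshold` parameter and all names are likewise hypothetical.

```python
from collections import Counter

def similarity(summary, record):
    """Assumed similarity: per attribute, the share of cluster members that
    have the record's value, summed over all attributes."""
    size = summary["size"]
    return sum(counts[value] / size
               for counts, value in zip(summary["counts"], record))

def squeezer(records, threshold):
    """Single pass over records; returns one cluster label per record."""
    clusters = []  # per cluster: member count and per-attribute value counts
    labels = []
    for record in records:
        best, best_sim = None, -1.0
        for idx, summary in enumerate(clusters):
            sim = similarity(summary, record)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is None or best_sim < threshold:
            # No existing cluster is similar enough: the record opens a new one.
            clusters.append({"size": 0, "counts": [Counter() for _ in record]})
            best = len(clusters) - 1
        summary = clusters[best]
        summary["size"] += 1
        for counts, value in zip(summary["counts"], record):
            counts[value] += 1
        labels.append(best)
    return labels
```

Only the per-cluster summaries are kept in memory, never the records themselves, which is what makes this kind of scheme a natural fit for the data stream setting the abstract mentions.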
A hierarchic collision detection algorithm for simple Brownian dynamics.
Katsimitsoulia, Zoe; Taylor, William R
2010-04-01
We describe an algorithm to avoid steric violation (bumps) between bodies arranged in a hierarchy. The algorithm recursively directs the focus of a bump-detector towards the interactions of children whose parents are in collision. This has the effect of concentrating available computer resources towards maintaining good steric interactions in the region where bodies are colliding. The algorithm was implemented and tested under two programming environments: a graphical environment, OpenGL under Java3D, and a non-graphical environment in "C". The former used a built-in collision detection system whereas the latter used a simple algorithm devised previously for the interaction of "soft" bodies. This simpler system was found to run much faster (by 50-fold) even after allowing for time spent on graphical activity and was also better at preventing steric violations. With a hierarchy of three levels of 100, the non-graphical implementation was able to simulate a million atomic bodies for 100,000 steps in 12h on a laptop computer.
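The recursive focusing described above, where children are only examined when their parents collide, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes every body is a sphere that encloses its children, and the `Body` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from math import dist

@dataclass
class Body:
    centre: tuple
    radius: float
    children: list = field(default_factory=list)

def overlaps(a, b):
    """Two spherical bodies bump when their centres are closer than the sum of radii."""
    return dist(a.centre, b.centre) < a.radius + b.radius

def find_bumps(a, b, bumps):
    """Collect colliding leaf pairs, descending only below parents that collide."""
    if not overlaps(a, b):
        return  # parents are clear, so their children are never examined
    if not a.children and not b.children:
        bumps.append((a, b))
        return
    # Expand whichever side still has children; a leaf stands in for itself.
    for child_a in (a.children or [a]):
        for child_b in (b.children or [b]):
            find_bumps(child_a, child_b, bumps)
```

Pairs of distant parents are rejected with a single distance test, so the work concentrates in the regions where bodies are actually in contact.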
Using Hyper Clustering Algorithms in Mobile Network Planning
Directory of Open Access Journals (Sweden)
Lamiaa F. Ibrahim
2011-01-01
Problem statement: As large amounts of data are stored in spatial databases, people may want to find groups of data which share similar features. Cluster analysis has thus become an important area of research in data mining. Clustering analysis has been applied in many fields, for example to construct a cluster served by a base station in a mobile network. Deciding upon the optimum placement of the base stations to achieve the best service while reducing cost is a complex task requiring vast computational resources. Approach: This study addresses the antenna placement problem, or cell planning problem, which involves locating and configuring infrastructure for mobile networks, by modifying the original Density-Based Spatial Clustering of Applications with Noise algorithm. The original Cluster Partitioning Around Medoids algorithm was modified, and a new algorithm proposed, by the authors in a recent work. In this study, the original Density-Based Spatial Clustering of Applications with Noise algorithm has been modified and combined with the earlier algorithm to produce the hybrid algorithm Clustering Density Base and Clustering with Weighted Node-Partitioning Around Medoids, to solve problems in mobile network planning. Results: An implementation of this algorithm on a real case study is presented. Results demonstrate that the proposed algorithm has minimum run time, minimum cost and a high grade of service. Conclusion: The proposed hyper algorithm has the advantage of quickly dividing the area into clusters, since the density-based algorithm has a limited number of iterations, and the advantage of accuracy (no sampling method is used) and a high grade of service, due to moving the location of the base stations (medoids) toward the heavily loaded (weighted) nodes.
Improving the Decision Value of Hierarchical Text Clustering Using Term Overlap Detection
Directory of Open Access Journals (Sweden)
Nilupulee Nathawitharana
2015-09-01
Humans are used to expressing themselves with written language and language provides a medium with which we can describe our experiences in detail incorporating individuality. Even though documents provide a rich source of information, it becomes very difficult to identify, extract, summarize and search when vast amounts of documents are collected especially over time. Document clustering is a technique that has been widely used to group documents based on similarity of content represented by the words used. Once key groups are identified further drill down into sub-groupings is facilitated by the use of hierarchical clustering. Clustering and hierarchical clustering are very useful when applied to numerical and categorical data and cluster accuracy and purity measures exist to evaluate the outcomes of a clustering exercise. Although the same measures have been applied to text clustering, text clusters are based on words or terms which can be repeated across documents associated with different topics. Therefore text data cannot be considered as a direct ‘coding’ of a particular experience or situation in contrast to numerical and categorical data and term overlap is a very common characteristic in text clustering. In this paper we propose a new technique and methodology for term overlap capture from text documents, highlighting the different situations such overlap could signify and discuss why such understanding is important for obtaining value from text clustering. Experiments were conducted using a widely used text document collection where the proposed methodology allowed exploring the term diversity for a given document collection and obtain clusters with minimum term overlap.
Co-clustering models, algorithms and applications
Govaert, Gérard
2013-01-01
Cluster or co-cluster analyses are important tools in a variety of scientific areas. The introduction of this book presents a state of the art of already well-established, as well as more recent methods of co-clustering. The authors mainly deal with the two-mode partitioning under different approaches, but pay particular attention to a probabilistic approach. Chapter 1 concerns clustering in general and the model-based clustering in particular. The authors briefly review the classical clustering methods and focus on the mixture model. They present and discuss the use of different mixture
Hybrid Swarm Intelligence Energy Efficient Clustered Routing Algorithm for Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Rajeev Kumar
2016-01-01
Currently, wireless sensor networks (WSNs) are used in many applications, namely, environment monitoring, disaster management, industrial automation, and medical electronics. Sensor nodes carry many limitations such as low battery life, small memory space, and limited computing capability. To make a wireless sensor network more energy efficient, swarm intelligence techniques have been applied to resolve many optimization issues in WSNs. In many existing clustering techniques an artificial bee colony (ABC) algorithm is utilized to collect information from the field periodically. Nevertheless, in event-based applications, ant colony optimization (ACO) is a good solution to enhance the network lifespan. In this paper, we combine both algorithms (i.e., ABC and ACO) and propose a new hybrid ABCACO algorithm to solve a Nondeterministic Polynomial (NP) hard and finite problem of WSNs. The ABCACO algorithm is divided into three main parts: (i) selection of the optimal number of subregions and further subregion parts, (ii) cluster head selection using the ABC algorithm, and (iii) efficient data transmission using the ACO algorithm. We use a hierarchical clustering technique for data transmission; the data is transmitted from member nodes to the subcluster heads and then from subcluster heads to the elected cluster heads based on some threshold value. Cluster heads use an ACO algorithm to discover the best route for data transmission to the base station (BS). The proposed approach is very useful in designing the framework for forest fire detection and monitoring. The simulation results show that the ABCACO algorithm enhances the stability period by 60% and also improves the goodput by 31% against LEACH and WSNCABC, respectively.
An Approach to Assembly Sequence Planning Based on Hierarchical Strategy and Genetic Algorithm
Institute of Scientific and Technical Information of China (English)
Niu Xinwen; Ding Han; Xiong Youlun
2001-01-01
Using group and subassembly cluster methods, the hierarchical structure of a product is generated automatically, which largely reduces the complexity of planning. Based on a genetic algorithm, the optimal assembly sequence of each structure level can be obtained by sequence-by-sequence search. As a result, a better assembly sequence of the product can be generated by combining the assembly sequences of all hierarchical structures, which provides more parallelism and flexibility for assembly operations. An industrial example is solved by this new approach.
Multilevel hierarchical kernel spectral clustering for real-life large scale complex networks.
Directory of Open Access Journals (Sweden)
Raghvendra Mall
Kernel spectral clustering (KSC) corresponds to a weighted kernel principal component analysis problem in a constrained optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual level. The dual formulation allows a model to be built on a representative subgraph of the large scale network in the training phase, and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample extension property which allows cluster affiliation for the unseen nodes of the big data network. In this paper we exploit the structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We empirically showcase that real-world networks have multilevel hierarchical organization which cannot be detected efficiently by several state-of-the-art large scale hierarchical community detection techniques like the Louvain, OSLOM and Infomap methods. We show that a major advantage of our proposed approach is the ability to locate good quality clusters at both the finer and coarser levels of hierarchy using internal cluster quality metrics on 7 real-life networks.
Yi, Wen-Bin; Shen, Li; Qi, Yin-Feng; Tang, Hong
2011-09-01
The paper introduces Probabilistic Latent Semantic Analysis (PLSA) to image clustering and proposes an effective clustering algorithm for hyperspectral images that uses the semantic information from PLSA. Firstly, the ISODATA algorithm is used to obtain the initial clustering result of the hyperspectral image, and the clusters of the initial clustering result are considered as the visual words of the PLSA. Secondly, an object-oriented image segmentation algorithm is used to partition the hyperspectral image, and segments with relatively pure pixels are regarded as documents in PLSA. Thirdly, a variety of identification methods which can estimate the best number of cluster centers are combined to get the number of latent semantic topics. Then the conditional distributions of visual words in topics and the mixtures of topics in different documents are estimated using PLSA. Finally, the conditional probabilities of latent semantic topics are distinguished using a statistical pattern recognition method, the topic type for each visual word in each document is given, and the clustering result of the hyperspectral image is then achieved. Experimental results show that the clusters of the proposed algorithm are better than those of K-MEANS and ISODATA in terms of object-oriented property, and the clustering result is closer to the real spatial distribution of the surface.
Directory of Open Access Journals (Sweden)
Guillaume Marrelec
The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
Marrelec, Guillaume; Messé, Arnaud; Bellec, Pierre
2015-01-01
The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
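For jointly normal variables, the plug-in mutual information has a closed form in covariance determinants, I = 0.5 * ln(|S_x| |S_y| / |S_xy|). The sketch below computes it with the standard library only. The penalty in `bic_merge_score` (half the number of cross-covariance parameters times ln n) is an assumption made here to illustrate what a BIC-style dimensionality correction looks like; it is not taken from the paper, and all function names are hypothetical.

```python
from math import log

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n, det = len(a), 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[pivot][i] == 0:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            det = -det  # a row swap flips the sign
        det *= a[i][i]
        for r in range(i + 1, n):
            factor = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= factor * a[i][c]
    return det

def submatrix(m, idx):
    """Rows and columns of m restricted to the indices in idx."""
    return [[m[i][j] for j in idx] for i in idx]

def gaussian_mutual_information(cov, x_idx, y_idx):
    """Plug-in mutual information (nats) between two blocks of normal variables."""
    joint = list(x_idx) + list(y_idx)
    return 0.5 * log(
        determinant(submatrix(cov, x_idx)) * determinant(submatrix(cov, y_idx))
        / determinant(submatrix(cov, joint))
    )

def bic_merge_score(cov, x_idx, y_idx, n_samples):
    """Assumed BIC-style score: n * MI minus (d_x * d_y / 2) * ln(n)."""
    penalty = 0.5 * len(x_idx) * len(y_idx) * log(n_samples)
    return n_samples * gaussian_mutual_information(cov, x_idx, y_idx) - penalty
```

Under this assumed form, a positive score favors merging the two blocks and a negative score favors keeping them apart, which gives a stopping rule without a hand-tuned threshold.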
Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression
Halim Boukaram, Wajih
2017-09-14
We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.
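The one-sided Jacobi method mentioned above repeatedly rotates pairs of columns until all columns are mutually orthogonal; the column norms are then the singular values. The following is a CPU-only sketch of that idea in plain Python for a single small matrix, written for illustration and unrelated to the authors' GPU kernels. The function name, sweep limit and tolerance are assumptions.

```python
from math import copysign, sqrt

def jacobi_singular_values(matrix, sweeps=30, tol=1e-12):
    """Singular values of a small dense matrix via one-sided (Hestenes) Jacobi."""
    # Store columns as lists so that each rotation touches exactly two of them.
    cols = [list(c) for c in zip(*matrix)]
    n = len(cols)
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(x * x for x in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                # Rotation angle that zeroes the inner product of the two columns.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = copysign(1.0, zeta) / (abs(zeta) + sqrt(1.0 + zeta * zeta))
                c = 1.0 / sqrt(1.0 + t * t)
                s = c * t
                cols[p], cols[q] = (
                    [c * x - s * y for x, y in zip(cols[p], cols[q])],
                    [s * x + c * y for x, y in zip(cols[p], cols[q])],
                )
        if not rotated:
            break  # a full sweep without rotations means convergence
    return sorted((sqrt(sum(x * x for x in col)) for col in cols), reverse=True)
```

A batch is then just `[jacobi_singular_values(m) for m in batch]`. Each column pair rotation is independent of rotations on disjoint pairs, which is the inherent parallelism the abstract refers to.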
Directory of Open Access Journals (Sweden)
Jiang Ting
2010-01-01
We optimize the cluster structure to solve problems such as the uneven energy consumption of the radar sensor nodes and random cluster head selection in the traditional clustering routing algorithm. According to the defined cost function for clusters, we present the clustering algorithm which is based on radio-free space path loss. In addition, we propose the energy and distance pheromones based on the residual energy and aggregation of the radar sensor nodes. According to bionic heuristic algorithm, a new ant colony-based clustering algorithm for radar sensor networks is also proposed. Simulation results show that this algorithm can get a better balance of the energy consumption and then remarkably prolong the lifetime of the radar sensor network.
The evolution of Brightest Cluster Galaxies in a hierarchical universe
Tonini, Chiara; Croton, Darren; Maraston, Claudia; Thomas, Daniel
2012-01-01
We investigate the evolution of Brightest Cluster Galaxies (BCGs) from redshift z~1.6 to z~0. We use the semi-analytic model of Croton et al. (2006) with a new spectro-photometric model based on the Maraston (2005) stellar populations and a new recipe for the dust extinction. We compare the model predictions of the K-band luminosity evolution and the J-K, V-I and I-K colour evolution with a series of datasets, including Collins et al. (Nature, 2009) who argued that semi-analytic models based on the Millennium simulation cannot reproduce the red colours and high luminosity of BCGs at z>1. We show instead that the model is well in range of the observed luminosity and correctly reproduces the colour evolution of BCGs in the whole redshift range up to z~1.6. We argue that the success of the semi-analytic model is in large part due to the implementation of a more sophisticated spectro-photometric model. An analysis of the model BCGs shows an increase in mass by a factor ~2 since z~1, and star formation activity do...
Assessing the Graphical and Algorithmic Structure of Hierarchical Coloured Petri Net Models
Directory of Open Access Journals (Sweden)
George Benwell
1994-11-01
Petri nets, as a modelling formalism, are utilised for the analysis of processes, whether for explicit understanding, database design or business process re-engineering. The formalism, however, can be represented on a virtual continuum from highly graphical to largely algorithmic. The use and understanding of the formalism will, in part, therefore depend on the resultant complexity and power of the representation and, on the graphical or algorithmic preference of the user. This paper develops a metric which will indicate the graphical or algorithmic tendency of hierarchical coloured Petri nets.
Cosine-Based Clustering Algorithm Approach
Directory of Open Access Journals (Sweden)
Mohammed A. H. Lubbad
2012-02-01
Because many applications need to manage spatial data, clustering large spatial databases is an important problem, which tries to find the densely populated regions in the feature space for use in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shapes. It must be insensitive to outliers (noise) and to the order of input data. In this paper Cosine Cluster is proposed, based on the cosine transformation, which satisfies all the above requirements. Using the multi-resolution property of cosine transforms, arbitrary shape clusters can be effectively identified at different degrees of accuracy. Cosine Cluster is also shown to be highly efficient in terms of time complexity. Experimental results on very large data sets are presented, which show the efficiency and effectiveness of the proposed approach compared to other recent clustering methods.
Edge Crossing Minimization Algorithm for Hierarchical Graphs Based on Genetic Algorithms
Institute of Scientific and Technical Information of China (English)
无
2001-01-01
We present an edge crossing minimization algorithm for hierarchical graphs based on genetic algorithms, and compare it with some heuristic algorithms. The proposed algorithm is more efficient and has the following advantages: the frame of the algorithms is unified, the method is simple, and its implementation and revision are easy.
A functional clustering algorithm for the analysis of neural relationships
Feldt, S; Hetrick, V L; Berke, J D; Zochowski, M
2008-01-01
We formulate a novel technique for the detection of functional clusters in neural data. In contrast to prior network clustering algorithms, our procedure progressively combines spike trains and derives the optimal clustering cutoff in a simple and intuitive manner. To demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. We observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Directory of Open Access Journals (Sweden)
Refat Aljumily
2015-09-01
A few literary scholars have long claimed that Shakespeare did not write some of his best plays (history plays and tragedies) and have proposed, at one time or another, various suspected authorship candidates. Most modern-day scholars of Shakespeare have rejected this claim, arguing that there is strong evidence that Shakespeare wrote the plays and poems, since his name appears on them as the author. This has led to a long-running scholarly debate. Stylometry is a fast-growing field often used to attribute authorship to anonymous or disputed texts. Stylometric attempts to resolve this literary puzzle have raised interesting questions over the past few years. The following paper contributes to “the Shakespeare authorship question” by using a mathematically-based methodology to examine the hypothesis that Shakespeare wrote all the disputed plays traditionally attributed to him. More specifically, the methodology used here is based on Mean Proximity, as a linear hierarchical clustering method, and on Principal Components Analysis, as a non-hierarchical linear clustering method. It is also based, for the first time in the domain, on Self-Organizing Map U-Matrix and Voronoi Map, as non-linear clustering methods, to cover the possibility that our data contains significant non-linearities. The Vector Space Model (VSM) is used to convert texts into vectors in a high dimensional space, the aim of which is to compare the degrees of similarity within and between limited samples of text (the disputed plays): the various works and plays assumed to have been written by Shakespeare and by possible authors, notably Sir Francis Bacon, Christopher Marlowe, John Fletcher, and Thomas Kyd, where “similarity” is defined in terms of a correlation/distance coefficient measure based on the frequency of usage profiles of function words, word bi-grams, and character triple-grams. The claim that Shakespeare authored all the disputed
An Exactly Soluble Hierarchical Clustering Model: Inverse Cascades, Self-Similarity, and Scaling
Gabrielov, A; Turcotte, D L
1999-01-01
We show how clustering as a general hierarchical dynamical process proceeds via a sequence of inverse cascades to produce self-similar scaling, as an intermediate asymptotic, which then truncates at the largest spatial scales. We show how this model can provide a general explanation for the behavior of several models that has been described as "self-organized critical," including forest-fire, sandpile, and slider-block models.
Pixel Intensity Clustering Algorithm for Multilevel Image Segmentation
Directory of Open Access Journals (Sweden)
Oludayo O. Olugbara
2015-01-01
Image segmentation is an important problem that has received significant attention in the literature. Over the last few decades, many algorithms have been developed to solve the image segmentation problem; prominent amongst these are the thresholding algorithms. However, the computational time complexity of thresholding increases exponentially with the number of desired thresholds. A wealth of alternative algorithms, notably those based on particle swarm optimization and evolutionary metaheuristics, have been proposed to tackle the intrinsic challenges of thresholding. In addition, clustering-based algorithms have been developed as multidimensional extensions of thresholding. While these algorithms have demonstrated successful results for fewer thresholds, their computational costs for a large number of thresholds are still a limiting factor. We propose a new clustering algorithm based on linear partitioning of the pixel intensity set and a between-cluster variance criterion function for multilevel image segmentation. The results of testing the proposed algorithm on real images from the Berkeley Segmentation Dataset and Benchmark show that the algorithm is comparable with state-of-the-art multilevel segmentation algorithms and consistently produces high-quality results. The attractive properties of the algorithm are its simplicity, generalization to a large number of clusters, and computational cost effectiveness.
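The between-cluster variance criterion mentioned in this abstract can be illustrated with a minimal sketch. The function name `between_cluster_variance`, the histogram layout and the half-open threshold convention are assumptions made for illustration; the abstract does not give the authors' linear partitioning procedure, so only the criterion itself (in its usual Otsu-style form) is shown.

```python
def between_cluster_variance(hist, thresholds):
    """Between-cluster variance of a partition of the intensity levels.

    hist[i] is the number of pixels with intensity i (assumed layout).
    thresholds is a sorted list of cut points; cluster k covers the
    half-open range [t_(k-1), t_k) of intensity levels.
    """
    total = sum(hist)
    if total == 0:
        return 0.0
    global_mean = sum(i * h for i, h in enumerate(hist)) / total
    bounds = [0] + list(thresholds) + [len(hist)]
    variance = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        weight = sum(hist[lo:hi])
        if weight == 0:
            continue  # an empty cluster contributes nothing
        mean = sum(i * hist[i] for i in range(lo, hi)) / weight
        # Cluster probability times squared offset from the global mean.
        variance += (weight / total) * (mean - global_mean) ** 2
    return variance
```

A larger value indicates better-separated clusters, so candidate threshold sets can be compared by evaluating this function on each of them.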
A High-Order CFS Algorithm for Clustering Big Data
Directory of Open Access Journals (Sweden)
Fanyu Bu
2016-01-01
With the development of the Internet of Everything, such as the Internet of Things, Internet of People, and Industrial Internet, big data is being generated. Clustering is a widely used technique for big data analytics and mining. However, most current algorithms are not effective at clustering heterogeneous data, which is prevalent in big data. In this paper, we propose a high-order CFS algorithm (HOCFS) to cluster heterogeneous data by combining the CFS clustering algorithm and the dropout deep learning model, whose functionality rests on three pillars: (i) an adaptive dropout deep learning model to learn features from each type of data, (ii) a feature tensor model to capture the correlations of heterogeneous data, and (iii) a tensor distance-based high-order CFS algorithm to cluster heterogeneous data. Furthermore, we verify our proposed algorithm on different datasets by comparison with two other clustering schemes, that is, HOPCM and CFS. Results confirm the effectiveness of the proposed algorithm in clustering heterogeneous data.
Meaningful Clustered Forest: an Automatic and Robust Clustering Algorithm
Tepper, Mariano; Almansa, Andrés
2011-01-01
We propose a new clustering method that can be regarded as a numerical method to compute the proximity gestalt. The method analyzes edge length statistics in the MST of the dataset and provides an a contrario cluster detection criterion. The approach is fully parametric on the chosen distance and can detect arbitrarily shaped clusters. The method is also automatic, in the sense that only a single parameter is left to the user. This parameter has an intuitive interpretation as it controls the expected number of false detections. We show that the iterative application of our method can (1) provide robustness to noise and (2) solve a masking phenomenon in which a highly populated and salient cluster dominates the scene and inhibits the detection of less-populated, but still salient, clusters.
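As a rough illustration of clustering from MST edge lengths, the sketch below builds a Euclidean minimum spanning tree and removes edges longer than a fixed cutoff. The fixed `max_length` cutoff is an assumption that stands in for the a contrario detection criterion, which the abstract does not specify; the function names are hypothetical.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (length, i, j) tuples, one per MST edge.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def cut_long_edges(points, max_length):
    """Label points by connected component after dropping long MST edges."""
    root = list(range(len(points)))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for length, i, j in mst_edges(points):
        if length <= max_length:
            root[find(i)] = find(j)
    return [find(i) for i in range(len(points))]
```

Two points receive the same label exactly when they remain connected in the pruned tree, which is what allows arbitrarily shaped clusters to be recovered.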
Hierarchical Route Optimization By Using Memetic Algorithm In A Mobile Networks
Directory of Open Access Journals (Sweden)
K .K. Gautam
2011-02-01
The Network Mobility (NEMO) protocol is a way of managing the mobility of an entire network, and Mobile Internet Protocol is the basic solution for network mobility. A hierarchical route optimization system for mobile networks is proposed to solve the management of hierarchical route optimization problems. In the present paper we study a hierarchical route optimization scheme using a memetic algorithm (HROSMA). The concept of optimization, finding the extreme of a function that maps candidate ‘solutions’ to scalar values of ‘quality’, is an extremely general and useful idea. For solving this problem, we use a few salient adaptations, and we also extend HROSMA to perform routing between the mobile networks.
van der Ham, Joris L
2016-05-19
Forensic entomologists can use carrion communities' ecological succession data to estimate the postmortem interval (PMI). Permutation tests of hierarchical cluster analyses of these data provide a conceptual method to estimate part of the PMI, the post-colonization interval (post-CI). This multivariate approach produces a baseline of statistically distinct clusters that reflect changes in the carrion community composition during the decomposition process. Carrion community samples of unknown post-CIs are compared with these baseline clusters to estimate the post-CI. In this short communication, I use data from previously published studies to demonstrate the conceptual feasibility of this multivariate approach. Analyses of these data produce series of significantly distinct clusters, which represent carrion communities during 1- to 20-day periods of the decomposition process. For 33 carrion community samples, collected over an 11-day period, this approach correctly estimated the post-CI within an average range of 3.1 days.
An energy efficient cooperative hierarchical MIMO clustering scheme for wireless sensor networks.
Nasim, Mehwish; Qaisar, Saad; Lee, Sungyoung
2012-01-01
In this work, we present an energy efficient hierarchical cooperative clustering scheme for wireless sensor networks. Communication cost is a crucial factor in depleting the energy of sensor nodes. In the proposed scheme, nodes cooperate to form clusters at each level of network hierarchy ensuring maximal coverage and minimal energy expenditure with relatively uniform distribution of load within the network. Performance is enhanced by cooperative multiple-input multiple-output (MIMO) communication ensuring energy efficiency for WSN deployments over large geographical areas. We test our scheme using TOSSIM and compare the proposed scheme with cooperative multiple-input multiple-output (CMIMO) clustering scheme and traditional multihop Single-Input-Single-Output (SISO) routing approach. Performance is evaluated on the basis of number of clusters, number of hops, energy consumption and network lifetime. Experimental results show significant energy conservation and increase in network lifetime as compared to existing schemes.
Nimon, Kim
2012-01-01
Using state achievement data that are openly accessible, this paper demonstrates the application of hierarchical linear modeling within the context of career technical education research. Three prominent approaches to analyzing clustered data (i.e., modeling aggregated data, modeling disaggregated data, modeling hierarchical data) are discussed…
Ning, P; Guo, Y F; Sun, T Y; Zhang, H S; Chai, D; Li, X M
2016-09-01
To study the distinct clinical phenotypes of chronic airway diseases by hierarchical cluster analysis and two-step cluster analysis. A population sample of adult patients in Donghuamen community, Dongcheng district, and Qinghe community, Haidian district, Beijing, from April 2012 to January 2015, who had wheeze within the last 12 months, underwent detailed investigation, including a clinical questionnaire, pulmonary function tests, total serum IgE levels, blood eosinophil level and a peak flow diary. Nine variables were chosen as evaluating parameters, including pre-salbutamol forced expired volume in one second (FEV1)/forced vital capacity (FVC) ratio, pre-salbutamol FEV1, percentage of post-salbutamol change in FEV1, residual capacity, diffusing capacity of the lung for carbon monoxide/alveolar volume adjusted for haemoglobin level, peak expiratory flow (PEF) variability, serum IgE level, cumulative tobacco cigarette consumption (pack-years) and respiratory symptoms (cough and expectoration). The subjects' different clinical phenotypes were identified by hierarchical cluster analysis and two-step cluster analysis. (1) Four clusters were identified by hierarchical cluster analysis. Cluster 1 was chronic bronchitis in smokers with normal pulmonary function. Cluster 2 was chronic bronchitis or mild chronic obstructive pulmonary disease (COPD) patients with mild airflow limitation. Cluster 3 included COPD patients with heavy smoking, poor quality of life and severe airflow limitation. Cluster 4 recognized atopic patients with mild airflow limitation, elevated serum IgE and clinical features of asthma. Significant differences were revealed regarding pre-salbutamol FEV1/FVC%, pre-salbutamol FEV1% pred, post-salbutamol change in FEV1%, maximal mid-expiratory flow curve (MMEF)% pred, carbon monoxide diffusing capacity per liter of alveolar (DLCO)/(VA)% pred, residual volume (RV)% pred, total serum IgE level, smoking history (pack-years), St. George's respiratory questionnaire
The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm
Directory of Open Access Journals (Sweden)
Zakir Hussain Ahmed
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining a heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solutions to some more symmetric TSPLIB instances.
The ordered clustered travelling salesman problem: a hybrid genetic algorithm.
Ahmed, Zakir Hussain
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining a heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solutions to some more symmetric TSPLIB instances.
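The problem definition above can be made concrete with a small sketch that checks whether a tour satisfies the ordered clustered constraints and computes its cost. The names `is_feasible` and `tour_cost` and the data layout are hypothetical, and the clusters are assumed to be disjoint; this only illustrates the constraints and is not the hybrid genetic algorithm itself.

```python
def is_feasible(tour, start, clusters):
    """Check the ordered clustered TSP constraints.

    tour     : list of vertices, expected to begin with `start`
    clusters : list of vertex lists, given in the prescribed visiting order
    The tour must visit every vertex once, keep each cluster contiguous,
    and visit the clusters in the given order.
    """
    expected = {start} | {v for cluster in clusters for v in cluster}
    if not tour or tour[0] != start:
        return False
    if len(tour) != len(expected) or set(tour) != expected:
        return False
    pos = 1
    for cluster in clusters:
        block = tour[pos:pos + len(cluster)]
        if set(block) != set(cluster):
            return False
        pos += len(cluster)
    return True

def tour_cost(tour, cost):
    """Cost of the closed tour; cost[i][j] may be asymmetric."""
    return sum(cost[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))
```

Any heuristic for the problem, including crossover or 2-opt moves, has to preserve the property that `is_feasible` tests, so a check of this kind is useful when validating candidate solutions.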
Hierarchical Tree Algorithm for Collisional N-body Simulations on GRAPE
Fukushige, Toshiyuki
2016-01-01
We present an implementation of the hierarchical tree algorithm on the individual timestep algorithm (the Hermite scheme) for collisional $N$-body simulations, running on the GRAPE-9 system, a special-purpose hardware accelerator for gravitational many-body simulations. Such a combination of the tree algorithm and the individual timestep algorithm was not easy on the previous GRAPE system, mainly because its memory addressing scheme was limited to sequential access to a full set of particle data. The present GRAPE-9 system has an indirect memory addressing unit and a particle memory large enough to store all particle data and also tree node data. The indirect memory addressing unit stores interaction lists for the tree algorithm, which are constructed on the host computer, and, according to the interaction lists, force pipelines calculate only the interactions necessary. In our implementation, the interaction calculations are significantly reduced compared to direct $N^2$ summation in the original Hermite scheme. ...
Odong, T L; van Heerwaarden, J; Jansen, J; van Hintum, T J L; van Eeuwijk, F A
2011-07-01
Despite the availability of newer approaches, traditional hierarchical clustering remains very popular in genetic diversity studies in plants. However, little is known about its suitability for molecular marker data. We studied the performance of traditional hierarchical clustering techniques using real and simulated molecular marker data. Our study also compared the performance of traditional hierarchical clustering with model-based clustering (STRUCTURE). We showed that the cophenetic correlation coefficient is directly related to subgroup differentiation and can thus be used as an indicator of the presence of genetically distinct subgroups in germplasm collections. Whereas UPGMA performed well in preserving distances between accessions, Ward excelled in recovering groups. Our results also showed a close similarity between clusters obtained by Ward and by STRUCTURE. Traditional cluster analysis can provide an easy and effective way of determining structure in germplasm collections using molecular marker data, and the output can be used for sampling core collections or for association studies.
The Refinement Algorithm Consideration in Text Clustering Scheme Based on Multilevel Graph
Institute of Scientific and Technical Information of China (English)
CHEN Jian-bin; DONG Xiang-jun; SONG Han-tao
2004-01-01
To construct a highly efficient text clustering algorithm, the multilevel graph model and the refinement algorithm used in the uncoarsening phase are discussed. The model is applied to text clustering. The performance of the clustering algorithm is improved by applying the refinement algorithm. The experimental results demonstrate that the multilevel graph text clustering algorithm is effective.
MSClust: A Multi-Seeds Based Clustering Algorithm for microbiome profiling using 16S rRNA Sequence
Chen, Wei; Cheng, Yongmei; Zhang, Clarence; Zhang, Shaowu; Zhao, Hongyu
2013-01-01
Recent developments in next-generation sequencing technologies have led to the rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inference, a major challenge is the balance between inference accuracy and computational efficiency, where inference accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multi-seeds instead of one seed for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust uses less memory and achieves better biological accuracy than existing heuristic clustering methods while preserving efficiency and scalability. PMID:23899776
Color Image Segmentation Method Based on Improved Spectral Clustering Algorithm
Dong Qin
2014-01-01
Addressing the high sparsity of image data and the problem of determining the number of clusters, we put forward a color image segmentation algorithm that combines semi-supervised machine learning technology and spectral graph theory. Drawing on related theories and methods of spectral clustering algorithms, we introduce the concept of information entropy to design a method that can automatically optimize the scale parameter value. So it avoids the unstab...
An Integrated Metric Based Hierarchical Routing Algorithm in Broadband Communication System
Institute of Scientific and Technical Information of China (English)
SHI Chengge; HU Jiajun; Milton Chang
2001-01-01
We give an integrated metric based hierarchical routing algorithm, the FMRSF (Function Fi(.) minimum routing selected first) algorithm, for broadband communication systems in this paper. With the authors' analysis strategy, this paper gives a routing solution for hierarchical communication systems, and the solution is suited to both ATM networks and IP networks. Due to the high-level logic network mapping in a hierarchical communication system, a large communication network can be described as a simpler logic network on a high level. But it is difficult to evaluate the QoS parameters of the relative factors of a logic network (for example, the time delay and the bandwidth of logic nodes or logic links). We develop our strategy with the FMRSF algorithm for different routing paths, and select the reasonable path for one communication session. After designing an integrated metric function describing the QoS metrics of the relative factors of a logic network on the high levels in a broadband communication system, we prove that the new routing algorithm, the FMRSF algorithm, is simpler and more applicable compared with the global optimum algorithm.
The impact of hierarchical memory systems on linear algebra algorithm design
Energy Technology Data Exchange (ETDEWEB)
Gallivan, K.; Jalby, W.; Meier, U.; Sameh, A.
1987-09-14
Performing an extremely detailed performance optimization analysis is counterproductive when the concern is performance behavior on a class of architecture, since general trends are obscured by the overwhelming number of machine-specific considerations required. Instead, in this paper, a methodology is used which identifies the effects of a hierarchical memory system on the performance of linear algebra algorithms on multivector processors; provides a means of producing a set of algorithm parameters, i.e., blocksizes, as functions of system parameters which yield near-optimal performance; and provides guidelines for algorithm designers which reduce the influence of the hierarchical memory system on algorithm performance to negligible levels and thereby allow them to concentrate on machine-specific optimizations. The remainder of this paper comprises five major discussions. First, the methodology and the attendant architectural model are discussed. Next, an analysis of the basic BLAS3 matrix-matrix primitive is presented. This is followed by a discussion of three block algorithms: a block LU decomposition, a block LDL^T decomposition and a block Gram-Schmidt algorithm. 22 refs., 9 figs.
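The role of a blocksize parameter in the matrix-matrix primitive can be sketched as follows. This is a minimal pure-Python illustration of loop blocking (tiling), with the hypothetical name `blocked_matmul` and a `block` argument standing in for the tuned blocksize; it shows the loop structure only and says nothing about the report's performance model or its choice of blocksizes.

```python
def blocked_matmul(a, b, block=2):
    """Multiply an n x k matrix by a k x m matrix, one tile at a time.

    Matrices are lists of rows. The three outer loops walk over tiles of
    size `block`; the inner loops reuse the rows of one tile while they
    are (conceptually) resident in fast memory.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

The result does not depend on `block` for the small integer examples one would test with; only the order in which the operands are touched changes, which is what matters on a hierarchical memory system.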
The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering
Directory of Open Access Journals (Sweden)
Khalid Jaber
2009-01-01
Problem statement: Protein sequence clustering is a method used to discover relations between proteins. This method groups the proteins based on their common features. It is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced a maximal cliques algorithm for protein clustering. Approach: In this study we adapted the maximal cliques algorithm of Mohseni-Zadeh to find cliques in protein sequences, and we then parallelized the algorithm to improve computation times and allow large protein databases to be processed. We used the N-Gram Hirschberg approach proposed by Abdul Rashid to calculate the distance between protein sequences. The task farming parallel program model was used to parallelize the enhanced cliques algorithm. Results: Our parallel maximal cliques algorithm was implemented on the stealth cluster using the C programming language and a hybrid approach that includes both the Message Passing Interface (MPI) library and POSIX threads (PThreads) to accelerate protein sequence clustering. Conclusion: Our results showed a good speedup over sequential algorithms for cliques in protein sequences.
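To illustrate what enumerating maximal cliques in a similarity graph involves, here is a short sequential sketch using the standard Bron-Kerbosch algorithm with pivoting. This is a substituted, well-known technique and not the Mohseni-Zadeh algorithm or the parallel MPI/PThreads implementation described above; the adjacency-dictionary layout, in which an edge would join two sequences whose distance falls under some chosen cutoff, is an assumption.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques of an undirected graph.

    adj maps each vertex to the set of its neighbours (no self-loops).
    Returns a sorted list of cliques, each given as a sorted vertex list.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(sorted(r))
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)
```

Each recursive call is independent of its siblings once `p` and `x` are fixed, which is one reason clique enumeration lends itself to task-farming style parallelization.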
A New Method for Medical Image Clustering Using Genetic Algorithm
Directory of Open Access Journals (Sweden)
Akbar Shahrzad Khashandarag
2013-01-01
Segmentation is applied to medical images when the brightness of the images is weak, which makes it difficult to recognize tissue borders. Thus, the exact segmentation of medical images is an essential process in recognizing and curing an illness, and the purpose of clustering in medical images is the recognition of damaged areas in tissues. Different techniques have been introduced for clustering in different fields such as engineering, medicine, data mining and so on. However, there is no standard clustering technique that presents ideal results for all imaging applications. In this paper, a new method combining a genetic algorithm and the k-means algorithm is presented for clustering medical images. In this combined technique, a variable string length genetic algorithm (VGA) is used for the determination of the optimal cluster centers. The proposed algorithm has been compared with the k-means clustering algorithm. The advantage of the proposed method is its accuracy in selecting the optimal cluster centers compared with the k-means technique.
Centronit: Initial Centroid Designation Algorithm for K-Means Clustering
Directory of Open Access Journals (Sweden)
Ali Ridho Barakbah
2014-06-01
Clustering performance of K-means depends heavily on the correctness of the initial centroids. Usually the initial centroids for K-means clustering are determined randomly, so that the determined initial centers may cause the algorithm to reach the nearest local minimum rather than the global optimum. In this paper, we propose an algorithm, called Centronit, for the designation of initial centroids to optimize K-means clustering. The proposed algorithm is based on the calculation of the average distance of the nearest data inside the region of the minimum distance. The initial centroids can be designated by the lowest average distance of each datum. The minimum distance is set by calculating the average distance between the data. This method is also robust to outliers in the data. The experimental results show the effectiveness of the proposed method in improving the clustering results of K-means clustering. Keywords: K-means clustering, initial centroids, K-means optimization.
New clustering algorithm for interconnection of MANET and internet
Institute of Scientific and Technical Information of China (English)
万象; 姚尹雄; 王豪行
2004-01-01
This paper presents the core-agent based clustering (CBC) algorithm, a novel heuristic clustering scheme for the interconnection of MANET and the Internet using power, movement probability and hop length as constraints. CBC includes two phases: cluster initialization and cluster maintenance. In phase one, the selection of clusterheads obeys the first two constraints, whereas the father node of each clustering node is chosen according to all three. Phase two concerns the case of node insertion or removal. Easy access and little alteration of conventional mobile IP are some characteristics of this algorithm. Simulation results demonstrate that CBC has many advantages, such as a shorter average hop length, good robustness and lower overhead, and that the clustered network architecture behaves stably when the topology changes.
The Effective Clustering Partition Algorithm Based on the Genetic Evolution
Institute of Scientific and Technical Information of China (English)
LIAO Qin; LI Xi-wen
2006-01-01
To address the problem that it is hard to determine the number of clusters and the abnormal points by using the clustering validity function, an effective clustering partition model based on the genetic algorithm is built in this paper. The solution to the problem is formed by the combination of the clustering partition and the encoding of samples, and the fitness function is defined by the distances among and within clusters. The number of clusters and the samples in each cluster are determined, and the abnormal points are distinguished, by implementing the triple random crossover operator and the mutation. Based on known sample data, the results of the novel method and of the clustering validity function are compared. Numerical experiments are given and the results show that the novel method is more effective.
An Extended Clustering Algorithm for Statistical Language Models
Ueberla, J P
1994-01-01
Statistical language models frequently suffer from a lack of training data. This problem can be alleviated by clustering, because it reduces the number of free parameters that need to be trained. However, clustered models have the following drawback: if there is "enough" data to train an unclustered model, then the clustered variant may perform worse. On currently used language modeling corpora, e.g. the Wall Street Journal corpus, how do the performances of a clustered and an unclustered model compare? While trying to address this question, we develop the following two ideas. First, to get a clustering algorithm with potentially high performance, an existing algorithm is extended to deal with higher order N-grams. Second, to make it possible to cluster large amounts of training data more efficiently, a heuristic to speed up the algorithm is presented. The resulting clustering algorithm can be used to cluster trigrams on the Wall Street Journal corpus and the language models it produces can compete with exi...
Directory of Open Access Journals (Sweden)
Natalia A Petushkova
There are two ways that statistical methods can learn from biomedical data. One way is to learn classifiers to identify diseases and to predict outcomes using a training dataset with an established diagnosis for each sample. When a training dataset is not available, the task can be to mine for the presence of meaningful groups (clusters) of samples and to explore the underlying data structure (unsupervised learning). We investigated the proteomic profiles of the cytosolic fraction of human liver samples using two-dimensional electrophoresis (2DE). Samples were resected upon surgical treatment of hepatic metastases in colorectal cancer. Unsupervised hierarchical clustering of 2DE gel images (n = 18) revealed a pair of clusters, containing 11 and 7 samples. Previously we used the same specimens to measure biochemical profiles based on cytochrome P450-dependent enzymatic activities and also found that samples were clearly divided into two well-separated groups by cluster analysis. It turned out that the groups by enzyme activity almost perfectly match the groups identified from proteomic data. Of the 271 reproducible spots on our 2DE gels, we selected 15 to distinguish the human liver cytosolic clusters. Using MALDI-TOF peptide mass fingerprinting, we identified 12 proteins for the selected spots, including known cancer-associated species. Our results highlight the importance of hierarchical cluster analysis of proteomic data, and show concordance between the results of biochemical and proteomic approaches. Grouping of the human liver samples and/or patients into differing clusters may provide insights into possible molecular mechanisms of drug metabolism and creates a rationale for personalized treatment.
Evolutionary-Hierarchical Bases of the Formation of Cluster Model of Innovation Economic Development
Directory of Open Access Journals (Sweden)
Yuliya Vladimirovna Dubrovskaya
2016-10-01
The functioning of a modern economic system is based on the interaction of objects of different hierarchical levels. Thus, the problem of studying innovation processes while taking into account the mutual influence of the activities of these economic actors becomes important. The paper dwells on the evolutionary basis for the formation of models of innovation development on the basis of micro- and macroeconomic analysis. Most of the concepts recognize that, despite a large number of diverse models, the coordination of the relations between economic agents is of crucial importance for successful innovation development. According to the results of the evolutionary-hierarchical analysis, the authors reveal key phases in the development of forms of cooperation between business, science and government in the domestic economy. This has become the starting point for the conception of the characteristics of interaction in the cluster models of innovation development of the economy. Considerable expectations for the improvement of the national innovation system are connected with the development of cluster and network structures. The main objective of government authorities is the formation of mechanisms and institutions that will foster cooperation between members of the clusters. The article explains that clusters cannot become factors in the growth of the national economy unless they are an effective tool for interaction between the actors of the regional innovation systems.
Energy Technology Data Exchange (ETDEWEB)
Makeechev, V.A. [Industrial Power Company, Krasnopresnenskaya Naberejnaya 12, 123610 Moscow (Russian Federation); Soukhanov, O.A. [Energy Systems Institute, 1st Yamskogo Polya Street 15, 125040 Moscow (Russian Federation); Sharov, Y.V. [Moscow Power Engineering Institute, Krasnokazarmennaya Street 14, 111250 Moscow (Russian Federation)
2008-07-15
This paper presents the foundations of an optimization method intended for the solution of power system operation problems and based on the principles of functional modeling (FM). This paper also presents several types of hierarchical FM algorithms for economic dispatch in these systems derived from this method. According to the FM method, a power system is represented by a hierarchical model consisting of systems of equations of the lower (subsystem) levels and a higher level system of connection equations (SCE), in which only boundary variables of subsystems are present. Solution of the optimization problem in accordance with the FM method consists of the following operations: (1) solution of the optimization problem for each subsystem (values of boundary variables for subsystems should be determined on the higher level of the model); (2) calculation of the functional characteristic (FC) of each subsystem, pertaining to the state of the subsystem on the current iteration (these two steps are carried out on the lower level of the model); (3) formation and solution of the higher level system of equations (SCE), which gives values of boundary and supplementary boundary variables on the current iteration. The key elements in the general structure of the FM method are the FCs of subsystems, which represent them on the higher level of the model as "black boxes". An important advantage of hierarchical FM algorithms is that the results obtained with them on each iteration are identical to those of the corresponding basic one-level algorithms. (author)
Critical dynamics of cluster algorithms in the dilute Ising model
Hennecke, M.; Heyken, U.
1993-08-01
Autocorrelation times for thermodynamic quantities at T_c are calculated from Monte Carlo simulations of the site-diluted simple cubic Ising model, using the Swendsen-Wang and Wolff cluster algorithms. Our results show that for these algorithms the autocorrelation times decrease when reducing the concentration of magnetic sites from 100% down to 40%. This is of crucial importance when estimating static properties of the model, since the variances of these estimators increase with autocorrelation time. The dynamical critical exponents are calculated for both algorithms, observing pronounced finite-size effects in the energy autocorrelation data for the algorithm of Wolff. We conclude that, when applied to the dilute Ising model, cluster algorithms become even more effective than local algorithms, for which increasing autocorrelation times are expected.
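As an illustration of the kind of update discussed in this abstract, the following is a minimal sketch of a single Wolff cluster flip on a site-diluted simple cubic lattice. It is not the authors' code: the function names, the flat-list storage, periodic boundaries, and the units (coupling J = 1, k_B = 1, so the bond probability is 1 - exp(-2*beta)) are all assumptions made for the example.

```python
import math
import random

def make_lattice(L, concentration, rng):
    """Flat L*L*L lattice: +1/-1 on magnetic sites, 0 on vacancies (hypothetical layout)."""
    return [rng.choice((-1, 1)) if rng.random() < concentration else 0
            for _ in range(L ** 3)]

def neighbours(i, L):
    """Yield the six nearest neighbours of site i with periodic boundaries."""
    x, y, z = i % L, (i // L) % L, i // (L * L)
    for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                       (0, -1, 0), (0, 0, 1), (0, 0, -1)):
        yield ((x + dx) % L) + ((y + dy) % L) * L + ((z + dz) % L) * L * L

def wolff_update(spins, L, beta, rng):
    """Grow one Wolff cluster from a random magnetic site, flip it, return its size."""
    occupied = [i for i, s in enumerate(spins) if s != 0]
    if not occupied:
        return 0
    seed = rng.choice(occupied)
    s0 = spins[seed]
    p_add = 1.0 - math.exp(-2.0 * beta)  # assumes J = 1 and k_B = 1
    cluster = {seed}
    stack = [seed]
    while stack:
        i = stack.pop()
        for j in neighbours(i, L):
            # Vacancies (0) never match s0, so dilution simply removes bonds.
            if spins[j] == s0 and j not in cluster and rng.random() < p_add:
                cluster.add(j)
                stack.append(j)
    for i in cluster:
        spins[i] = -s0
    return len(cluster)

if __name__ == "__main__":
    rng = random.Random(1)
    lattice = make_lattice(8, 0.6, rng)
    sizes = [wolff_update(lattice, 8, 0.3, rng) for _ in range(100)]
    print("mean cluster size:", sum(sizes) / len(sizes))
```

The values beta = 0.3 and concentration 0.6 in the usage lines are arbitrary and are not taken from the paper.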
Segmentation of Medical Image using Clustering and Watershed Algorithms
M. C.J. Christ; R.M.S Parvathi
2011-01-01
Problem statement: Segmentation plays an important role in medical imaging. Segmentation of an image is the division or separation of the image into dissimilar regions of similar attribute. In this study we proposed a methodology that integrates clustering algorithm and marker controlled watershed segmentation algorithm for medical image segmentation. The use of the conservative watershed algorithm for medical image analysis is pervasive because of its advantages, such as always being able to...
Directory of Open Access Journals (Sweden)
L. Infante
2002-01-01
In this contribution I present recent results on the clustering properties of low-redshift (z 1) galaxies, groups, clusters and superclusters. I also present what is expected and what has been measured regarding the degree of evolution of galaxy clustering. We have used the photometric galaxy catalogue extracted from the first images of the "Sloan Digital Sky Survey" to study the clustering properties of small galaxy structures: pairs, triplets, quartets, quintets, etc. An analysis of the two-point correlation function, over an area of 250 square degrees of sky, shows that these objects appear to be much more strongly clustered than individual galaxies.
Efficient Cluster Algorithm for CP(N-1) Models
Beard, B B; Riederer, S; Wiese, U J
2006-01-01
Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z = 0.
Efficient cluster algorithm for CP(N-1) models
Beard, B. B.; Pepe, M.; Riederer, S.; Wiese, U.-J.
2006-11-01
Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z=0.
A novel approach to the problem of non-uniqueness of the solution in hierarchical clustering.
Cattinelli, Isabella; Valentini, Giorgio; Paulesu, Eraldo; Borghese, Nunzio Alberto
2013-07-01
The existence of multiple solutions in clustering, and in hierarchical clustering in particular, is often ignored in practical applications. However, this is a non-trivial problem, as different data orderings can result in different cluster sets that, in turn, may lead to different interpretations of the same data. The method presented here offers a solution to this issue. It is based on the definition of an equivalence relation over dendrograms that allows developing all and only the significantly different dendrograms for the same dataset, thus reducing the computational complexity to polynomial from the exponential obtained when all possible dendrograms are considered. Experimental results in the neuroimaging and bioinformatics domains show the effectiveness of the proposed method.
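The ordering dependence described in this abstract can be seen in a toy case. The sketch below is only an illustration under assumed conventions (one-dimensional data, a hypothetical first_merge helper, ties broken by scan order); it is not the equivalence-relation method of the paper. Two pairs are equally close, so which pair an agglomerative procedure merges first depends on the order in which the data are presented.

```python
def first_merge(values):
    """Return the first closest pair found when scanning pairs in input order."""
    best_pair, best_dist = None, float("inf")
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            d = abs(values[i] - values[j])
            if d < best_dist:  # strict '<': on a tie, the earlier pair wins
                best_pair, best_dist = (values[i], values[j]), d
    return best_pair

# Same three points, two orderings: the tied distances |0-1| and |1-2|
# are resolved differently, so the dendrograms start differently.
print(first_merge([0, 1, 2]))  # (0, 1)
print(first_merge([2, 1, 0]))  # (2, 1)
```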
Using Dynamic Quantum Clustering to Analyze Hierarchically Heterogeneous Samples on the Nanoscale
Energy Technology Data Exchange (ETDEWEB)
Hume, Allison; /Princeton U. /SLAC
2012-09-07
Dynamic Quantum Clustering (DQC) is an unsupervised, highly visual data mining technique. DQC was tested as an analysis method for X-ray Absorption Near Edge Structure (XANES) data from the Transmission X-ray Microscopy (TXM) group. The TXM group images hierarchically heterogeneous materials with nanoscale resolution and a large field of view. XANES data consist of energy spectra for each pixel of an image. It was determined that DQC successfully identifies structure in data of this type without prior knowledge of the components in the sample. Clusters and sub-clusters clearly reflected features of the spectra that identified chemical component, chemical environment, and density in the image. DQC can also be used in conjunction with the established data analysis technique, which does require knowledge of the components present.
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato
2006-01-01
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
A dynamic fuzzy clustering method based on genetic algorithm
Institute of Scientific and Technical Information of China (English)
ZHENG Yan; ZHOU Chunguang; LIANG Yanchun; GUO Dongwei
2003-01-01
A dynamic fuzzy clustering method is presented based on the genetic algorithm. By calculating the fuzzy dissimilarity between samples the essential associations among samples are modeled factually. The fuzzy dissimilarity between two samples is mapped into their Euclidean distance, that is, the high dimensional samples are mapped into the two-dimensional plane. The mapping is optimized globally by the genetic algorithm, which adjusts the coordinates of each sample, and thus the Euclidean distance, to approximate to the fuzzy dissimilarity between samples gradually. A key advantage of the proposed method is that the clustering is independent of the space distribution of input samples, which improves the flexibility and visualization. This method possesses characteristics of a faster convergence rate and more exact clustering than some typical clustering algorithms. Simulated experiments show the feasibility and availability of the proposed method.
SURVEY ON CLUSTERING ALGORITHM AND SIMILARITY MEASURE FOR CATEGORICAL DATA
Directory of Open Access Journals (Sweden)
S. Anitha Elavarasi
2014-01-01
Learning is the process of generating useful information from a huge volume of data. Learning can be either supervised (e.g. classification) or unsupervised (e.g. clustering). Clustering is the process of grouping a set of physical objects into classes of similar objects. Objects in the real world consist of both numerical and categorical data. Categorical data are not analyzed as numerical data because of the absence of inherent ordering. This paper describes about ten different clustering algorithms, their methodology and the factors influencing their performance. Each algorithm is evaluated using real-world datasets and its pros and cons are specified. The various similarity/dissimilarity measures applied to categorical data and their performance are also discussed. The time complexity defines the amount of time taken by an algorithm to perform the elementary operation. The time complexity of the various algorithms is discussed and their performance on real-world data such as mushroom, zoo, soya bean, cancer, vote, car and iris is measured. In this survey, cluster accuracy and error rate are evaluated for four different clustering algorithms (K-modes, fuzzy K-modes, ROCK and Squeezer), two different similarity measures (DISC and Overlap), and DILCA applied to hierarchical and partitional algorithms.
A Geometric Clustering Algorithm with Applications to Structural Data
Xu, Shutan; Zou, Shuxue
2015-01-01
An important feature of structural data, especially those from structural determination and protein-ligand docking programs, is that their distribution could be mostly uniform. Traditional clustering algorithms developed specifically for nonuniformly distributed data may not be adequate for their classification. Here we present a geometric partitional algorithm that could be applied to both uniformly and nonuniformly distributed data. The algorithm is a top-down approach that recursively selects the outliers as the seeds to form new clusters until all the structures within a cluster satisfy a classification criterion. The algorithm has been evaluated on a diverse set of real structural data and six sets of test data. The results show that it is superior to the previous algorithms for the clustering of structural data and is similar to or better than them for the classification of the test data. The algorithm should be especially useful for the identification of the best but minor clusters and for speeding up an iterative process widely used in NMR structure determination. PMID:25517067
Research on retailer data clustering algorithm based on Spark
Huang, Qiuman; Zhou, Feng
2017-03-01
Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.
Big Data Clustering Using Genetic Algorithm On Hadoop Mapreduce
Directory of Open Access Journals (Sweden)
Nivranshu Hans
2015-04-01
Cluster analysis is used to classify similar objects under the same group. It is one of the most important data mining methods. However, it fails to perform well for big data due to its huge time complexity. For such scenarios, parallelization is a better approach. MapReduce is a popular programming model which enables parallel processing in a distributed environment. But most clustering algorithms are not naturally parallelizable, for instance genetic algorithms (GAs), owing to their sequential nature. This paper introduces a technique to parallelize GA-based clustering by extending Hadoop MapReduce. An analysis of the proposed approach to evaluate performance gains with respect to a sequential algorithm is presented. The analysis is based on a real-life large data set.
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.
He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej
2011-12-01
Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: the α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
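To make the idea of a multiplicative update for symmetric NMF concrete, here is a small pure-Python sketch that fits A ≈ H Hᵀ with H ≥ 0 under the Euclidean (squared-error) objective. The damped update H ← H ∘ (1/2 + 1/2 · (A H) / (H Hᵀ H)) used below is a commonly quoted form and is an assumption of this example; it should not be read as the exact algorithms of the paper, and the helper names (matmul, snmf, snmf_error) are hypothetical.

```python
import random

def matmul(A, B):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def snmf_error(A, H):
    """Squared Frobenius error ||A - H H^T||^2."""
    R = matmul(H, transpose(H))
    n = len(A)
    return sum((A[i][j] - R[i][j]) ** 2 for i in range(n) for j in range(n))

def snmf(A, rank, iters=200, seed=0, eps=1e-12):
    """Damped multiplicative updates for a symmetric nonnegative matrix A."""
    rng = random.Random(seed)
    n = len(A)
    H = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    for _ in range(iters):
        AH = matmul(A, H)
        HHtH = matmul(H, matmul(transpose(H), H))
        # Elementwise update; every factor is nonnegative, so H stays nonnegative.
        H = [[H[i][k] * (0.5 + 0.5 * AH[i][k] / (HHtH[i][k] + eps))
              for k in range(rank)] for i in range(n)]
    return H

if __name__ == "__main__":
    # Toy similarity matrix with two blocks of three mutually similar items.
    A = [[1.0 if (i < 3) == (j < 3) else 0.0 for j in range(6)] for i in range(6)]
    H = snmf(A, rank=2)
    print("error:", snmf_error(A, H))
    print("cluster of each item:", [max(range(2), key=lambda k: row[k]) for row in H])
```

Reading each row of H as soft cluster memberships, and taking the largest entry as the assignment, is one simple way to use the factor for clustering.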
Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun
2015-01-01
Normal-tension glaucoma (NTG) is a heterogenous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogenous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and were comprised of patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.
An improved algorithm for clustering gene expression data.
Bandyopadhyay, Sanghamitra; Mukhopadhyay, Anirban; Maulik, Ujjwal
2007-11-01
Recent advancements in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, typical properties of which are its inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm, which employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm, is proposed. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means is also utilized for clustering. The significant superiority of the proposed two-stage clustering algorithm as compared to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), widely used methods for clustering gene expression data, is established on a variety of artificial and publicly available real life data sets. The biological relevance of the clustering solutions is also analyzed.
Improved insensitive to input parameters trajectory clustering algorithm
Institute of Scientific and Technical Information of China (English)
Jiashun Chen; Dechang Pi
2013-01-01
The existing trajectory clustering algorithm (TRACLUS) is sensitive to the input parameters ε and MinLns: a small change in a parameter value can produce entirely different clustering results. To address this vulnerability, a shielding parameters sensitivity trajectory cluster (SPSTC) algorithm is proposed which is insensitive to the input parameters. Firstly, definitions of the core distance and reachable distance of a line segment are presented, and the algorithm then generates a cluster sorting according to the core distance and reachable distance. Secondly, the reachable plots of line segment sets are constructed according to the cluster sorting and reachable distance. Thirdly, a parameterized sequence is extracted according to the reachable plot, and the final trajectory clusters are then acquired based on the parameterized sequence. The parameterized sequence represents the inner cluster structure of the trajectory data. Experiments on real data sets and test data sets show that the SPSTC algorithm effectively reduces the sensitivity to the input parameters while obtaining better-quality trajectory clusters.
Morphology of Open Clusters NGC 1857 and Czernik 20 using Clustering Algorithms
Bhattacharya, Souradeep; Pandaokar, Samay; Singh, Parikshit Kishor
2016-01-01
The morphology and cluster membership of the Galactic open clusters - Czernik 20 and NGC 1857 were analyzed using two different clustering algorithms. We present the maiden use of density-based spatial clustering of applications with noise (DBSCAN) to determine open cluster morphology from spatial distribution. The region of analysis has also been spatially classified using a statistical membership determination algorithm. We utilized near infrared (NIR) data for a suitably large region around the clusters from the United Kingdom Infrared Deep Sky Survey Galactic Plane Survey star catalogue database, and also from the Two Micron All Sky Survey star catalogue database. The densest regions of the cluster morphologies (1 for Czernik 20 and 2 for NGC 1857) thus identified were analyzed with a K-band extinction map and color-magnitude diagrams (CMDs). To address significant discrepancy in known distance and reddening parameters, we carried out field decontamination of these CMDs and subsequent isochrone fitting of...
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Emmons, Scott; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Blondel, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 o...
Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.
Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si
2017-07-01
Water quality assessment is crucial for assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data-driven approaches for water quality assessment. Cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected from coastal waters of the Bohai Sea and North Yellow Sea of China, and apply the clustering results to evaluate water quality. To evaluate the validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which is widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, it is the first attempt to apply Mahalanobis distance for coastal water quality assessment.
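The role of correlations can be illustrated with a short sketch of the Mahalanobis distance, d(x, y) = sqrt((x - y)ᵀ S⁻¹ (x - y)), where S is the sample covariance matrix. The code below is an assumption-laden example rather than the authors' pipeline: the helper names are hypothetical, the data are made up, and the covariance is assumed to be non-singular. It shows that two displacements with the same Euclidean length can have very different Mahalanobis lengths when the variables are correlated.

```python
import math

def covariance(X):
    """Sample covariance matrix (n - 1 denominator) of rows in X."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    return [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in X) / (n - 1)
             for b in range(d)] for a in range(d)]

def solve(M, v):
    """Solve M z = v by Gauss-Jordan elimination with partial pivoting.
    Assumes M is non-singular."""
    d = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(d):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][d] / aug[i][i] for i in range(d)]

def mahalanobis(x, y, S):
    """Mahalanobis distance between x and y for covariance matrix S."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(S, diff)  # z = S^{-1} (x - y), without forming the inverse
    return math.sqrt(sum(a * b for a, b in zip(diff, z)))

if __name__ == "__main__":
    # Made-up, strongly correlated two-variable data (not from the paper).
    X = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]
    S = covariance(X)
    origin = [0.0, 0.0]
    print("along the correlation :", mahalanobis([1.0, 1.0], origin, S))
    print("across the correlation:", mahalanobis([1.0, -1.0], origin, S))
```

A pairwise distance matrix built this way could then be passed to any hierarchical clustering routine in place of Euclidean distances.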
Sampling Within k-Means Algorithm to Cluster Large Datasets
Energy Technology Data Exchange (ETDEWEB)
Bejarano, Jeremy [Brigham Young University; Bose, Koushiki [Brown University; Brannan, Tyler [North Carolina State University; Thomas, Anita [Illinois Institute of Technology; Adragni, Kofi [University of Maryland; Neerchal, Nagaraj [University of Maryland; Ostrouchov, George [ORNL
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets and on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Also, these studies should analyze the performance of the algorithm on varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data become more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
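The general idea of clustering a sample instead of the full dataset can be sketched as follows. This is a minimal illustration, not the report's algorithm: it runs plain Lloyd iterations on a uniform random sample and then assigns every point to the nearest resulting centroid. The function names and the sample_size parameter are hypothetical, and the report's rule for choosing the sample size from a width and confidence level is not reproduced here.

```python
import random

def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's algorithm; returns the list of centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a group happens to be empty.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
               for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids

def sampled_kmeans(points, k, sample_size, seed=0):
    """Fit centroids on a random sample, then label the full dataset."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    centroids = kmeans(sample, k, rng)
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

if __name__ == "__main__":
    gen = random.Random(42)
    data = ([[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(500)] +
            [[gen.gauss(10, 1), gen.gauss(10, 1)] for _ in range(500)])
    centroids, labels = sampled_kmeans(data, k=2, sample_size=50)
    print(centroids)
```

Only the sample passes through the iterative loop; the full dataset is touched once, in the final assignment step, which is where the runtime saving in this sketch comes from.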
GDCluster: A General Decentralized Clustering Algorithm
Mashayekhi, Hoda; Habibi, Jafar; Khalafbeigi, Tania; Voulgaris, Spyros; van Steen, Martinus Richardus
In many popular applications like peer-to-peer systems, large amounts of data are distributed among multiple sources. Analysis of this data and identifying clusters is challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general fully decentralized
Effective FCM noise clustering algorithms in medical images.
Kannan, S R; Devi, R; Ramathilagam, S; Takezawa, K
2013-02-01
The main motivation of this paper is to introduce a class of robust non-Euclidean distance measures for the original data space in order to derive new objective functions and thus cluster the non-Euclidean structures in data, enhancing the robustness of the original clustering algorithms to noise and outliers. The new objective functions of the proposed algorithms are realized by incorporating the noise clustering concept into the entropy-based fuzzy C-means algorithm with a suitable noise distance, which is employed to take the information about noisy data into account in the clustering process. Initial cluster prototypes are obtained using a prototype initialization method, so that the final result is reached in fewer iterations. To evaluate the performance of the proposed methods in reducing the noise level, experimental work has been carried out with a synthetic image which is corrupted by Gaussian noise. The superiority of the proposed methods has been examined through an experimental study on medical images. The experimental results show that the proposed algorithms perform significantly better than the standard existing algorithms. The accurate classification percentage of the proposed fuzzy C-means segmentation method is obtained using the silhouette validity index.
Ghebremedhin, Meron; Yesupriya, Shubha; Luka, Janos; Crane, Nicole J.
2015-03-01
Recent studies have demonstrated the potential advantages of the use of Raman spectroscopy in the biomedical field due to its rapidity and noninvasive nature. In this study, Raman spectroscopy is applied as a method for differentiating between bacterial isolates for Gram status and Genus species. We created models for identifying 28 bacterial isolates using spectra collected with a 785 nm laser excitation Raman spectroscopic system. In order to investigate the groupings of these samples, partial least squares discriminant analysis (PLSDA) and hierarchical cluster analysis (HCA) were implemented. In addition, cluster analyses of the isolates were performed using various data types consisting of biochemical tests, gene sequence alignment, high resolution melt (HRM) analysis and antimicrobial susceptibility tests of minimum inhibitory concentration (MIC) and degree of antimicrobial resistance (SIR). In order to evaluate the ability of these models to correctly classify bacterial isolates using solely Raman spectroscopic data, a set of 14 validation samples was tested using the PLSDA models and consequently the HCA models. External cluster evaluation criteria of purity and Rand index were calculated at different taxonomic levels to compare the performance of clustering using Raman spectra as well as the other datasets. Results showed that Raman spectra performed comparably to, and in some cases better than, the other data types, with Rand index and purity values up to 0.933 and 0.947, respectively. This study clearly demonstrates that the discrimination of bacterial species using Raman spectroscopic data and hierarchical cluster analysis is possible and has the potential to be a powerful point-of-care tool in clinical settings.
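The two external evaluation criteria named in this abstract have standard definitions that are easy to state in code. The sketch below uses those textbook definitions (purity as the fraction of items in the majority class of their cluster, Rand index as the fraction of item pairs on which the two partitions agree); the function names and the toy Gram-status labels are made up for illustration and are not data from the study.

```python
from collections import Counter
from itertools import combinations

def purity(true_labels, cluster_labels):
    """Fraction of items that belong to the majority true class of their cluster."""
    clusters = {}
    for t, c in zip(true_labels, cluster_labels):
        clusters.setdefault(c, []).append(t)
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(true_labels)

def rand_index(true_labels, cluster_labels):
    """Fraction of item pairs treated the same way (together or apart) by both labelings."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum((true_labels[i] == true_labels[j]) ==
                (cluster_labels[i] == cluster_labels[j]) for i, j in pairs)
    return agree / len(pairs)

if __name__ == "__main__":
    # Hypothetical example: six isolates, one Gram-positive isolate misplaced.
    truth = ["pos", "pos", "pos", "neg", "neg", "neg"]
    found = [0, 0, 1, 1, 1, 1]
    print(purity(truth, found))      # 5/6
    print(rand_index(truth, found))  # 10/15
```

Both scores lie between 0 and 1, with 1 meaning the clustering reproduces the reference grouping exactly.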
Robustness of the ATLAS pixel clustering neural network algorithm
AUTHOR|(INSPIRE)INSPIRE-00407780; The ATLAS collaboration
2016-01-01
Proton-proton collisions at the energy frontier put strong constraints on track reconstruction algorithms. In the ATLAS track reconstruction algorithm, an artificial neural network is utilised to identify and split clusters of neighbouring read-out elements in the ATLAS pixel detector created by multiple charged particles. The robustness of the neural network algorithm is presented, probing its sensitivity to uncertainties in the detector conditions. The robustness is studied by evaluating the stability of the algorithm's performance under a range of variations in the inputs to the neural networks. Within reasonable variation magnitudes, the neural networks prove to be robust to most variation types.
Diversity of Xiphinema americanum-group Species and Hierarchical Cluster Analysis of Morphometrics.
Lamberti, F; Ciancio, A
1993-09-01
Of the 39 species composing the Xiphinema americanum group, 14 were described originally from North America and two others have been reported from this region. Many species are very similar morphologically and can be distinguished only by a difficult comparison of various combinations of some morphometric characters. Study of morphometrics of 49 populations, including the type populations of the 39 species attributed to this group, by principal component analysis and hierarchical cluster analysis placed the populations into five subgroups, proposed here as the X. brevicolle subgroup (seven species), the X. americanum subgroup (17 species), the X. taylori subgroup (two species), the X. pachtaicum subgroup (eight species), and the X. lambertii subgroup (five species).
Capozziello, S; De Siena, S; Guerra, F; Illuminati, F
2000-01-01
We derive, in order of magnitude, the observed astrophysical and cosmological scales in the Universe, from neutron stars to superclusters of galaxies, up to, asymptotically, the observed radius of the Universe. This result is obtained by introducing a recursive scheme of alternating hierarchical mechanisms of three-dimensional and two-dimensional close packings of gravitationally interacting objects. The iterative scheme yields a rapidly converging geometric sequence, which can be described as a hierarchical clustering of aggregates, having the observed radius of the Universe as its fixed point.
AN OPTIMUM VEHICULAR PATH ALGORITHM FOR TRAFFIC NETWORK BASED ON HIERARCHICAL SPATIAL REASONING
Institute of Scientific and Technical Information of China (English)
Not listed
2000-01-01
Human thinking is distinctly hierarchical, a characteristic that can be used to construct a heuristic for shortest path algorithms. This paper details how to use hierarchical reasoning, on the basis of greedy and directional strategies, to establish a spatial heuristic that improves the running efficiency and suitability of shortest path algorithms for traffic networks. The authors divide the urban traffic network into three hierarchies and put forward a new node hierarchy division rule to avoid unreliable shortest path solutions. It is argued that the shortest path, whether shortest in distance or in time, is usually not the drivers' favorite in practice. Factors that are difficult to anticipate or quantify greatly influence drivers' choices and make them prefer a path that is somewhat longer but more reliable or flexible. The presented optimum path algorithm, in addition to improving the running efficiency of shortest path algorithms several times over, reduces the emergence of those factors, conforms to the way human beings think, and is more easily accepted by drivers. Moreover, it does not require the completeness of networks in the lowest hierarchy, and the applicability and fault tolerance of the algorithm are improved. The experimental results show the advantages of the presented algorithm. The authors argue that the algorithm has great potential for application in navigation systems for large-scale traffic networks.
Le, Thanh; Altman, Tom; Gardiner, Katheleen
2010-02-01
Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences. Source code and test datasets are available for download at http://ouray.cudenver.edu/~tnle/, implemented in C++ and supported on Linux and MS Windows.
World Wide Web Metasearch Clustering Algorithm
Directory of Open Access Journals (Sweden)
Adina LIPAI
2008-01-01
As the storage capacity and processing speed of search engines grow to keep up with the constant expansion of the World Wide Web, the user faces an ever longer list of results for a given query. A simple query composed of common words sometimes has hundreds or even thousands of results, making it practically impossible for the user to check all of them in order to identify a particular site. Even when the list of results is presented in ranked order, most of the time this is not sufficient to help the user identify the most relevant sites for the query. The concept of search result clustering was introduced as a solution to this situation. The process of clustering search results consists of building thematically homogeneous groups from the initial list of results provided by classic search tools, using characteristics present within the initial results, without any kind of predefined categories.
Institute of Scientific and Technical Information of China (English)
HOU XueLiang; LU Mei
2008-01-01
In order to seek the co-adaptability solution to conflict events in construction engineering projects, a new method referred to as the segmented hierarchical algorithm is proposed in this paper by comparing the co-adaptability evolution process of conflict events to the Stackelberg model. With this new algorithm, local solutions to the first-order transformation of co-adaptability for conflict events can be obtained, based upon which a global solution to the second-order transformation of co-adaptability for conflict events can also be decided by judging the satisfaction degree of the local solutions. The research results show that this algorithm can be used not only for obtaining the co-adaptability solution to conflict events efficiently, but also for other general decision-making problems with multi-layers and multi-subsidiaries in the project management field.
Class hierarchical test case generation algorithm based on expanded EMDPN model
Institute of Scientific and Technical Information of China (English)
LI Jun-yi; GONG Hong-fang; HU Ji-ping; ZOU Bei-ji; SUN Jia-guang
2006-01-01
A new model of the event and message driven Petri network (EMDPN), based on the characteristics of class interaction for message passing between two objects, was extended. Using the EMDPN interaction graph, a class hierarchical test-case generation algorithm with cooperated paths (copaths) was proposed, which can be used to solve the problems resulting from the class inheritance mechanism encountered in object-oriented software testing, such as oracle, message transfer errors, and unreachable statements. Finally, the testing sufficiency was analyzed with the ordered sequence testing criterion (OSC). The results indicate that the test cases generated by the newly proposed automatic copath generation algorithm satisfy the synchronization message sequence testing criteria; therefore, the proposed algorithm has a good coverage rate.
Efficient Clustering of Web Search Results Using Enhanced Lingo Algorithm
Directory of Open Access Journals (Sweden)
M. Manikantan
2015-02-01
Web query optimization is the focus of recent research and development efforts. To fetch the required information, users rely on search engines and sometimes on website interfaces. One approach is search engine optimization, which website developers use to popularize their websites through search engine results. Clustering is a main task of the explorative data mining process and a common technique for grouping web search results into different categories based on the specific web contents. A clustering search engine called Lingo used only snippets to cluster the documents. Though this method takes less time to cluster the documents, it could not produce clusters of good quality. This study focuses on clustering all documents by applying semantic similarity between words and then applying a modified Lingo algorithm, in order to produce good-quality clusters in less time.
A Novel Hybrid Data Clustering Algorithm Based on Artificial Bee Colony Algorithm and K-Means
Institute of Scientific and Technical Information of China (English)
TRAN Dang Cong; WU Zhijian; WANG Zelin; DENG Changshou
2015-01-01
To improve the performance of the K-means clustering algorithm, this paper presents a new hybrid approach of Enhanced artificial bee colony algorithm and K-means (EABCK). In EABCK, the original artificial bee colony algorithm (called ABC) is enhanced by a new mutation operation and guided by the global best solution (called EABC). Then, the best solution is updated by K-means in each iteration for data clustering. In the experiments, a set of benchmark functions was used to evaluate the performance of EABC against other comparative ABC variants. To evaluate the performance of EABCK on data clustering, eleven benchmark datasets were utilized. The experimental results show that EABC and EABCK outperform other comparative ABC variants and data clustering algorithms, respectively.
AN IMPROVED FUZZY CLUSTERING ALGORITHM FOR MICROARRAY IMAGE SPOTS SEGMENTATION
Directory of Open Access Journals (Sweden)
V.G. Biju
2015-11-01
An automatic cDNA microarray image processing method using an improved fuzzy clustering algorithm is presented in this paper. The proposed spot segmentation algorithm uses the gridding technique developed earlier by the authors to find the co-ordinates of each spot in an image. Automatic cropping of spots from the microarray image is done using these co-ordinates. The present paper proposes an improved fuzzy clustering algorithm, Possibility fuzzy local information c means (PFLICM), to segment the spot foreground (FG) from the background (BG). The PFLICM improves the fuzzy local information c means (FLICM) algorithm by incorporating the typicality of a pixel along with gray level information and local spatial information. The performance of the algorithm is validated using a set of simulated cDNA microarray images with different levels of additive white Gaussian noise (AWGN). The strength of the algorithm is tested by computing parameters such as the Segmentation matching factor (SMF), Probability of error (pe), Discrepancy distance (D) and Normal mean square error (NMSE). The SMF value obtained for the PFLICM algorithm shows an improvement of 0.9 % and 0.7 % for high noise and low noise microarray images respectively compared to the FLICM algorithm. The PFLICM algorithm is also applied to real microarray images, and gene expression values are computed.
Hierarchical Agglomerative Clustering Schemes for Energy-Efficiency in Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Taleb Tariq
2017-06-01
Extending the lifetime of wireless sensor networks (WSNs) while delivering the expected level of service remains a hot research topic. Clustering has been identified in the literature as one of the primary means to save communication energy. In this paper, we argue that hierarchical agglomerative clustering (HAC) provides a suitable foundation for designing highly energy-efficient communication protocols for WSNs. To this end, we study a new mechanism for selecting cluster heads (CHs) based both on the physical location of the sensors and their residual energy. Furthermore, we study different patterns of communication between the CHs and the base station depending on the possible transmission ranges and the ability of the sensors to act as traffic relays. Simulation results show that our proposed clustering and communication schemes outperform well-known existing approaches by comfortable margins. In particular, network lifetime is increased by more than 60% compared to LEACH and HEED, and by more than 30% compared to K-means clustering.
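The cluster-head selection idea in the abstract above (location plus residual energy) can be illustrated with a minimal sketch. The abstract does not give the selection rule, so the scoring formula, the `alpha` trade-off parameter and the function name `select_cluster_head` are assumptions made purely for illustration.

```python
import math

def select_cluster_head(nodes, alpha=0.5):
    """Pick a cluster head from nodes given as (x, y, residual_energy).

    Hypothetical score: alpha * normalised energy
                        + (1 - alpha) * closeness to the cluster centroid.
    Returns the index of the highest-scoring node (first one on ties).
    """
    cx = sum(n[0] for n in nodes) / len(nodes)
    cy = sum(n[1] for n in nodes) / len(nodes)
    dists = [math.hypot(n[0] - cx, n[1] - cy) for n in nodes]
    max_d = max(dists) or 1.0              # avoid division by zero
    max_e = max(n[2] for n in nodes) or 1.0

    def score(i):
        energy_term = nodes[i][2] / max_e
        location_term = 1.0 - dists[i] / max_d
        return alpha * energy_term + (1.0 - alpha) * location_term

    return max(range(len(nodes)), key=score)

# Three sensors on a line; the middle one is central but low on energy.
sensors = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.2), (2.0, 0.0, 1.0)]
head = select_cluster_head(sensors)
```

With `alpha=1.0` only residual energy matters and with `alpha=0.0` only centrality matters, which makes the trade-off between the two criteria explicit.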
Functional clustering algorithm for the analysis of dynamic network data
Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.
2009-05-01
We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Application of genetic algorithms to hydrogenated silicon clusters
Indian Academy of Sciences (India)
N Chakraborti; R Prasad
2003-01-01
We discuss the application of biologically inspired genetic algorithms to determine the ground state structures of a number of Si–H clusters. The total energy of a given configuration of a cluster has been obtained by using a non-orthogonal tight-binding model and the energy minimization has been carried out by using genetic algorithms and their recent variant differential evolution. Our results for ground state structures and cohesive energies for Si–H clusters are in good agreement with the earlier work conducted using the simulated annealing technique. We find that the results obtained by genetic algorithms turn out to be comparable and often better than the results obtained by the simulated annealing technique.
Spin chain simulations with a meron cluster algorithm
Energy Technology Data Exchange (ETDEWEB)
Boyer, T. [Humboldt-Universitaet, Berlin (Germany). Inst. fuer Physik]|[Ecole Normale Superieure de Cachan (France); Bietenholz, W. [Humboldt-Universitaet, Berlin (Germany). Inst. fuer Physik]|[Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany). John von Neumann-Inst. fuer Computing NIC; Wuilloud, J. [Humboldt-Universitaet, Berlin (Germany). Inst. fuer Physik]|[Geneve Univ. (Switzerland). Dept. de Physique Theorique
2007-01-15
We apply a meron cluster algorithm to the XY spin chain, which describes a quantum rotor. This is a multi-cluster simulation supplemented by an improved estimator, which deals with objects of half-integer topological charge. This method is powerful enough to provide precise results for the model with a θ-term; it is therefore one of the rare examples where a system with a complex action can be solved numerically. In particular we measure the correlation length, as well as the topological and magnetic susceptibility. We discuss the algorithmic efficiency in view of the critical slowing down. Due to the excellent performance that we observe, it is strongly motivated to work on new applications of meron cluster algorithms in higher dimensions. (orig.)
Adaptive Weighted Clustering Algorithm for Mobile Ad-hoc Networks
Directory of Open Access Journals (Sweden)
Adwan Yasin
2016-04-01
In this paper we present a new adaptive load-balancing algorithm for clustering mobile ad hoc networks (MANETs) that considers several parameters. A MANET is a special kind of wireless network in which no central management exists and the nodes cooperatively manage the network and maintain connectivity. The algorithm takes into account the local capabilities of each node, the remaining battery power, the degree of connectivity and, finally, the power consumption based on the average distance between the nodes and the candidate cluster head. The proposed algorithm efficiently decreases the overhead in the network, which enhances the overall MANET performance. Reducing the maintenance time of broken routes makes the network more stable and reliable. Saving the power of the nodes also guarantees a consistent and reliable network.
Directory of Open Access Journals (Sweden)
I. Crawford
2015-07-01
In this paper we present improved methods for discriminating and quantifying Primary Biological Aerosol Particles (PBAP) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a dataset to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient dataset. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4), where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best performing methods were applied to the BEACHON-RoMBAS ambient dataset, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP), where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the underestimation of bacterial aerosol concentration by a factor of 5. We suggest that this is likely due to errors arising from misattribution
Directory of Open Access Journals (Sweden)
I. Crawford
2015-11-01
In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs) by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF) spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 10^6 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4), where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio–hydro–atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen–Rocky Mountain Biogenic Aerosol Study) ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP), where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the
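The combination named in the abstract above, z-score or range normalisation followed by Ward-linkage agglomerative clustering, can be sketched in plain Python. This is a naive illustration under stated assumptions rather than the authors' analysis package: the function names are hypothetical, and the quadratic pair search would not scale to the 10^6-point data sets mentioned.

```python
def zscore(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def range_normalise(rows):
    """Scale each column linearly onto the interval [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / s for v, lo, s in zip(r, lows, spans)] for r in rows]

def ward_clusters(rows, k):
    """Merge clusters until k remain, always choosing the pair whose merge
    gives the smallest increase in within-cluster variance (Ward criterion)."""
    clusters = [[i] for i in range(len(rows))]
    dims = range(len(rows[0]))

    def centroid(c):
        return [sum(rows[i][d] for i in c) / len(c) for d in dims]

    def merge_cost(a, b):
        ca, cb = centroid(a), centroid(b)
        d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: merge_cost(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(j)      # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters

# Two features on very different scales; normalising first keeps the
# second column from dominating the distance.
data = [[1.0, 100.0], [1.1, 110.0], [5.0, 900.0], [5.2, 950.0]]
groups = ward_clusters(zscore(data), 2)
```

Normalising before clustering matters here because Ward linkage is based on squared Euclidean distance, so a feature with a large numeric range would otherwise dominate the merge decisions.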
Density-based cluster algorithms for the identification of core sets
Lemke, Oliver; Keller, Bettina G.
2016-10-01
The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
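Of the three density-based algorithms compared in the abstract above, DBSCAN is the most widely known, and a compact sketch of it is given here. It is a textbook-style version with a brute-force neighbour search; the parameter names `eps` and `min_pts` follow common usage and are not taken from the paper, and how the resulting clusters are turned into core sets is not shown.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE (-1) for outliers."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force search; the point itself is included in its neighbourhood.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:     # j is a core point, keep expanding
                queue.extend(nj)
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

In this toy data set the two dense groups each become a cluster and the isolated point is labelled as noise, which mirrors the idea of keeping only dense regions as cores.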
Energy Efficient Homogenous Clustering and Cluster Head Selection Algorithm for WSN
Directory of Open Access Journals (Sweden)
Ganeshayya I. Shidaganti
2013-02-01
Wireless sensor networks (WSNs) are energy- and resource-constrained networks made up of small electronic devices called sensor nodes. Each sensor node is capable of sensing, computing and transmitting data from one node to another until the data reach the base station. Each node monitors physical or environmental conditions, depending on the application, and communicates with nearby nodes via radio broadcast. Radio transmission and reception consume a lot of energy in a WSN; thus, one of the important issues in wireless sensor networks is the inherently limited battery power of the sensor nodes. Battery power is therefore a crucial parameter in algorithm design for maximizing the lifespan of sensor nodes. Much research has been done in recent years in the area of low-power routing protocols, but many design options remain open for improvement, and further research targeted at specific applications is needed. In this paper, we propose a new approach, an energy-efficient homogeneous clustering and cluster head selection algorithm for wireless sensor networks, in which the lifespan of the network is increased by ensuring a homogeneous distribution of nodes in the clusters. In this clustering algorithm, energy efficiency is distributed and network performance is improved by selecting cluster heads on the basis of the residual energy of existing cluster heads, the holdback value, and the nearest hop distance of the node. In the proposed clustering algorithm, the cluster members are uniformly distributed and the life of the network is further extended
Masiero, Joseph R; Bauer, J M; Grav, T; Nugent, C R; Stevenson, R
2013-01-01
Using albedos from WISE/NEOWISE to separate distinct albedo groups within the Main Belt asteroids, we apply the Hierarchical Clustering Method to these subpopulations and identify dynamically associated clusters of asteroids. While this survey is limited to the ~35% of known Main Belt asteroids that were detected by NEOWISE, we present the families linked from these objects as higher confidence associations than can be obtained from dynamical linking alone. We find that over one-third of the observed population of the Main Belt is represented in the high-confidence cores of dynamical families. The albedo distribution of family members differs significantly from the albedo distribution of background objects in the same region of the Main Belt, however interpretation of this effect is complicated by the incomplete identification of lower-confidence family members. In total we link 38,298 asteroids into 76 distinct families. This work represents a critical step necessary to debias the albedo and size distributio...
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning
Fu, QiMing
2016-01-01
To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ2-regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the policy, are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both the local model and the global model are applied to generate samples for planning; the former is used only if the state-prediction error does not surpass the threshold at each time step, while the latter is utilized at the end of each episode. The purpose of taking both models is to improve the sample efficiency and accelerate the convergence rate of the whole algorithm through fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two Reinforcement Learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency. PMID:27795704
NCUBE - A clustering algorithm based on a discretized data space
Eigen, D. J.; Northouse, R. A.
1974-01-01
Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.
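The idea of imposing a gridwork on the data space, as described in the abstract above, can be illustrated with a small sketch. The abstract does not say how NCUBE groups grid cells, so the rule used here (occupied cells that share a face belong to the same cluster) and the names `grid_cluster` and `cell_size` are assumptions made only for illustration.

```python
import math
from collections import defaultdict, deque

def grid_cluster(points, cell_size):
    """Assign a cluster label to each point by binning points into grid
    cells and joining occupied cells that share a face."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(idx)

    cell_label = {}
    cluster_id = 0
    for start in cells:
        if start in cell_label:
            continue
        cell_label[start] = cluster_id
        queue = deque([start])
        while queue:                       # breadth-first flood fill over cells
            cell = queue.popleft()
            for d in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:d] + (cell[d] + step,) + cell[d + 1:]
                    if nb in cells and nb not in cell_label:
                        cell_label[nb] = cluster_id
                        queue.append(nb)
        cluster_id += 1

    labels = [0] * len(points)
    for key, idxs in cells.items():
        for i in idxs:
            labels[i] = cell_label[key]
    return labels

# An L-shaped group and a separate pair of points.
pts = [(0.5, 0.5), (1.5, 0.5), (1.5, 1.5), (7.5, 7.5), (8.5, 7.5)]
labels = grid_cluster(pts, cell_size=1.0)
```

Because each cluster is a union of axis-aligned cells, its boundary is piecewise linear, which is consistent with the abstract's remark about separating groups that are not linearly separable.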
A Rough Set based Gene Expression Clustering Algorithm
Directory of Open Access Journals (Sweden)
J. J. Emilyn
2011-01-01
Problem statement: Microarray technology helps in monitoring the expression levels of thousands of genes across collections of related samples. Approach: The main goal in the analysis of large and heterogeneous gene expression datasets was to identify groups of genes that are expressed in a set of experimental conditions. Results: Several clustering techniques have been proposed for identifying gene signatures and understanding their role, and many of them have been applied to gene expression data, but with only partial success. The main aim of this work was to develop a clustering algorithm that would successfully identify gene patterns. The proposed novel clustering technique (RCGED) provides an efficient way of finding the hidden and unique gene expression patterns. It overcomes the restriction of one object being placed in only one cluster. Conclusion/Recommendations: The proposed algorithm is termed intelligent because it automatically determines the optimum number of clusters. The proposed algorithm was tested on a colon cancer dataset, and the results were compared with those of the Rough Fuzzy K Means algorithm.
Core Business Selection Based on Ant Colony Clustering Algorithm
Directory of Open Access Journals (Sweden)
Yu Lan
2014-01-01
Core business is the most important business of an enterprise with diversified business. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current state of the company's development, the core business of the company can be identified by the ant colony clustering algorithm. The results indicate that the proposed method is an effective way to determine the core business of a company.
Research on Scheduling Algorithms in Web Cluster Servers
Institute of Scientific and Technical Information of China (English)
LEI YingChun (雷迎春); GONG YiLi (龚奕利); ZHANG Song (张松); LI GuoJie (李国杰)
2003-01-01
This paper analyzes quantitatively the impact of the load balance scheduling algorithms and the locality scheduling algorithms on the performance of Web cluster servers, and brings forward the Adaptive_LARD algorithm. Compared with the representative LARD algorithm, the advantages of the Adaptive_LARD are that: (1) it adjusts load distribution among the back-ends through the idea of load balancing to avoid learning steps in the LARD algorithm and reinforce its adaptability; (2) by distinguishing between TCP connections accessing disks and those accessing cache memory, it can estimate the impact of different connections on the back-ends' load more precisely. Performance evaluations suggest that the proposed method outperforms the LARD algorithm by up to 14.7%.
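For context, the baseline LARD (locality-aware request distribution) policy that the abstract above compares against can be sketched as follows. This follows the policy as it is commonly described in the literature and is not the Adaptive_LARD algorithm proposed in the paper; the function name, the threshold parameters `t_low` and `t_high`, and the use of a simple connection count as load are illustrative assumptions.

```python
def lard_dispatch(target, loads, assignment, t_low, t_high):
    """Choose a back-end for a request to `target`.

    loads      -- list of active-connection counts, one per back-end
    assignment -- dict mapping targets to their current back-end index
    A target keeps its back-end (for cache locality) unless that node is
    overloaded while another node is lightly loaded, or it is severely
    overloaded; then the target moves to the least-loaded node.
    """
    least = min(range(len(loads)), key=loads.__getitem__)
    node = assignment.get(target)
    if node is None:
        node = least                                   # first request for target
    elif (loads[node] > t_high and loads[least] < t_low) \
            or loads[node] >= 2 * t_high:
        node = least                                   # rebalance this target
    assignment[target] = node
    loads[node] += 1      # caller is expected to decrement on completion
    return node

loads = [0, 0]
assignment = {}
first = lard_dispatch("/index.html", loads, assignment, t_low=25, t_high=65)
```

Repeated requests for the same target stay on one back-end, which is the locality effect that the abstract contrasts with pure load balancing.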
Fernández-Arjona, María del Mar; Grondona, Jesús M.; Granados-Durán, Pablo; Fernández-Llebrez, Pedro; López-Ávalos, María D.
2017-01-01
It is known that microglia morphology and function are closely related, but only a few studies have objectively described different morphological subtypes. To address this issue, morphological parameters of microglial cells were analyzed in a rat model of aseptic neuroinflammation. After the injection of a single dose of the enzyme neuraminidase (NA) within the lateral ventricle (LV), an acute inflammatory process occurs. Sections from NA-injected animals and sham controls were immunolabeled with the microglial marker IBA1, which highlights ramifications and features of the cell shape. Using images obtained by section scanning, individual microglial cells were sampled from various regions (septofimbrial nucleus, hippocampus and hypothalamus) at different times post-injection (2, 4 and 12 h). Each cell yielded a set of 15 morphological parameters by means of image analysis software. Five initial parameters (including fractal measures) were statistically different in cells from NA-injected rats (most of them IL-1β positive, i.e., M1-state) compared to those from control animals (none of them IL-1β positive, i.e., surveillant state). However, additional multimodal parameters proved more suitable for hierarchical cluster analysis (HCA). This method classified the microglia population into four clusters. Furthermore, a linear discriminant analysis (LDA) suggested three specific parameters to objectively classify any microglia by a decision tree. In addition, a principal components analysis (PCA) revealed two extra valuable variables that allowed microglia to be further classified into a total of eight sub-clusters or types. The spatio-temporal distribution of these different morphotypes in our rat inflammation model made it possible to relate specific morphotypes to microglial activation status and brain location. An objective method for microglia classification based on morphological parameters is proposed. Main points Microglia undergo a quantifiable
Identifying multiple influential spreaders by a heuristic clustering algorithm
Energy Technology Data Exchange (ETDEWEB)
Bao, Zhong-Kui [School of Mathematical Science, Anhui University, Hefei 230601 (China); Liu, Jian-Guo [Data Science and Cloud Service Research Center, Shanghai University of Finance and Economics, Shanghai, 200133 (China); Zhang, Hai-Feng, E-mail: haifengzhang1978@gmail.com [School of Mathematical Science, Anhui University, Hefei 230601 (China); Department of Communication Engineering, North University of China, Taiyuan, Shan' xi 030051 (China)
2017-03-18
The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suited to the case where a single spreader is chosen as the spreading source. Often, the spreading process is initiated by simultaneously choosing multiple nodes as the spreading sources. In this situation, choosing the top-ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in the network. Therefore, the ideal situation for the multiple-spreader case is that the spreaders are not only influential but also dispersed across the network; however, it is difficult to meet both conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters, and the center nodes of the clusters are then chosen as the multiple spreaders. The HC algorithm not only ensures that the multiple spreaders are dispersed across the network but also prevents the selected nodes from being “negligible”. Compared with traditional methods, our experimental results on synthetic and real networks indicate that the HC method performs better on influence maximization. - Highlights: • A heuristic clustering algorithm is proposed to identify multiple influential spreaders in complex networks. • The algorithm not only guarantees that the selected spreaders are sufficiently scattered but also prevents them from being “insignificant”. • The performance of our algorithm is generally better than that of other methods, on both real and synthetic networks.
Limited Random Walk Algorithm for Big Graph Data Clustering
Zhang, Honglei; Kiranyaz, Serkan; Gabbouj, Moncef
2016-01-01
Graph clustering is an important technique to understand the relationships between the vertices in a big graph. In this paper, we propose a novel random-walk-based graph clustering method. The proposed method restricts the reach of the walking agent using an inflation function and a normalization function. We analyze the behavior of the limited random walk procedure and propose a novel algorithm for both global and local graph clustering problems. Previous random-walk-based algorithms depend on the chosen fitness function to find the clusters around a seed vertex. The proposed algorithm tackles the problem in an entirely different manner. We use the limited random walk procedure to find attracting vertices in a graph and use them as features to cluster the vertices. According to the experimental results on the simulated graph data and the real-world big graph data, the proposed method is superior to the state-of-the-art methods in solving graph clustering problems. Since the proposed method uses the embarrass...
A Genetic Clustering Algorithm for Mean-Residual Vector Quantization
Institute of Scientific and Technical Information of China (English)
CHU Shuchuan; John F. Roddick; CHEN Tsongyi
2004-01-01
Vector quantization (VQ) is a useful tool for data compression and can be applied to compress the data vectors in a database. The quality of the recovered data vector depends on a good codebook. Mean-residual vector quantization (M/R VQ) has been shown to be efficient in encoding time and to need only a little storage. In this paper, genetic algorithms in combination with the Generalized Lloyd Algorithm (GLA) are applied to the codebook design of M/R VQ. The mean codebook and residual codebook are trained separately using the GLA, then genetic algorithms (GA) are used to evaluate and evolve the combined mean codebook and residual codebook. The parameters used in the proposed algorithm are chosen based on experiments, and they are robust for the proposed GA-based clustering algorithm for M/R VQ. Experimental results demonstrate that the proposed genetic clustering algorithm applied to M/R VQ may improve the peak signal to noise ratio of the recovered data vector compared with the GLA.
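As a rough illustration of the two ingredients named in this abstract, the sketch below splits each vector into its mean and a residual and then trains a residual codebook with a few Generalized Lloyd iterations. The function names, the fixed iteration count and the random initialisation are assumptions made for illustration; the genetic-algorithm stage of the paper is not reproduced here.

```python
import random

def split_mean_residual(vec):
    """Return (mean, residual) such that mean + residual[i] == vec[i]."""
    m = sum(vec) / len(vec)
    return m, [x - m for x in vec]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, vec):
    """Index of the codeword closest to vec."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], vec))

def lloyd(vectors, k, iters=20, seed=0):
    """Generalized Lloyd iterations: assign each vector to its nearest
    codeword, then move each codeword to the centroid of its cell.
    A codeword whose cell is empty is left unchanged."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:
                codebook[i] = [sum(c) / len(cell) for c in zip(*cell)]
    return codebook

def distortion(vectors, codebook):
    """Mean squared quantization error over all vectors."""
    return sum(sq_dist(v, codebook[nearest(codebook, v)]) for v in vectors) / len(vectors)

# Toy data (hypothetical): two brightness levels, two residual shapes.
data = [[10, 12, 11, 13], [50, 52, 51, 53], [10, 13, 12, 11], [50, 53, 52, 51]]
means, residuals = zip(*(split_mean_residual(v) for v in data))
residual_codebook = lloyd(list(residuals), k=2)
```

Because the mean is coded separately, vectors that differ only in overall level share one residual codeword, which is why the residual codebook can stay small.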
Enhancing the Color Set Partitioning in Hierarchical Tree (SPIHT) Algorithm Using Correlation Theory
Directory of Open Access Journals (Sweden)
2011-01-01
Problem statement: An efficient color image compression algorithm is essential for mass storage and transmission of images. The compression efficiency of the Set Partitioning in Hierarchical Tree (SPIHT) coding algorithm for color images is improved by using correlation theory. Approach: In this study the correlation between the color channels is used to propose the new algorithm. The correlation between the color channels is analyzed in various color spaces, and the color space CIE-UVW, in which the color channels are highly correlated, is chosen. The most correlated channel, U, is considered the base color and is compressed using the wavelet filter and the SPIHT algorithm. A linear approximation of the other two color components (V and W) based on the primary color component U is used to code the subordinate color components. The image is divided into N*N blocks in each color channel. The linear approximation coefficients are calculated for each block of the subordinate colors V and W as functions of the base color. Only these coefficients of each block are coded and sent to the receiver along with the SPIHT coding of the base color. Results: By using this algorithm, a significant Peak Signal to Noise Ratio (PSNR) improvement (4 dB mean value) is obtained compared to the traditional coding scheme for the same compression rate, and the coding and decoding time is reduced. The proposed compression algorithm also reduces the complexity of the coding and decoding algorithms. Conclusion: This algorithm allows the reduction of complexity for both coding and decoding of color images. It is concluded that a significant PSNR gain and visual quality improvement is obtained. It is found that in color image coding, this algorithm is superior to the traditional de-correlation based methods and reduces the coding and decoding time.
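The per-block linear approximation described in this abstract can be sketched as an ordinary least-squares fit of a subordinate channel against the base channel. This is a minimal sketch under stated assumptions: the helper names are hypothetical, channels are plain 2-D lists whose dimensions are divisible by the block size, and the wavelet/SPIHT coding of the base channel is omitted entirely.

```python
def fit_linear(u, v):
    """Least-squares (a, b) minimising sum((a*u_i + b - v_i)**2)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    var = sum((x - mu) ** 2 for x in u)
    if var == 0:                      # flat base block: use a constant
        return 0.0, mv
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    a = cov / var
    return a, mv - a * mu

def blocks(channel, n):
    """Yield flattened n*n blocks of a 2-D list in row-major block order."""
    for r in range(0, len(channel), n):
        for c in range(0, len(channel[0]), n):
            yield [channel[r + i][c + j] for i in range(n) for j in range(n)]

def encode_subordinate(base, sub, n):
    """One (a, b) coefficient pair per block of the subordinate channel."""
    return [fit_linear(bu, bs) for bu, bs in zip(blocks(base, n), blocks(sub, n))]

# Toy example (hypothetical values): V is exactly 2*U + 1 in a single 2x2 block.
U = [[1, 2], [3, 4]]
V = [[3, 5], [7, 9]]
coeffs = encode_subordinate(U, V, 2)
```

A decoder following the same assumptions would rebuild each subordinate block as `a * U + b` from the decoded base block, so only two numbers per block are transmitted for V and for W.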
Sanyal, Soumya; Jain, Amit; Das, Sajal K.; Biswas, Rupak
2003-01-01
In this paper, we propose a distributed approach for mapping a single large application to a heterogeneous grid environment. To minimize the execution time of the parallel application, we distribute the mapping overhead to the available nodes of the grid. This approach not only provides a fast mapping of tasks to resources but is also scalable. We adopt a hierarchical grid model and accomplish the job of mapping tasks to this topology using a scheduler tree. Results show that our three-phase algorithm provides high quality mappings, and is fast and scalable.
Falahati Marvast, Fatemeh; Arabalibeik, Hossein; Alipour, Fatemeh; Sheikhtaheri, Abbas; Nouri, Leila; Soozande, Mehdi; Yarmahmoodi, Masood
2016-01-01
Keratoconus is a progressive non-inflammatory disease of the cornea. Rigid gas permeable contact lenses (RGPs) are prescribed when the disease progresses. Contact lens fitting and assessment is very difficult in these patients and is a concern of ophthalmologists and optometrists. In this study, a hierarchical fuzzy system is used to capture the expertise of experienced ophthalmologists during the lens evaluation phase of prescription. The system is fine-tuned using genetic algorithms. Sensitivity, specificity and accuracy of the final system are 88.9%, 94.4% and 92.6% respectively.
Hierarchical Search Motion Estimation Algorithms for Real-time Video Coding
Institute of Scientific and Technical Information of China (English)
1998-01-01
Data fetching and memory management are two factors as important as computational complexity in Motion Estimation (ME) implementation. In this paper, a new Large-scale Sampling Hierarchical Search motion estimation algorithm (LSHS) is proposed. The LSHS is suitable for real-time video coding, with low computational complexity, reduced data fetching and simple memory access. The experimental results indicate that the average decoding PSNR with LSHS is only about 0.2 dB lower than that with the Full Search (FS) scheme.
A Task-parallel Clustering Algorithm for Structured AMR
Energy Technology Data Exchange (ETDEWEB)
Gunney, B N; Wissink, A M
2004-11-02
A new parallel algorithm, based on the Berger-Rigoutsos algorithm for clustering grid points into logically rectangular regions, is presented. The clustering operation is frequently performed in the dynamic gridding steps of structured adaptive mesh refinement (SAMR) calculations. A previous study revealed that although the cost of clustering is generally insignificant for smaller problems run on relatively few processors, the algorithm scaled inefficiently in parallel and its cost grows with problem size. Hence, it can become significant for large scale problems run on very large parallel machines, such as the new BlueGene system (which has $O(10^4)$ processors). We propose a new task-parallel algorithm designed to reduce communication wait times. Performance was assessed using dynamic SAMR re-gridding operations on up to 16K processors of currently available computers at Lawrence Livermore National Laboratory. The new algorithm was shown to be up to an order of magnitude faster than the baseline algorithm and had better scaling trends.
On the Formation of Cool, Non-Flowing Cores in Galaxy Clusters via Hierarchical Mergers
Burns, J O; Norman, M L; Bryan, G L
2003-01-01
We present a new model for the creation of cool cores in rich galaxy clusters within a $\Lambda$CDM cosmological framework using the results from high spatial dynamic range, adaptive mesh hydro/N-body simulations. It is proposed that cores of cool gas first form in subclusters and these subclusters merge to create rich clusters with cool, central X-Ray excesses. The rich cool clusters do not possess "cooling flows" due to the presence of bulk velocities in the intracluster medium in excess of 1000 km/sec produced by on-going accretion of gas from supercluster filaments. This new model has several attractive features including the presence of substantial core substructure within the cool cores, and it predicts the appearance of cool bullets, cool fronts, and cool filaments all of which have been recently observed with X-Ray satellites. This hierarchical formation model is also consistent with the observation that cool cores in Abell clusters occur preferentially in dense supercluster environments. On the other ...
Dynamic Head Cluster Election Algorithm for Clustered Ad-Hoc Networks
Directory of Open Access Journals (Sweden)
Arwa Zabian
2008-01-01
In distributed systems, clustering consists of dividing the geographical area covered by a set of nodes into small zones. In mobile networks, the clustering changes because nodes can move at any time in any direction, which causes the network to partition or nodes to join. Several existing centralized or global algorithms have been proposed for clustering such that no node becomes isolated and no cluster becomes overloaded. A particular node, called the head cluster or leader, is elected to organize the distribution of nodes among clusters. We propose a distributed clustering and leader election mechanism for Ad-Hoc mobile networks in which the leader is a mobile node. Our results show that, in the case of leader mobility, the time needed to elect a new leader is smaller than the time needed for a significant topological change in the network to occur.
Directory of Open Access Journals (Sweden)
Reilly John J
2005-06-01
Background: Advances in miniature sensor technology have led to the development of wearable systems that allow one to monitor motor activities in the field. A variety of classifiers have been proposed in the past, but little has been done toward developing systematic approaches to assess the feasibility of discriminating the motor tasks of interest and to guide the choice of the classifier architecture. Methods: A technique is introduced to address this problem according to a hierarchical framework and its use is demonstrated for the application of detecting motor activities in patients with chronic obstructive pulmonary disease (COPD) undergoing pulmonary rehabilitation. Accelerometers were used to collect data for 10 different classes of activity. Features were extracted to capture essential properties of the data set and reduce the dimensionality of the problem at hand. Cluster measures were utilized to find natural groupings in the data set and then construct a hierarchy of the relationships between clusters to guide the process of merging clusters that are too similar to distinguish reliably. It provides a means to assess whether the benefits of merging for performance of a classifier outweigh the loss of resolution incurred through merging. Results: Analysis of the COPD data set demonstrated that motor tasks related to ambulation can be reliably discriminated from tasks performed in a seated position with the legs in motion or stationary using two features derived from one accelerometer. Classifying motor tasks within the category of activities related to ambulation requires more advanced techniques. While in certain cases all the tasks could be accurately classified, in others merging clusters associated with different motor tasks was necessary. When merging clusters, it was found that the proposed method could lead to more than 12% improvement in classifier accuracy while retaining resolution of 4 tasks. Conclusion: Hierarchical
Sherrill, Delsey M; Moy, Marilyn L; Reilly, John J; Bonato, Paolo
2005-01-01
Background: Advances in miniature sensor technology have led to the development of wearable systems that allow one to monitor motor activities in the field. A variety of classifiers have been proposed in the past, but little has been done toward developing systematic approaches to assess the feasibility of discriminating the motor tasks of interest and to guide the choice of the classifier architecture. Methods: A technique is introduced to address this problem according to a hierarchical framework and its use is demonstrated for the application of detecting motor activities in patients with chronic obstructive pulmonary disease (COPD) undergoing pulmonary rehabilitation. Accelerometers were used to collect data for 10 different classes of activity. Features were extracted to capture essential properties of the data set and reduce the dimensionality of the problem at hand. Cluster measures were utilized to find natural groupings in the data set and then construct a hierarchy of the relationships between clusters to guide the process of merging clusters that are too similar to distinguish reliably. It provides a means to assess whether the benefits of merging for performance of a classifier outweigh the loss of resolution incurred through merging. Results: Analysis of the COPD data set demonstrated that motor tasks related to ambulation can be reliably discriminated from tasks performed in a seated position with the legs in motion or stationary using two features derived from one accelerometer. Classifying motor tasks within the category of activities related to ambulation requires more advanced techniques. While in certain cases all the tasks could be accurately classified, in others merging clusters associated with different motor tasks was necessary. When merging clusters, it was found that the proposed method could lead to more than 12% improvement in classifier accuracy while retaining resolution of 4 tasks. Conclusion: Hierarchical clustering methods are relevant
Clustering of galaxies in a hierarchical universe - II. Evolution to high redshift
Kauffmann, Guinevere; Colberg, Jörg M.; Diaferio, Antonaldo; White, Simon D. M.
1999-08-01
In hierarchical cosmologies the evolution of galaxy clustering depends both on cosmological quantities such as $\Omega$, $\Lambda$ and $P(k)$, which determine how collapsed structures - dark matter haloes - form and evolve, and on the physical processes - cooling, star formation, radiative and hydrodynamic feedback - which drive the formation of galaxies within these merging haloes. In this paper we combine dissipationless cosmological N-body simulations and semi-analytic models of galaxy formation in order to study how these two aspects interact. We focus on the differences in clustering predicted for galaxies of differing luminosity, colour, morphology and star formation rate, and on what these differences can teach us about the galaxy formation process. We show that a 'dip' in the amplitude of galaxy correlations between $z=0$ and $z=1$ can be an important diagnostic. Such a dip occurs in low-density CDM models, because structure forms early, and dark matter haloes of mass $\sim 10^{12} M_\odot$, containing galaxies with luminosities $\sim L_*$, are unbiased tracers of the dark matter over this redshift range; their clustering amplitude then evolves similarly to that of the dark matter. At higher redshifts, bright galaxies become strongly biased and the clustering amplitude increases again. In high density models, structure forms late, and bias evolves much more rapidly. As a result, the clustering amplitude of $L_*$ galaxies remains constant from $z=0$ to $z=1$. The strength of these effects is sensitive to sample selection. The dip becomes weaker for galaxies with lower star formation rates, redder colours, higher luminosities and earlier morphological types. We explain why this is the case, and how it is related to the variation with redshift of the abundance and environment of the observed galaxies. We also show that the relative peculiar velocities of galaxies are biased low in our models, but that this effect is never very strong. Studies of clustering evolution as a function of galaxy
A New Enhanced Fast Handover Algorithm in Hierarchical Mobile IPv6 Network
Institute of Scientific and Technical Information of China (English)
XU Kai; JI Hong; YUE Guang-xin
2004-01-01
Hierarchical Mobile IPv6 (HMIPv6) can reduce the delay and the amount of signaling during handover compared with basic Mobile IPv6. However, the protocol still cannot meet the requirements of delay-sensitive traffic, such as voice, especially in macro mobility handover. Duplicate address detection and the transmission time for the handover operation can cause high handover delay. This paper proposes a new mechanism to improve the efficiency of the fast handover algorithm in HMIPv6 networks. We present and analyze performance tests of our proposal by comparing it with the traditional HMIPv6 fast handover algorithm. The simulation results show that our scheme reduces the handover delay much more than the traditional fast handover method for HMIPv6 networks.
Clustered Self Organising Migrating Algorithm for the Quadratic Assignment Problem
Davendra, Donald; Zelinka, Ivan; Senkerik, Roman
2009-08-01
An approach of population dynamics and clustering for permutative problems is presented in this paper. Diversity indicators are created from solution ordering and its mapping is shown as an advantage for population control in metaheuristics. Self Organising Migrating Algorithm (SOMA) is modified using this approach and vetted with the Quadratic Assignment Problem (QAP). Extensive experimentation is conducted on benchmark problems in this area.
Blockspin Scheme and Cluster Algorithm for Quantum Spin Systems
Ying, He-Ping; Wiese, Uwe-Jens
1992-01-01
We present a numerical study using a cluster algorithm for the 1-d $S=1/2$ quantum Heisenberg models. The dynamical critical exponent for anti-ferromagnetic chains is $z=0.0(1)$ such that critical slowing down is eliminated.
Clustering algorithms for Stokes space modulation format recognition
DEFF Research Database (Denmark)
Boada, Ricard; Borkowski, Robert; Tafur Monroy, Idelfonso
2015-01-01
Stokes space modulation format recognition (Stokes MFR) is a blind method enabling digital coherent receivers to infer modulation format information directly from a received polarization-division-multiplexed signal. A crucial part of the Stokes MFR is a clustering algorithm, which largely...
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.
Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
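To make two of the stand-alone metrics named in this abstract concrete, the following sketch computes Newman modularity and the conductance of a single cluster for an undirected, unweighted edge list. It uses the standard textbook definitions rather than the exact variants evaluated in the paper, and the function names and toy graph are assumptions for illustration.

```python
def modularity(edges, community):
    """Newman modularity Q.
    edges: list of undirected (u, v) pairs; community: dict node -> label."""
    m = len(edges)
    degree, internal = {}, {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            label = community[u]
            internal[label] = internal.get(label, 0) + 1
    deg_sum = {}
    for node, d in degree.items():
        deg_sum[community[node]] = deg_sum.get(community[node], 0) + d
    # Q = sum over communities of (internal edge fraction - expected fraction)
    return sum(internal.get(c, 0) / m - (deg_sum[c] / (2 * m)) ** 2
               for c in deg_sum)

def conductance(edges, cluster):
    """Cut size divided by the smaller of vol(S) and vol(complement).
    Assumes both sides of the cut have non-zero volume."""
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for x in (u, v):
            if x in cluster:
                vol_in += 1
            else:
                vol_out += 1
        if (u in cluster) != (v in cluster):
            cut += 1
    return cut / min(vol_in, vol_out)

# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, community)
phi = conductance(edges, {0, 1, 2})
```

Higher modularity and lower conductance both indicate a better-separated clustering, which is why the two can still disagree with information recovery scores that compare against ground-truth labels.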
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Energy Technology Data Exchange (ETDEWEB)
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U.
2005-03-01
We present the "C4 Cluster Catalog", a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers $\approx 2600$ square degrees of sky and ranges in redshift from $z = 0.02$ to $z = 0.17$. The mean cluster membership is 36 galaxies (with redshifts) brighter than $r = 17.7$, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity ($L_r$), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the $\Lambda$CDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is $\simeq 90\%$ complete and 95% pure above $M_{200} = 1 \times 10^{14}\,h^{-1} M_\odot$ and within $0.03 \le z \le 0.12$. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within $0.03 \le z \le 0.12$. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the $L_r$ of a cluster is a more robust estimator of the halo mass ($M_{200}$) than the galaxy line-of-sight velocity dispersion or the richness of the cluster
A Survey on Clustering Algorithms for Heterogeneous Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Vivek Katiyar
2011-01-01
Over the last few years, potential uses of wireless sensor networks (WSNs) have emerged in various fields such as disaster management, battlefield surveillance and border security surveillance. In such applications, a large number of sensor nodes are deployed, which are often unattended and work autonomously. Clustering is a key technique used to extend the lifetime of a sensor network by reducing energy consumption. It can also increase network scalability. Sensor nodes have generally been considered homogeneous since research on WSNs began, but some nodes may be given a different energy level to prolong the lifetime of a WSN and improve its reliability. In this paper, we study the impact of node heterogeneity on the performance of WSNs. This paper surveys different clustering algorithms for heterogeneous WSNs by classifying the algorithms according to various clustering attributes.
A HYBRID HEURISTIC ALGORITHM FOR THE CLUSTERED TRAVELING SALESMAN PROBLEM
Directory of Open Access Journals (Sweden)
Mário Mestria
2016-04-01
This paper proposes a hybrid heuristic algorithm, based on the metaheuristics Greedy Randomized Adaptive Search Procedure, Iterated Local Search and Variable Neighborhood Descent, to solve the Clustered Traveling Salesman Problem (CTSP). The hybrid heuristic algorithm uses several variable neighborhood structures, combining intensification (using local search operators) and diversification (constructive heuristic and perturbation routine). In the CTSP, the vertices are partitioned into clusters and all vertices of each cluster have to be visited contiguously. The CTSP is NP-hard since it includes the well-known Traveling Salesman Problem (TSP) as a special case. Our hybrid heuristic is compared with three heuristics from the literature and an exact method. Computational experiments are reported for different classes of instances. Experimental results show that the proposed hybrid heuristic obtains competitive results within reasonable computational time.
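The contiguity constraint that distinguishes the CTSP from the plain TSP can be checked with a few lines of code. The sketch below is only a feasibility test for a candidate tour, not the hybrid heuristic itself; the function names and the toy instance are assumptions, and the tour is treated as a cycle, so a cluster may wrap around the end of the list.

```python
import math

def is_clustered_tour(tour, cluster_of):
    """True if, around the cyclic tour, every cluster forms one unbroken run."""
    labels = [cluster_of[v] for v in tour]
    k = len(set(labels))
    # Count label changes between cyclically adjacent vertices
    # (index -1 wraps from the first vertex back to the last).
    changes = sum(labels[i] != labels[i - 1] for i in range(len(labels)))
    return changes == (k if k > 1 else 0)

def tour_length(tour, coords):
    """Length of the closed tour through the given coordinates."""
    return sum(math.dist(coords[tour[i]], coords[tour[i - 1]])
               for i in range(len(tour)))

# Toy instance (hypothetical): five vertices in three clusters.
cluster_of = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C"}
coords = {0: (0, 0), 1: (0, 1), 2: (3, 0), 3: (3, 1), 4: (6, 0)}
feasible = is_clustered_tour([0, 1, 2, 3, 4], cluster_of)
infeasible = is_clustered_tour([0, 2, 1, 3, 4], cluster_of)
```

A local search operator in a CTSP heuristic would typically only accept moves for which such a check still holds, or restrict moves to within a cluster and to the order of clusters.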
Directory of Open Access Journals (Sweden)
M. Kalpana
2014-01-01
This research work proposes a mathematical model for the lifetime of wireless sensor networks (WSN). It also proposes an energy efficient routing algorithm for WSN called hierarchical energy tree based routing algorithm (HETRA), based on a hierarchical energy tree constructed using the available energy in each node. The energy efficiency is further augmented by reducing packet drops using an exponential congestion control algorithm (TCP/EXP). The algorithms are evaluated in WSNs interconnected to a fixed network with seven distribution patterns, simulated in ns2, and compared with the existing algorithms based on parameters such as number of data packets, throughput, network lifetime, and data packets average network lifetime product. Evaluation and simulation results show that the combination of HETRA and TCP/EXP maximizes network lifetime in all the patterns. The lifetime of the network with the HETRA algorithm has increased to approximately 3.2 times that of the network implemented with AODV.
Investigation on IMCP based clustering in LTE-M communication for smart metering applications
National Research Council Canada - National Science Library
Kartik Vishal Deshpande; A. Rajesh
2017-01-01
.... This paper investigates the proposed Improved M2M Clustering Process (IMCP) based clustering technique and it is compared with two well-known clustering algorithms, namely, Low Energy Adaptive Clustering Hierarchical (LEACH...
An Efficient Cluster Algorithm for CP(N-1) Models
Beard, B B; Riederer, S; Wiese, U J
2005-01-01
We construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a new regularization for CP(N-1) models in the framework of D-theory, which is an alternative non-perturbative approach to quantum field theory formulated in terms of discrete quantum variables instead of classical fields. Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard formulation of lattice field theory. In fact, there is even a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. We present various simulations for different correlation lengths, couplings and lattice sizes. We have simulated correlation lengths up to 250 lattice spacings on lattices as large as 640x640 and we detect no evidence for critical slowing down.
Morphology of open clusters NGC 1857 and Czernik 20 using clustering algorithms
Bhattacharya, S.; Mahulkar, V.; Pandaokar, S.; Singh, P. K.
2017-01-01
The morphology and cluster membership of the Galactic open clusters Czernik 20 and NGC 1857 were analyzed using two different clustering algorithms. We present the first use of density-based spatial clustering of applications with noise (DBSCAN) to determine open cluster morphology from spatial distribution. The region of analysis has also been spatially classified using a statistical membership determination algorithm. We utilized near infrared (NIR) data for a suitably large region around the clusters from the United Kingdom Infrared Deep Sky Survey Galactic Plane Survey star catalogue database, and also from the Two Micron All Sky Survey star catalogue database. The densest regions of the cluster morphologies (1 for Czernik 20 and 2 for NGC 1857) thus identified were analyzed with a K-band extinction map and color-magnitude diagrams (CMDs). To address a significant discrepancy in known distance and reddening parameters, we carried out field decontamination of these CMDs and subsequent isochrone fitting of the cleaned CMDs to obtain reliable distance and reddening parameters for the clusters (Czernik 20: D = 2900 pc, E(J-K) = 0.33; NGC 1857: D = 2400 pc, E(J-K) = 0.18-0.19). The isochrones were also used to convert the luminosity functions for the densest regions of Czernik 20 and NGC 1857 into mass functions and to derive their slopes. Additionally, a previously unknown over-density consistent with that of a star cluster is identified in the region of analysis.
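For readers unfamiliar with DBSCAN, the density-based method named in this abstract, a minimal brute-force version is sketched below. It follows the standard algorithm (core points, density-reachable expansion, noise label) and is not the authors' pipeline; the parameter values and toy coordinates are assumptions, and the neighbourhood count includes the point itself.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force eps-neighbourhood, O(n) per query.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:       # j is a core point: expand through it
                queue.extend(nb)
    return labels

# Toy sky positions (hypothetical): two compact groups and one field star.
stars = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (50, 50)]
labels = dbscan(stars, eps=1.5, min_pts=3)
```

Because clusters are grown from dense neighbourhoods rather than fitted to a fixed shape, the labelled regions can take irregular outlines, which is what makes the method attractive for tracing cluster morphology.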
Evaluation of clustering algorithms for protein-protein interaction networks
Directory of Open Access Journals (Sweden)
van Helden Jacques
2006-11-01
Background: Protein interactions are crucial components of all cellular processes. Recently, high-throughput methods have been developed to obtain a global description of the interactome (the whole network of protein interactions for a given organism). In 2002, the yeast interactome was estimated to contain up to 80,000 potential interactions. This estimate is based on the integration of data sets obtained by various methods (mass spectrometry, two-hybrid methods, genetic studies). High-throughput methods are known, however, to yield a non-negligible rate of false positives, and to miss a fraction of existing interactions. The interactome can be represented as a graph where nodes correspond to proteins and edges to pairwise interactions. In recent years clustering methods have been developed and applied in order to extract relevant modules from such graphs. These algorithms require the specification of parameters that may drastically affect the results. In this paper we present a comparative assessment of four algorithms: Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Super Paramagnetic Clustering (SPC), and Molecular Complex Detection (MCODE). Results: A test graph was built on the basis of 220 complexes annotated in the MIPS database. To evaluate the robustness to false positives and false negatives, we derived 41 altered graphs by randomly removing edges from or adding edges to the test graph in various proportions. Each clustering algorithm was applied to these graphs with various parameter settings, and the clusters were compared with the annotated complexes. We analyzed the sensitivity of the algorithms to the parameters and determined their optimal parameter values. We also evaluated their robustness to alterations of the test graph. We then applied the four algorithms to six graphs obtained from high-throughput experiments and compared the resulting clusters with the annotated complexes. Conclusion: This
A heuristic approach to possibilistic clustering algorithms and applications
Viattchenin, Dmitri A
2013-01-01
The present book outlines a new approach to possibilistic clustering in which the sought clustering structure of the set of objects is based directly on the formal definition of fuzzy cluster and the possibilistic memberships are determined directly from the values of the pairwise similarity of objects. The proposed approach can be used for solving different classification problems. Here, some techniques that might be useful for this purpose are outlined, including a methodology for constructing a set of labeled objects for a semi-supervised clustering algorithm, a methodology for reducing the dimensionality of the analyzed attribute space, and methods for asymmetric data processing. Moreover, a technique for constructing a subset of the most appropriate alternatives for a set of weak fuzzy preference relations, which are defined on a universe of alternatives, is described in detail, and a method for rapidly prototyping Mamdani’s fuzzy inference systems is introduced. This book addresses engineers, scientist...
A comparison of clustering algorithms in article recommendation system
Tantanasiriwong, Supaporn
2012-01-01
A recommendation system is a tool that can recommend resources suited to researchers' interests by using content-based filtering. In this paper, clustering, an unsupervised learning technique, is introduced for grouping objects based on their selected features and similarities. Publication information from the Science Citation Index (SCI) is used as the dataset, and clustering serves as feature extraction, in the sense of dimensionality reduction of these articles; Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), and K-means are compared to determine the best algorithm. In the experiment, the selected database consists of 2625 documents extracted from the SCI corpus from 2001 to 2009. Clusterings into 50, 100, 200, and 250 groups are considered, and the F-measure is used to evaluate the three algorithms. The results show that the LDA technique gives an accuracy of up to 95.5%, the highest of the clustering techniques compared.
[Cluster analysis in biomedical researches].
Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D
2013-01-01
Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. Cluster analysis reveals the internal structure of the data by grouping separate observations according to their degree of similarity. The review provides definitions of the basic concepts of cluster analysis and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, and Kohonen network algorithms. Examples of the use of these algorithms in biomedical research are given.
3D Nearest Neighbour Search Using a Clustered Hierarchical Tree Structure
DEFF Research Database (Denmark)
Suhaibah, A.; Uznir, U.; Antón Castro, Francesc/François
2016-01-01
, with the immense number of urban datasets, the retrieval and analysis of nearest neighbour information and their efficiency will become more complex and crucial. In this paper, we present a technique to retrieve nearest neighbour information in 3D space using a clustered hierarchical tree structure. Based on our...... findings, the proposed approach showed a substantial improvement in response time compared to existing spatial access methods in databases. The query performance was tested using a dataset consisting of 500,000 point locations of building and franchising units. The results...... of the franchise unit will be located or is the franchise unit located is at the best level for visibility purposes. One of the commonly used analyses for retrieving the surrounding information is Nearest Neighbour (NN) analysis. It uses a point location and identifies the surrounding neighbours. However...
A Clustering Genetic Algorithm for Cylinder Drag Optimization
Milano, Michele; Koumoutsakos, Petros
2002-01-01
A real coded genetic algorithm is implemented for the optimization of actuator parameters for cylinder drag minimization. We consider two types of idealized actuators that are allowed either to move steadily and tangentially to the cylinder surface (“belts”) or to steadily blow/suck with a zero net mass constraint. The genetic algorithm we implement has the property of identifying minima basins, rather than single optimum points. The knowledge of the shape of the minimum basin enables further insights into the system properties and provides a sensitivity analysis in a fully automated way. The drag minimization problem is formulated as an optimal regulation problem. By means of the clustering property of the present genetic algorithm, a set of solutions producing drag reduction of up to 50% is identified. A comparison between the two types of actuators, based on the clustering property of the algorithm, indicates that blowing/suction actuation parameters are associated with larger tolerances when compared to optimal parameters for the belt actuators. The possibility of using a few strategically placed actuators to obtain a significant drag reduction is explored using the clustering diagnostics of this method. The optimal belt-actuator parameters obtained by optimizing the two-dimensional case are employed in three-dimensional simulations, by extending the actuators across the span of the cylinder surface. The three-dimensional controlled flow exhibits a strong two-dimensional character near the cylinder surface, resulting in significant drag reduction.
Directory of Open Access Journals (Sweden)
Kuate-Defo, Bathélémy
2001-01-01
This paper merges two parallel developments since the 1970s of new statistical tools for data analysis: statistical methods known as hazard models, which are used for analyzing event-duration data, and statistical methods for analyzing hierarchically clustered data, known as multilevel models. These developments have rarely been integrated in research practice, and the formalization and estimation of models for hierarchically clustered survival data remain largely uncharted. I attempt to fill some of this gap and demonstrate the merits of formulating and estimating multilevel hazard models with longitudinal data. French abstract (translated): This study integrates two leading statistical approaches to quantitative data analysis developed since the 1970s: statistical methods for analyzing event-history data, or survival methods, and statistical methods for analyzing hierarchical data, or multilevel methods. These two approaches have seldom been combined in research practice; consequently, the formulation and estimation of models suited to longitudinal, hierarchically nested data remain essentially unexplored territory. I try to fill this gap, using real public health data to demonstrate the merits and contexts of formulating and estimating multilevel, multistate models for event-history and longitudinal data.
Robustness of the ATLAS pixel clustering neural network algorithm
Sidebo, Per Edvin; The ATLAS collaboration
2016-01-01
Proton-proton collisions at the energy frontier put strong constraints on track reconstruction algorithms. The algorithms depend heavily on accurate estimation of the position of particles as they traverse the inner detector elements. An artificial neural network algorithm is utilised to identify and split clusters of neighbouring read-out elements in the ATLAS pixel detector created by multiple charged particles. The method recovers otherwise lost tracks in dense environments where particles are separated by distances comparable to the size of the detector read-out elements. Such environments are highly relevant for LHC Run 2, e.g. in searches for heavy resonances. Within the scope of Run 2 track reconstruction performance and upgrades, the robustness of the neural network algorithm will be presented. The robustness has been studied by evaluating the stability of the algorithm’s performance under a range of variations in the pixel detector conditions.
Comparative Study of Clustering Algorithms in Text Mining Context
Directory of Open Access Journals (Sweden)
Abdennour Mohamed Jalil
2016-06-01
The spectacular increase in data is due to the appearance of networks and smartphones. About 42% of the world population uses the internet [1], which has created a problem of processing the exchanged data, whose volume is rising exponentially and which must be treated automatically. This paper presents a classical process of knowledge discovery in databases for treating textual data. The process is divided into three parts: preprocessing, processing, and post-processing. In the processing step, we present a comparative study of several clustering algorithms: KMeans, Global KMeans, Fast Global KMeans, Two Level KMeans, and FWKmeans. The comparison between these algorithms is made on real textual data from the web, collected using RSS feeds. Experimental results identified two problems: the first concerns the quality of the results, which remains an issue for algorithms that converge rapidly; the second concerns the execution time, which needs to decrease for some algorithms.
DYNAMIC REQUEST DISPATCHING ALGORITHM FOR WEB SERVER CLUSTER
Institute of Scientific and Technical Information of China (English)
Yang Zhenjiang; Zhang Deyun; Sun Qindong; Sun Qing
2006-01-01
Distributed architectures support increased load on popular web sites by dispatching client requests transparently among multiple servers in a cluster. Packet Single-Rewriting technology and the client address hashing algorithm of ONE-IP technology, which can preserve application sessions, are analyzed, and an improved request dispatching algorithm is proposed that is simple, effective, and supports dynamic load balancing. In this algorithm, the dispatcher determines which server node will process a request by applying a hash function to the client IP address and comparing the result with the identifier subset assigned to each node; it adjusts the size of each subset according to the performance and current load of each server, so as to use all servers' resources effectively. Simulation shows that the improved algorithm performs better than the original one.
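The dispatching idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the identifier space size (`NUM_BUCKETS`), the CRC32 hash, the use of a single central lookup table, and the notion of a per-server `weight` (standing in for "performance and current load") are all assumptions made for illustration.

```python
import ipaddress
import zlib
from typing import Dict

NUM_BUCKETS = 256  # hypothetical size of the identifier space


def bucket_for(client_ip: str) -> int:
    """Hash a client IP address into the identifier space (CRC32 is an assumed choice)."""
    packed = ipaddress.ip_address(client_ip).packed
    return zlib.crc32(packed) % NUM_BUCKETS


def assign_buckets(weights: Dict[str, float]) -> Dict[int, str]:
    """Give each server a contiguous identifier subset sized in proportion to its weight."""
    total = sum(weights.values())
    servers = sorted(weights)
    table: Dict[int, str] = {}
    start = 0
    cumulative = 0.0
    for i, server in enumerate(servers):
        cumulative += weights[server]
        # The last server always takes the remainder so every identifier is covered.
        is_last = i == len(servers) - 1
        end = NUM_BUCKETS if is_last else round(NUM_BUCKETS * cumulative / total)
        for bucket in range(start, end):
            table[bucket] = server
        start = end
    return table


def dispatch(client_ip: str, table: Dict[int, str]) -> str:
    """Return the server whose identifier subset contains the client's hash."""
    return table[bucket_for(client_ip)]
```

Because the hash depends only on the client address, repeated requests from one client reach the same server until the subsets are resized. Calling `assign_buckets` again with updated weights is how this sketch models the dynamic adjustment.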
Institute of Scientific and Technical Information of China (English)
陈理; 王克峰; 徐霄羽; 姚平经
2004-01-01
In this contribution we present an online scheduling algorithm for a real-world multiproduct batch plant. The overall mixed integer nonlinear programming (MINLP) problem is hierarchically structured into a mixed integer linear programming (MILP) problem first and then a reduced-dimensional MINLP problem, which are optimized by mathematical programming (MP) and a genetic algorithm (GA) respectively. The basic idea relies on combining MP with GA to exploit their complementary strengths. The key features of the hierarchical model are explained and illustrated with some real-world cases from multiproduct batch plants.
Exploring New Clustering Algorithms for the CMS Tracker FED
Gamboa Alvarado, Jose Leandro
2013-01-01
In the current Front End (FE) firmware, clusters of hits within the APV frames are found using a simple threshold comparison (made between the data and a 3 or 5 sigma strip noise cut) on reordered, pedestal and Common Mode (CM) noise subtracted data. In addition, the CM noise subtraction requires the baseline of each APV frame to be approximately uniform. Therefore, the current algorithm will fail if the APV baseline exhibits large-scale non-uniform behavior. Under very high luminosity conditions the assumption of a uniform APV baseline breaks down and the FED is unable to maintain a high efficiency of cluster finding.
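The threshold-based cluster finding described above can be sketched as follows. This is an illustrative reconstruction, not the FED firmware: the function name, the use of the frame median as the common-mode estimate, and the representation of a cluster as a (first strip, last strip) pair are assumptions.

```python
from statistics import median


def find_clusters(raw, pedestals, noise, n_sigma=3.0):
    """Return (first_strip, last_strip) pairs of contiguous strips above threshold.

    raw, pedestals and noise are per-strip sequences of equal length.
    """
    # Step 1: pedestal subtraction.
    signal = [r - p for r, p in zip(raw, pedestals)]
    # Step 2: common-mode subtraction. Using the frame median is an assumption,
    # and it is only meaningful when the baseline is roughly uniform.
    common_mode = median(signal)
    signal = [s - common_mode for s in signal]
    # Step 3: threshold comparison against n_sigma times the strip noise,
    # grouping neighbouring strips that pass the cut.
    clusters = []
    current = []
    for strip, (s, sigma) in enumerate(zip(signal, noise)):
        if s > n_sigma * sigma:
            current.append(strip)
        elif current:
            clusters.append((current[0], current[-1]))
            current = []
    if current:
        clusters.append((current[0], current[-1]))
    return clusters
```

A single constant is subtracted from the whole frame, so a sloped or stepped baseline leaves some strips biased relative to the cut. That is the failure mode the abstract attributes to non-uniform baselines.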
Directory of Open Access Journals (Sweden)
Iman Aghayan
2012-11-01
This paper compares two fuzzy clustering algorithms (fuzzy subtractive clustering and fuzzy C-means clustering) with a multi-layer perceptron neural network for their ability to predict the severity of crash injuries and to estimate the response time on traffic crash data. Four clustering algorithms (hierarchical, K-means, subtractive clustering, and fuzzy C-means clustering) were used to obtain the optimum number of clusters based on the mean silhouette coefficient and R-value before applying the fuzzy clustering algorithms. The best-fit algorithms were selected according to two criteria: precision (root mean square, R-value, mean absolute errors, and sum of square error) and response time (t). The highest R-value was obtained for the multi-layer perceptron (0.89), demonstrating that the multi-layer perceptron had a high precision in traffic crash prediction among the prediction models, and that it was stable even in the presence of outliers and overlapping data. Meanwhile, in comparison with other prediction models, fuzzy subtractive clustering provided the lowest value for response time (0.284 second, 9.28 times faster than the multi-layer perceptron), meaning that it could lead to developing an on-line system for processing data from detectors and/or a real-time traffic database. The model can be extended through improvements based on additional data through an induction procedure.
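The abstract above selects the number of clusters using the mean silhouette coefficient. A minimal sketch of that standard quantity is given below; the function name and the convention of scoring singleton clusters as 0 are assumptions, and the paper's own data and distance measure are not reproduced.

```python
def mean_silhouette(points, labels, dist):
    """Mean silhouette coefficient of a labelling under a distance function."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)
```

In a model-selection loop, one would cluster the data for several candidate cluster counts and keep the count with the highest mean silhouette.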
DEFF Research Database (Denmark)
Mi, Jianli; Lock, Nina; Sun, Ting;
2010-01-01
A simple biomolecule-assisted hydrothermal approach has been developed for the fabrication of Bi2Te3 thermoelectric nanomaterials. The product has a nanostring-cluster hierarchical structure which is composed of ordered and aligned platelet-like crystals. The platelets are 100 nm in diameter...
Abduljabbar, Mustafa
2017-05-11
Reduction of communication and efficient partitioning are key issues for achieving scalability in hierarchical N-Body algorithms like Fast Multipole Method (FMM). In the present work, we propose three independent strategies to improve partitioning and reduce communication. First, we show that the conventional wisdom of using space-filling curve partitioning may not work well for boundary integral problems, which constitute a significant portion of FMM’s application user base. We propose an alternative method that modifies orthogonal recursive bisection to relieve the cell-partition misalignment that has kept it from scaling previously. Secondly, we optimize the granularity of communication to find the optimal balance between a bulk-synchronous collective communication of the local essential tree and an RDMA per task per cell. Finally, we take the dynamic sparse data exchange proposed by Hoefler et al. [1] and extend it to a hierarchical sparse data exchange, which is demonstrated at scale to be faster than the MPI library’s MPI_Alltoallv that is commonly used.
Li, Yongjun; Zhao, Shanghong
2016-09-01
A novel routing algorithm (Hierarchical Supervisor and Agent Routing Algorithm, HSARA) for LEO/MEO (low earth orbit/medium earth orbit) double-layered optical satellite networks is proposed. The so-called supervisor (a MEO satellite) is designed for failure recovery and network management. LEO satellites are grouped according to the virtual managed field of the MEO satellite, which differs from the coverage area of a MEO satellite in an RF satellite network. In each LEO group, the LEO satellite that has the longest-lasting link with its supervisor is called the agent. A LEO group is updated when the optical inter-orbit link between the agent LEO satellite and the corresponding MEO supervisor is cut off. In this way, the computation needed for topology changes and LEO group updating can be decreased. The routing cost combines delay and wavelength utilization. HSARA simulations were carried out, with the following results: the average network delay of HSARA is 21 ms and 31.2 ms lower than that of traditional multilayered satellite routing and a single-layer LEO satellite network, respectively; the LEO/MEO double-layered optical satellite network can cover the polar regions, which cannot be covered by a single-layered LEO satellite network, and its throughput is on average 1% higher than that of a single-layered LEO satellite network. Therefore, exact global coverage can be achieved with this double-layered optical satellite network.
A hierarchical algorithm for cyberspace situational awareness based on analytic hierarchy process
Institute of Scientific and Technical Information of China (English)
(no author listed)
2007-01-01
Existing network security management systems are unable either to provide users with useful security situation and risk assessments, or to help administrators make correct and timely decisions based on the current state of the network. These disadvantages put the whole of network security management at high risk. This paper establishes a simulation environment, captures alerts as the experimental data, and adopts statistical analysis to find the vulnerabilities of the services provided by the hosts in the network. According to the factors of the network, the paper introduces two concepts, Situational Meta and Situational Weight, to depict the total security situation. A novel hierarchical algorithm based on the analytic hierarchy process (AHP) is proposed to analyze the hierarchy of the network and determine the weighting coefficients. The algorithm can be used for modeling the security situation and determining its mathematical expression. Coupled with the statistical results, this paper simulates the security situational trends. Finally, the analysis of the simulation results shows the algorithm to be efficient and applicable, and provides an academic foundation for implementation in security situation assessment.
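The abstract above uses AHP to determine weighting coefficients. A minimal sketch of one common way to derive AHP weights from a pairwise comparison matrix follows. The row geometric mean method used here is a standard approximation to the principal-eigenvector weights and is an assumption; the abstract does not say which variant the paper uses, and the matrix in the usage note is invented for illustration.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method.

    pairwise[i][j] states how much more important factor i is than factor j,
    with pairwise[j][i] == 1 / pairwise[i][j] and ones on the diagonal.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]
```

For a perfectly consistent matrix such as `[[1, 2, 6], [1/2, 1, 3], [1/6, 1/3, 1]]`, the method recovers the underlying weights (here 0.6, 0.3, and 0.1) up to rounding. For inconsistent judgements it gives an approximation, and a consistency check would normally follow.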
FCM Clustering Algorithms for Segmentation of Brain MR Images
Directory of Open Access Journals (Sweden)
Yogita K. Dubey
2016-01-01
The study of brain disorders requires accurate tissue segmentation of magnetic resonance (MR) brain images, which is very important for detecting tumors, edema, and necrotic tissues. Segmentation of brain images, especially into three main tissue types, Cerebrospinal Fluid (CSF), Gray Matter (GM), and White Matter (WM), plays an important role in computer-aided neurosurgery and diagnosis. Brain images mostly contain noise, intensity inhomogeneity, and weak boundaries. Therefore, accurate segmentation of brain images is still a challenging area of research. This paper presents a review of fuzzy c-means (FCM) clustering algorithms for the segmentation of brain MR images. The review covers a detailed analysis of FCM-based algorithms with intensity inhomogeneity correction and noise robustness. Different methods for modifying the standard fuzzy objective function, with updating of the membership and cluster centroids, are also discussed.
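The membership and centroid updates mentioned above can be sketched for the standard (unmodified) FCM algorithm on scalar intensities. This is a generic textbook form, not any of the modified variants the review covers: the function names, the fixed iteration count, and the small `eps` added to distances to avoid division by zero are assumptions.

```python
def memberships_for(values, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for x in values:
        # eps keeps distances strictly positive when x coincides with a center.
        dists = [abs(x - c) + eps for c in centers]
        table.append(
            [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
        )
    return table


def fcm_1d(values, centers, m=2.0, iterations=50):
    """Alternate membership and centroid updates for a fixed number of iterations."""
    centers = list(centers)
    for _ in range(iterations):
        u = memberships_for(values, centers, m)
        for i in range(len(centers)):
            # Centroid update: mean of the data weighted by u^m.
            weights = [row[i] ** m for row in u]
            centers[i] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return centers, memberships_for(values, centers, m)
```

For brain MR segmentation, `values` would be voxel intensities and three centers would stand for CSF, GM, and WM. The variants reviewed add terms for bias-field correction and spatial noise robustness, which this sketch omits.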
Mapping cultivable land from satellite imagery with clustering algorithms
Arango, R. B.; Campos, A. M.; Combarro, E. F.; Canas, E. R.; Díaz, I.
2016-07-01
Open data satellite imagery provides valuable data for the planning and decision-making processes related to environmental domains. Specifically, agriculture uses remote sensing in a wide range of services, ranging from monitoring the health of the crops to forecasting the spread of crop diseases. In particular, this paper focuses on a methodology for the automatic delimitation of cultivable land by means of machine learning algorithms and satellite data. The method uses a partition clustering algorithm called Partitioning Around Medoids and considers the quality of the clusters obtained for each satellite band in order to evaluate which one better identifies cultivable land. The proposed method was tested with vineyards using as input the spectral and thermal bands of the Landsat 8 satellite. The experimental results show the great potential of this method for cultivable land monitoring from remote-sensed multispectral imagery.
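Partitioning Around Medoids, named in the abstract above, can be sketched in a simplified form. This version keeps only a greedy swap phase and starts from the first `k` points as medoids; that initialisation, the function names, and the first-improvement swap order are assumptions, and the paper's per-band quality evaluation is not reproduced.

```python
from itertools import product


def total_cost(points, medoids, dist):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist):
    """Simplified PAM: swap medoids with non-medoids while the total cost decreases."""
    medoids = list(range(k))  # naive initialisation (assumption); BUILD phase omitted
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i, candidate in product(range(k), range(len(points))):
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]
            cost = total_cost(points, trial, dist)
            if cost < best:
                medoids, best, improved = trial, cost, True
    # Assign each point to the position (0..k-1) of its nearest medoid.
    labels = [
        min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points
    ]
    return medoids, labels
```

Medoids are always actual data points, so the method only needs a pairwise distance function. In the setting above, each point would be a pixel value from one satellite band.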
Advanced defect detection algorithm using clustering in ultrasonic NDE
Gongzhang, Rui; Gachagan, Anthony
2016-02-01
A range of materials used in industry exhibit scattering properties which limit ultrasonic NDE. Many algorithms have been proposed to enhance defect detection ability, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed and the remaining noise can be easily confused with real feature signals, hence becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing using a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single transducer or phased array configurations. The proposed algorithm uses the concept of unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and the ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Each class will then be labelled as 'legitimate reflector' or 'artefacts' based on this observation, and the expected probability of detection (PoD) and probability of false alarm (PFA) determined. To facilitate data collection and validate the proposed algorithm, a 5 MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP, and the subsequent application of the proposed clustering algorithm has provided an additional reduction in PFA while maintaining PoD for both samples compared with SSP results alone.
Core Business Selection Based on Ant Colony Clustering Algorithm
Yu Lan; Yan Bo; Yao Baozhen
2014-01-01
The core business is the most important business of an enterprise with diversified operations. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current situation of the development of the company, the core business of the company can be acquired by ant c...
Clustering of Galaxies in a Hierarchical Universe: II. Evolution to High Redshift
Kauffmann, Guinevere; Colberg, Joerg M.; Diaferio, Antonaldo; White, Simon D. M.
1998-01-01
In hierarchical cosmologies the evolution of galaxy clustering depends both on cosmological quantities such as Omega and Lambda, which determine how dark matter halos form and evolve, and on the physical processes - cooling, star formation and feedback - which drive the formation of galaxies within these merging halos. In this paper, we combine dissipationless cosmological N-body simulations and semi-analytic models of galaxy formation in order to study how these two aspects interact. We focus on the differences in clustering predicted for galaxies of differing luminosity, colour, morphology and star formation rate and on what these differences can teach us about the galaxy formation process. We show that a "dip" in the amplitude of galaxy correlations between z=0 and z=1 can be an important diagnostic. Such a dip occurs in low-density CDM models because structure forms early and dark matter halos of 10**12 solar masses, containing galaxies with luminosities around L*, are unbiased tracers of the dark matter ...
Comparison of cluster expansion fitting algorithms for interactions at surfaces
Herder, Laura M.; Bray, Jason M.; Schneider, William F.
2015-10-01
Cluster expansions (CEs) are Ising-type interaction models that are increasingly used to model interaction and ordering phenomena at surfaces, such as the adsorbate-adsorbate interactions that control coverage-dependent adsorption or surface-vacancy interactions that control surface reconstructions. CEs are typically fit to a limited set of data derived from density functional theory (DFT) calculations. The CE fitting process involves iterative selection of DFT data points to include in a fit set and selection of interaction clusters to include in the CE. Here we compare the performance of three CE fitting algorithms against synthetic data: the MIT Ab-initio Phase Stability code (MAPS, the default in ATAT software), a genetic algorithm (GA), and a steepest descent (SD) algorithm. The synthetic data is encoded in model Hamiltonians of varying complexity motivated by the observed behavior of atomic adsorbates on a face-centered-cubic transition metal close-packed (111) surface. We compare the performance of the leave-one-out cross-validation score against the true fitting error available from knowledge of the hidden CEs. For these systems, SD achieves the lowest overall fitting and prediction error independent of the underlying system complexity. SD also most accurately predicts cluster interaction energies without ignoring or introducing extra interactions into the CE. MAPS achieves good results in fewer iterations, while the GA performs least well for these particular problems.
Optimized algorithm for balancing clusters in wireless sensor networks
Institute of Scientific and Technical Information of China (English)
Mucheol KIM; Sun-hong KIM; Hyungjin BYUN; Sang-yong HAN
2009-01-01
Wireless sensor networks consist of hundreds or thousands of sensor nodes that are subject to numerous restrictions, including limited computation capability and battery capacity. Topology control is an important issue for achieving a balanced placement of sensor nodes. The clustering scheme is a widely known and efficient means of topology control for transmitting information to the base station in two hops. The automatic routing scheme of the self-organizing technique is another critical element of wireless sensor networks. In this paper we propose an optimal algorithm with cluster balance taken into consideration, and compare it with three well-known and widely used approaches, i.e., LEACH, MEER, and VAP-E, in performance evaluation. Experimental results show that the proposed approach increases the overall network lifetime, indicating that the amount of energy required for communication to the base station will be reduced for locating an optimal cluster.
Institute of Scientific and Technical Information of China (English)
卞金洪; 徐新洲; 魏昕; 赵力
2013-01-01
To address the problem that unbalanced node energy consumption shortens the network lifetime of underwater acoustic communication networks, a hierarchical routing algorithm suited to the underwater environment is proposed, based on the hierarchical routing algorithms of wireless sensor networks. The algorithm operates in rounds and uses an algorithm derived from an improved spectral method for detecting community structure in complex networks. First, the graph structure of the underwater acoustic communication network is constructed by means of network initialization and related measures. Next, the Laplacian matrix and a clustering algorithm are used to generate the cluster structure, which in turn enables normal data transmission in the network. Simulation results show that, under the special conditions of underwater acoustic communication networks, the algorithm achieves better results than the traditional LEACH protocol: while the network transmits data stably, the number of surviving nodes in each round exceeds that of LEACH.
A cluster analysis on road traffic accidents using genetic algorithms
Saharan, Sabariah; Baragona, Roberto
2017-04-01
The analysis of road traffic accidents is increasingly important because of the cost of accidents and the importance of public road safety. The availability of large data sets makes the study of factors that affect the frequency and severity of accidents viable. However, the data are often highly unbalanced and overlapped. We deal with the data set of road traffic accidents recorded in Christchurch, New Zealand, from 2000-2009, with a total of 26440 accidents. The data are binary, with 50 road traffic accident factors and four levels of severity. We used a genetic algorithm for the analysis because we are in the presence of a large unbalanced data set, and standard clustering such as the k-means algorithm may not be suitable for the task. The genetic algorithm based on clustering for unknown K (GCUK) has been used to identify the factors associated with accidents of different levels of severity. The results provided us with an interesting insight into the relationship between factors and accident severity level, and suggest that the two main factors that contribute to fatal accidents are "Speed greater than 60 km/h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and independent component analysis is performed to validate the results.
Community Clustering Algorithm in Complex Networks Based on Microcommunity Fusion
Directory of Open Access Journals (Sweden)
Jin Qi
2015-01-01
With further research in recent years on the physical meaning and digital features of community structure in complex networks, improving the effectiveness and efficiency of community mining algorithms has become an important subject in this area. This paper puts forward the concept of the microcommunity and obtains the final community mining results by fusing different microcommunities. The paper starts with the basic definition of the network community and applies Expansion to microcommunity clustering, which provides the prerequisites for microcommunity fusion. Analysis of test results on network data sets shows that the proposed algorithm is more efficient and has higher solution quality than other similar algorithms.
Directory of Open Access Journals (Sweden)
I-Hsuan Lin
Oncogenic transformation of normal cells often involves epigenetic alterations, including histone modification and DNA methylation. We conducted whole-genome bisulfite sequencing to determine the DNA methylomes of normal breast, fibroadenoma, invasive ductal carcinomas and MCF7. The emergence, disappearance, expansion and contraction of kilobase-sized hypomethylated regions (HMRs) and the hypomethylation of the megabase-sized partially methylated domains (PMDs) are the major forms of methylation changes observed in breast tumor samples. Hierarchical clustering of HMRs revealed tumor-specific hypermethylated clusters and differentially methylated enhancers specific to normal or breast cancer cell lines. Joint analysis of gene expression and DNA methylation data of normal breast and breast cancer cells identified differentially methylated and expressed genes associated with breast and/or ovarian cancers in cancer-specific HMR clusters. Furthermore, aberrant patterns of X-chromosome inactivation (XCI) were found in breast cancer cell lines as well as breast tumor samples in the TCGA BRCA (breast invasive carcinoma) dataset. They were characterized by a differentially hypermethylated XIST promoter, reduced expression of XIST, and over-expression of hypomethylated X-linked genes. High expression of these genes was significantly associated with lower survival rates in breast cancer patients. Comprehensive analysis of the normal and breast tumor methylomes suggests selective targeting of DNA methylation changes during breast cancer progression. The weak causal relationship between DNA methylation and gene expression observed in this study is evidence of a more complex role of DNA methylation in the regulation of gene expression in human epigenetics that deserves further investigation.
Clustering Algorithms: Their Application to Gene Expression Data
Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel
2016-01-01
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a tremendous opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the volume of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure. PMID:27932867
An Efficient Admission Control Algorithm for Load Balancing In Hierarchical Mobile IPv6 Networks
Harini, Prof P
2009-01-01
In hierarchical Mobile IPv6 networks, the Mobility Anchor Point (MAP) may become a single point of bottleneck as it handles more and more mobile nodes (MNs). A number of schemes have been proposed to achieve load balancing among different MAPs. However, signaling reduction is still imperfect because these schemes ignore the effect of the number of correspondent nodes (CNs). Moreover, only the number of MNs is balanced, not the actual traffic load, since the CNs of each MN may differ. This paper proposes an efficient admission control algorithm along with a replacement mechanism for HMIPv6 networks. The admission control algorithm is based on the number of serving CNs and achieves actual load balancing among MAPs. In addition, a replacement mechanism is introduced to decrease the new MN blocking probability and the handoff MN dropping probability. Simulation results show that the handoff delay and packet loss are reduced in our scheme compared with the standard HMIPv6-based handoff.
A Hierarchical NeuroBayes-based Algorithm for Full Reconstruction of B Mesons at B Factories
Feindt, Michael; Kreps, Michal; Kuhr, Thomas; Neubauer, Sebastian; Zander, Daniel; Zupanc, Anze
2011-01-01
We describe a new B-meson full reconstruction algorithm designed for the Belle experiment at the B-factory KEKB, an asymmetric e+e- collider. To maximize the number of reconstructed B decay channels, it utilizes a hierarchical reconstruction procedure and probabilistic calculus instead of classical selection cuts. The multivariate analysis package NeuroBayes was used extensively to hold the balance between highest possible efficiency, robustness and acceptable CPU time consumption. In total, 1042 exclusive decay channels were reconstructed, employing 71 neural networks altogether. Overall, we correctly reconstruct one B+/- or B0 candidate in 0.3% or 0.2% of the BBbar events, respectively. This is an improvement in efficiency by roughly a factor of 2, depending on the analysis considered, compared to the cut-based classical reconstruction algorithm used at Belle. The new framework also features the ability to choose the desired purity or efficiency of the fully reconstructed sample. If the same purity as for t...
Identifying multiple influential spreaders by a heuristic clustering algorithm
Bao, Zhong-Kui; Liu, Jian-Guo; Zhang, Hai-Feng
2017-03-01
The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suitable for the case where a single spreader is chosen as the spreading source. Many times, spreading process is initiated by simultaneously choosing multiple nodes as the spreading sources. In this situation, choosing the top ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in networks. Therefore, one ideal situation for multiple spreaders case is that the spreaders themselves are not only influential but also they are dispersively distributed in networks, but it is difficult to meet the two conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters, and finally the center nodes in clusters are chosen as the multiple spreaders. HC algorithm not only ensures that the multiple spreaders are dispersively distributed in networks but also avoids the selected nodes to be very "negligible". Compared with the traditional methods, our experimental results on synthetic and real networks indicate that the performance of HC method on influence maximization is more significant.
Gravitation field algorithm and its application in gene cluster
Directory of Open Access Journals (Sweden)
Zheng Ming
2010-09-01
Background: Searching for optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results: This paper proposes a novel search optimization algorithm called the Gravitation Field Algorithm (GFA), which is derived from the Solar Nebular Disk Model (SNDM) of planetary formation, a well-known theory in astronomy. GFA simulates the gravitation field and outperforms GA and SA on some multimodal function optimization problems. GFA can also be applied to unimodal functions. GFA clusters the dataset from the Gene Expression Omnibus well. Conclusions: The mathematical proof demonstrates that GFA converges to the global optimum with probability 1 under three conditions for mass functions of one independent variable. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA.
Local rewiring algorithms to increase clustering and grow a small world
Alstott, Jeff; Pizza, Pamela B; Radcliffe, Mary
2016-01-01
Many real-world networks have high clustering among vertices: vertices that share neighbors are often also directly connected to each other. A network's clustering can be a useful indicator of its connectedness and community structure. Algorithms for generating networks with high clustering have been developed, but typically rely on adding or removing edges and nodes, sometimes from a completely empty network. Here, we introduce algorithms that create a highly clustered network by starting with an existing network and rearranging edges, without adding or removing them; these algorithms can preserve other network properties even as the clustering increases. These algorithms rely on local rewiring rules, in which a single edge changes one of its vertices in a way that is guaranteed to increase clustering. This greedy algorithm can be applied iteratively to transform a random network into a form with much higher clustering. Additionally, these algorithms grow the network's clustering faster than they increase it...
Sweeney, Timothy E; Chen, Albert C; Gevaert, Olivier
2015-11-19
In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of 'dark art', with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R.
Zhang, Kui; Busov, Victor; Wei, Hairong
2017-01-01
Background: Present knowledge indicates that a multilayered hierarchical gene regulatory network (ML-hGRN) often operates above a biological pathway. Although the ML-hGRN is very important for understanding how a pathway is regulated, there is almost no computational algorithm for directly constructing ML-hGRNs. Results: A backward elimination random forest (BWERF) algorithm was developed for constructing the ML-hGRN operating above a biological pathway. For each pathway gene, BWERF used a random forest model to calculate the importance values of all transcription factors (TFs) to this pathway gene recursively, with a portion (e.g. 1/10) of the least important TFs being excluded in each round of modeling. In each round, the importance values of all TFs to the pathway gene were updated and ranked, until only one TF remained in the list. This procedure is termed BWERF. After that, the importance values of a TF to all pathway genes were aggregated and fitted to a Gaussian mixture model to determine the TF retention for the regulatory layer immediately above the pathway layer. The acquired TFs at the secondary layer were then set to be the new bottom layer to infer the next upper layer, and this process was repeated until an ML-hGRN with the expected number of layers was obtained. Conclusions: BWERF improved the accuracy of constructing ML-hGRNs because it used backward elimination to exclude the noise genes and aggregated the individual importance values to determine TF retention. We validated BWERF by using it to construct ML-hGRNs operating above the mouse pluripotency maintenance pathway and the Arabidopsis lignocellulosic pathway. Compared to GENIE3, BWERF showed an improvement in recognizing authentic TFs regulating a pathway. Compared to the bottom-up Gaussian graphical model algorithm we developed for constructing ML-hGRNs, BWERF can construct ML-hGRNs with significantly reduced edges that enable biologists to choose the implicit edges for experimental
Directory of Open Access Journals (Sweden)
Markus Uhrig
Alzheimer's disease (AD) is characterized by neuronal degeneration and cell loss. Abeta(42), in contrast to Abeta(40), is thought to be the pathogenic form triggering the pathological cascade in AD. In order to unravel overall gene regulation we monitored the transcriptomic responses to increased or decreased Abeta(40) and Abeta(42) levels, generated and derived from its precursor C99 (C-terminal fragment of APP comprising 99 amino acids) in human neuroblastoma cells. We identified fourteen differentially expressed transcripts by hierarchical clustering and discussed their involvement in AD. These fourteen transcripts were grouped into two main clusters, each showing distinct differential expression patterns depending on Abeta(40) and Abeta(42) levels. Among these transcripts we discovered an unexpected inverse and strong differential expression of neurogenin 2 (NEUROG2) and KIAA0125 in all examined cell clones. C99 overexpression had a similar effect on NEUROG2 and KIAA0125 expression as a decreased Abeta(42)/Abeta(40) ratio. Importantly, however, an increased Abeta(42)/Abeta(40) ratio, which is typical of AD, had an inverse expression pattern of NEUROG2 and KIAA0125: an increased Abeta(42)/Abeta(40) ratio up-regulated NEUROG2 but down-regulated KIAA0125, whereas the opposite regulation pattern was observed for a decreased Abeta(42)/Abeta(40) ratio. We discuss the possibilities that the so far uncharacterized KIAA0125 might be a counter player of NEUROG2 and that KIAA0125 could be involved in neurogenesis, due to the involvement of NEUROG2 in developmental neural processes.
Saunders, Diane G. O.; Win, Joe; Cano, Liliana M.; Szabo, Les J.; Kamoun, Sophien; Raffaele, Sylvain
2012-01-01
Rust fungi are obligate biotrophic pathogens that cause considerable damage on crop plants. Puccinia graminis f. sp. tritici, the causal agent of wheat stem rust, and Melampsora larici-populina, the poplar leaf rust pathogen, have strong deleterious impacts on wheat and poplar wood production, respectively. Filamentous pathogens such as rust fungi secrete molecules called disease effectors that act as modulators of host cell physiology and can suppress or trigger host immunity. Current knowledge on effectors from other filamentous plant pathogens can be exploited for the characterisation of effectors in the genome of recently sequenced rust fungi. We designed a comprehensive in silico analysis pipeline to identify the putative effector repertoire from the genome of two plant pathogenic rust fungi. The pipeline is based on the observation that known effector proteins from filamentous pathogens have at least one of the following properties: (i) contain a secretion signal, (ii) are encoded by in planta induced genes, (iii) have similarity to haustorial proteins, (iv) are small and cysteine rich, (v) contain a known effector motif or a nuclear localization signal, (vi) are encoded by genes with long intergenic regions, (vii) contain internal repeats, and (viii) do not contain PFAM domains, except those associated with pathogenicity. We used Markov clustering and hierarchical clustering to classify protein families of rust pathogens and rank them according to their likelihood of being effectors. Using this approach, we identified eight families of candidate effectors that we consider of high value for functional characterization. This study revealed a diverse set of candidate effectors, including families of haustorial expressed secreted proteins and small cysteine-rich proteins. This comprehensive classification of candidate effectors from these devastating rust pathogens is an initial step towards probing plant germplasm for novel resistance components. PMID:22238666
Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard
2015-01-01
Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, X2 tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, pclusters two and three, p=0.012); global distress (F=26.8, pcluster one, best for cluster four). The greatest burden is associated with cluster one, and should be prioritised in clinical management. Further symptom cluster research in people living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.
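The clustering step named in this abstract, hierarchical clustering with Ward's method on squared Euclidean distance, can be illustrated with a minimal sketch. This is not the study's code: the function name `ward_cluster`, the toy `profiles` data, and the choice to stop at a fixed number of clusters `k` are assumptions made for illustration. At each step the pair of clusters whose merge gives the smallest increase in within-cluster sum of squares is joined.

```python
from itertools import combinations

def ward_cluster(points, k):
    """Agglomerative clustering with Ward's criterion until k clusters remain."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged:
        # (|a| * |b| / (|a| + |b|)) * squared Euclidean distance of centroids.
        na, nb = len(a[0]), len(b[0])
        d2 = sum((x - y) ** 2 for x, y in zip(a[1], b[1]))
        return na * nb / (na + nb) * d2

    while len(clusters) > k:
        # Pick the cheapest pair to merge.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        na, nb = len(a[0]), len(b[0])
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(a[1], b[1])]
        merged = (a[0] + b[0], centroid)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c[0]) for c in clusters)

# Hypothetical two-dimensional profiles; these are NOT study data.
profiles = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
groups = ward_cluster(profiles, 2)
```

The naive pair search is cubic in the number of items, which is acceptable for a 32-item scale; library implementations use the Lance-Williams update to avoid recomputing centroids.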
A Flow-Partitioned Unequal Clustering Routing Algorithm for Wireless Sensor Networks
Jian Peng; Xiaohai Chen; Tang Liu
2014-01-01
Energy efficiency and energy balance are two important issues for wireless sensor networks. In previous clustering routing algorithms, multihop transmission, sleep scheduling, and unequal clustering are always used to improve energy efficiency and energy balance. In these algorithms, only the cluster heads share the burden of data forwarding in each round. In this paper, we propose a flow-partitioned unequal clustering routing (FPUC) algorithm to achieve better energy efficiency and energy ba...
Kalter, Henry D; Perin, Jamie; Black, Robert E
2016-06-01
Physician assessment historically has been the most common method of analyzing verbal autopsy (VA) data. Recently, the World Health Organization endorsed two automated methods, Tariff 2.0 and InterVA-4, which promise greater objectivity and lower cost. A disadvantage of the Tariff method is that it requires a training data set from a prior validation study, while InterVA relies on clinically specified conditional probabilities. We undertook to validate the hierarchical expert algorithm analysis of VA data, an automated, intuitive, deterministic method that does not require a training data set. Using Population Health Metrics Research Consortium study hospital source data, we compared the primary causes of 1629 neonatal and 1456 1-59 month-old child deaths from VA expert algorithms arranged in a hierarchy to their reference standard causes. The expert algorithms were held constant, while five prior and one new "compromise" neonatal hierarchy, and three former child hierarchies were tested. For each comparison, the reference standard data were resampled 1000 times within the range of cause-specific mortality fractions (CSMF) for one of three approximated community scenarios in the 2013 WHO global causes of death, plus one random mortality cause proportions scenario. We utilized CSMF accuracy to assess overall population-level validity, and the absolute difference between VA and reference standard CSMFs to examine particular causes. Chance-corrected concordance (CCC) and Cohen's kappa were used to evaluate individual-level cause assignment. Overall CSMF accuracy for the best-performing expert algorithm hierarchy was 0.80 (range 0.57-0.96) for neonatal deaths and 0.76 (0.50-0.97) for child deaths. Performance for particular causes of death varied, with fairly flat estimated CSMF over a range of reference values for several causes. Performance at the individual diagnosis level was also less favorable than that for overall CSMF (neonatal: best CCC = 0.23, range 0
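The population-level metric used in this abstract, CSMF accuracy, is commonly defined as one minus the summed absolute error in cause fractions, divided by the worst possible error, 2(1 - min true CSMF). The sketch below assumes that standard definition; the function name and the cause fractions are hypothetical and are not taken from the study.

```python
def csmf_accuracy(true_fractions, predicted_fractions):
    """CSMF accuracy: 1 - sum|true - predicted| / (2 * (1 - min(true)))."""
    # Consider every cause that appears in either distribution.
    causes = set(true_fractions) | set(predicted_fractions)
    abs_error = sum(
        abs(true_fractions.get(c, 0.0) - predicted_fractions.get(c, 0.0))
        for c in causes
    )
    # Largest possible total absolute error for this true distribution.
    worst_case = 2 * (1 - min(true_fractions.get(c, 0.0) for c in causes))
    return 1 - abs_error / worst_case

# Hypothetical cause-specific mortality fractions (each set sums to 1).
reference = {"sepsis": 0.5, "preterm": 0.3, "asphyxia": 0.2}
estimated = {"sepsis": 0.4, "preterm": 0.4, "asphyxia": 0.2}
accuracy = csmf_accuracy(reference, estimated)
```

A value of 1 means the estimated fractions match the reference exactly; 0 corresponds to the worst possible misallocation.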
Directory of Open Access Journals (Sweden)
S.S. Arya
2012-10-01
Thepla is an Indian unleavened flatbread made from whole-wheat flour with added spices and vegetables. It is consumed particularly in the western zone of India. The preparation of thepla is tedious and time consuming, and requires skill. In the present study, the thepla ingredients were standardized one by one on the basis of the Overall Acceptability (OA) score. Sensory analysis was carried out using a nine-point hedonic rating scale with ten trained panellists. On the basis of the highest sensory OA score, the standardized ingredients of thepla were: salt 3%, red chili powder 2.5%, fenugreek leaves 12%, cumin seed powder 0.6%, coriander seed powder 0.6%, ginger garlic paste (1:1) 6%, asafoetida 0.6% and oil 3% w/w of whole wheat flour. Thepla process parameters such as time, temperature, diameter of thepla and weight of dough were then standardized on the basis of the sensory OA score. The sensory score data obtained were processed by Hierarchical Cluster Analysis (HCA).
A new Hierarchical Group Key Management based on Clustering Scheme for Mobile Ad Hoc Networks
Directory of Open Access Journals (Sweden)
Ayman EL-SAYED
2014-05-01
The migration from wired networks to wireless networks has been a global trend in the past few decades because wireless networks provide anytime-anywhere networking services. As wireless networks are rapidly deployed, a secure wireless environment will be mandatory. The mobility and scalability brought by wireless networks have also made them usable in many applications. Among all contemporary wireless networks, the Mobile Ad hoc Network (MANET) is one of the most important and unique applications. A MANET is a collection of autonomous nodes or terminals which communicate with each other by forming a multihop radio network and maintaining connectivity in a decentralized manner. Because the wireless medium is unreliable, data transfer is a major problem in MANETs, and the data lack security and reliability. The most suitable way to provide the expected level of security to these services is a key management protocol. Key management is a vital part of security, and the issue is even bigger in wireless networks than in wired networks. Distributing keys in an authenticated manner is a difficult task in a MANET. When a member leaves or joins the group, a new key must be generated to maintain forward and backward secrecy. In this paper, we propose a new group key management scheme for MANETs based on clustering, namely the Hierarchical, Simple, Efficient and Scalable Group Key (HSESGK) scheme, and classify different other schemes. Group members deduce the group key in a distributed manner.
Development of Automatic Cluster Algorithm for Microcalcification in Digital Mammography
Energy Technology Data Exchange (ETDEWEB)
Choi, Seok Yoon [Dept. of Medical Engineering, Korea University, Seoul (Korea, Republic of); Kim, Chang Soo [Dept. of Radiological Science, College of Health Sciences, Catholic University of Pusan, Pusan (Korea, Republic of)
2009-03-15
Digital mammography is an efficient imaging technique for the detection and diagnosis of breast pathological disorders. Six mammographic criteria (number of clusters; number, size, extent and morphologic shape of microcalcifications; and presence of a mass) were reviewed, and their correlation with pathologic diagnosis was evaluated. It is very important to find breast cancer early, when treatment can reduce deaths from breast cancer and the need for breast incision. In breast cancer screening, mammography is typically used to view the internal structure of the breast. Clustered microcalcifications on mammography represent an important feature of breast masses, especially of intraductal carcinoma. Because microcalcification correlates highly with breast cancer, a cluster of microcalcifications can be very helpful for the clinician in predicting breast cancer. For this study, three steps of quantitative evaluation are proposed: DoG filter, adaptive thresholding, and expectation maximization. With the proposed algorithm, the number of calcifications and the length of each cluster in the microcalcification distribution could be measured; these measurements can also serve as primary diagnostic indicators for the automatic diagnosis of breast cancer.
RESEARCH ON CLASS-PROFILE-BASED HIERARCHICAL CLUSTERING METHOD
Institute of Scientific and Technical Information of China (English)
孟海东; 唐旋
2011-01-01
Traditional clustering algorithms often fail to account for both the connectivity and the similarity characteristics among classes at the same time. This paper first gives the basic definitions of class boundary point and class profile, together with methods for finding them. Second, based on a combined consideration of the connectivity and similarity characteristics among classes, it defines several standards and methods for measuring inter-class similarity. Finally, it proposes a class-profile-based hierarchical clustering algorithm, which can effectively process clusters of arbitrary shape and distinguish isolated points from noise data. The feasibility and effectiveness of the algorithm are validated through clustering analysis on image data sets and the Iris standard data set.
Clustering of User Behaviour based on Web Log data using Improved K-Means Clustering Algorithm
Directory of Open Access Journals (Sweden)
S.Padmaja
2016-02-01
The proposed work applies an improved K-means clustering algorithm to identify internet user behaviour. Web data analysis includes the transformation and interpretation of web log data to discover information, patterns and knowledge. The efficiency of the algorithm is analyzed by considering certain parameters: date, time, S_id, CS_method, C_IP, User_agent and time taken. The research uses more than 2 years of real data collected from the web servers of two different groups of institutions. This dataset provides a better analysis of log data to identify internet user behaviour.
Clustering Algorithms for Heterogeneous Wireless Sensor Networks - A Brief Survey
Directory of Open Access Journals (Sweden)
A.MeenaKowshalya
2011-09-01
Wireless sensor networks (WSN) are emerging in various fields like disaster management, battlefield surveillance and border security surveillance. A large number of sensors in these applications are unattended and work autonomously. Clustering is a key technique to improve the network lifetime, reduce the energy consumption and increase the scalability of the sensor network. In this paper, we study the impact of node heterogeneity on the performance of WSN. This paper surveys the different clustering algorithms for heterogeneous WSN.
Classification of posture maintenance data with fuzzy clustering algorithms
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to these data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
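For readers unfamiliar with fuzzy c-means (FCM), the following is a minimal sketch of the standard alternating update, not the code used in the study. The function name, the fuzzifier default `m=2.0`, the fixed iteration count, and the toy data are assumptions. Centres are membership-weighted means with weights u^m, and each membership is u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)), where d_ki is the distance from point k to centre i.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard FCM: alternate centre and membership updates for `iters` rounds."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each point's row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: weighted mean of the points with weights u^m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / total
                            for d in range(dim)])
        # Membership update from relative distances to all centres.
        for k in range(n):
            dist = [max(sum((points[k][d] - centers[i][d]) ** 2
                            for d in range(dim)) ** 0.5, 1e-12)
                    for i in range(c)]
            u[k] = [1.0 / sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0))
                              for j in range(c))
                    for i in range(c)]
    return centers, u

# Hypothetical one-dimensional measurements forming two groups; not study data.
data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
centers, memberships = fuzzy_c_means(data, 2)
```

Unlike hard k-means, every point keeps a graded membership in every cluster, which is what makes FCM suited to identifying intermediate states.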
Cluster-Based Distributed Algorithms for Very Large Linear Equations
Institute of Scientific and Technical Information of China (English)
None
2006-01-01
In many applications such as computational fluid dynamics and weather prediction, as well as image processing and computing the state of Markov chains, the order $n$ of the matrix is often very large, and no serial algorithm can solve the problem. A distributed cluster-based solution for very large systems of linear equations is discussed, including the definitions of notation, the partition of the matrix, the communication mechanism, and a master-slave algorithm. The computing cost is $O(n^3/N)$, the memory cost is $O(n^2/N)$, the I/O cost is $O(n^2/N)$, and the communication cost is $O(Nn)$, where $N$ is the number of computing nodes or processes. Tests show that the solution can effectively solve systems with matrices of double type of order below $10^6 \times 10^6$.
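The abstract does not give the partitioning scheme, so the sketch below only illustrates one common way a per-node memory cost of order n^2/N can arise: splitting the matrix into contiguous row blocks, one per worker. All names (`row_blocks`, `worker_matvec`, `master_matvec`) are hypothetical, the workers are simulated by plain function calls rather than real processes, and a matrix-vector product stands in for one step of a solver.

```python
def row_blocks(n, workers):
    """Split row indices 0..n-1 into `workers` contiguous, near-equal blocks."""
    base, extra = divmod(n, workers)
    blocks, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def worker_matvec(rows_of_a, x):
    """Work done by one node: multiply its own rows (about n/N of them) by x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows_of_a]

def master_matvec(a, x, workers):
    """Master: hand each worker its row block and concatenate the partial results."""
    result = []
    for block in row_blocks(len(a), workers):
        result.extend(worker_matvec([a[i] for i in block], x))
    return result

# Small hypothetical system used only to exercise the partitioning.
matrix = [[1, 2, 0], [0, 1, 0], [3, 0, 1]]
product = master_matvec(matrix, [1, 1, 1], workers=2)
block_sizes = [len(b) for b in row_blocks(10, 3)]
```

Each worker holds roughly n/N rows of length n, which matches the stated O(n^2/N) memory cost per node under this assumed layout.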
Dynamic and static properties of the invaded cluster algorithm
Moriarty, K.; Machta, J.; Chayes, L. Y.
1999-02-01
Simulations of the two-dimensional Ising and three-state Potts models at their critical points are performed using the invaded cluster (IC) algorithm. It is argued that observables measured on a sublattice of size $l$ should exhibit a crossover to Swendsen-Wang (SW) behavior for $l$ sufficiently less than the lattice size $L$, and a scaling form is proposed to describe the crossover phenomenon. It is found that the energy autocorrelation time $\tau_\varepsilon(l,L)$ for an $l \times l$ sublattice attains a maximum in the crossover region, and a dynamic exponent $z_{\mathrm{IC}}$ for the IC algorithm is defined according to $\tau_{\varepsilon,\max} \sim L^{z_{\mathrm{IC}}}$. Simulation results for the three-state model yield $z_{\mathrm{IC}} = 0.346 \pm 0.002$, which is smaller than values of the dynamic exponent found for the SW and Wolff algorithms and also less than the Li-Sokal bound. The results are less conclusive for the Ising model, but it appears that zICWolff algorithms.
Hierarchical Clustering Multi-Task Learning for Joint Human Action Grouping and Recognition.
Liu, An-An; Su, Yu-Ting; Nie, Wei-Zhi; Kankanhalli, Mohan
2017-01-01
This paper proposes a hierarchical clustering multi-task learning (HC-MTL) method for joint human action grouping and recognition. Specifically, we formulate the objective function into the group-wise least square loss regularized by low rank and sparsity with respect to two latent variables, model parameters and grouping information, for joint optimization. To handle this non-convex optimization, we decompose it into two sub-tasks, multi-task learning and task relatedness discovery. First, we convert this non-convex objective function into the convex formulation by fixing the latent grouping information. This new objective function focuses on multi-task learning by strengthening the shared-action relationship and action-specific feature learning. Second, we leverage the learned model parameters for the task relatedness measure and clustering. In this way, HC-MTL can attain both optimal action models and group discovery by alternating iteratively. The proposed method is validated on three kinds of challenging datasets, including six realistic action datasets (Hollywood2, YouTube, UCF Sports, UCF50, HMDB51 & UCF101), two constrained datasets (KTH & TJU), and two multi-view datasets (MV-TJU & IXMAS). The extensive experimental results show that: 1) HC-MTL can produce competing performances to the state of the arts for action recognition and grouping; 2) HC-MTL can overcome the difficulty in heuristic action grouping simply based on human knowledge; 3) HC-MTL can avoid the possible inconsistency between the subjective action grouping depending on human knowledge and objective action grouping based on the feature subspace distributions of multiple actions. Comparison with the popular clustered multi-task learning further reveals that the discovered latent relatedness by HC-MTL aids inducing the group-wise multi-task learning and boosts the performance. To the best of our knowledge, ours is the first work that breaks the assumption that all actions are either
Stern, Michael D; Maltseva, Larissa A; Juhaszova, Magdalena; Sollott, Steven J; Lakatta, Edward G; Maltsev, Victor A
2014-05-01
rate in response to β-adrenergic stimulation. The model indicates that the hierarchical clustering of surface RyRs in SANCs may be a crucial adaptive mechanism. Pathological desynchronization of the clocks may explain sinus node dysfunction in heart failure and RyR mutations.
A Novel Dynamic Clustering Algorithm Based on Immune Network and Tabu Search
Institute of Scientific and Technical Information of China (English)
ZHONGJiang; WUZhongfu; WUKaigui; YANGQiang
2005-01-01
It is usually difficult to specify a reasonable number of partitions in a data set before clustering. Traditional clustering algorithms, such as k-means and its variations, cannot solve this problem. This paper proposes a novel dynamic clustering algorithm based on the artificial immune network and tabu search (DCBIT). It optimizes the number and the location of the clusters at the same time. The algorithm has two phases: it first runs an immune network algorithm to find a clustering feasible solution (CFS), and then employs tabu search to obtain the optimum cluster number and cluster centers from the CFS. The probability of obtaining the CFS through the immune network algorithm is also discussed. Experimental results show that the new algorithm has satisfactory convergence probability and convergence speed.
Institute of Scientific and Technical Information of China (English)
郭红; 黄佳鑫; 郭昆
2015-01-01
The mining and discovery of overlapping and hierarchical communities is a hot topic in social network research. First, an algorithm, discovery of link communities based on extended link cluster sequence (DLC ECS), is proposed to detect overlapping and hierarchical communities in social networks efficiently. Based on the extended link cluster sequence, which corresponds to community structures with various densities, the optimal link community structure is detected by searching for the globally optimal density. The link communities are then transformed into node communities, so that the overlapping communities can be found. Next, hierarchical link communities extraction based on extended link cluster sequence (HLCE ECS) is designed. The proposed algorithm finds the hierarchical link communities in the extended link cluster sequence, and the link communities are transformed into node communities to find the overlapping and hierarchical communities. Experimental results on artificial and real-world datasets demonstrate that the DLC ECS algorithm significantly improves community quality and that the HLCE ECS algorithm effectively discovers meaningful hierarchical communities.
Image Transformation using Modified Kmeans clustering algorithm for Parallel saliency map
Directory of Open Access Journals (Sweden)
Aman Sharma
2013-08-01
The aim is to design an image transformation system. Depending on the transform chosen, the input and output images may appear entirely different and have different interpretations. Image transformation is performed with the help of modules such as input image, image cluster index, objects in cluster, and color index transformation of the image. The K-means clustering algorithm is used to cluster the image for better segmentation. In the proposed method, a parallel saliency algorithm with K-means clustering is used to avoid local minima and to find the saliency map. The parallel saliency algorithm is reported to perform better than the existing saliency algorithm.
A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.
Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang
2016-12-01
This paper studies a clustering method for Chinese medicine (CM) medical cases. The traditional K-means clustering algorithm has shortcomings when processing prescriptions from CM medical cases, such as the dependence of results on the selection of initial values and trapping in local optima. Therefore, a new clustering method based on the collaboration of the firefly algorithm and the simulated annealing algorithm is proposed. The algorithm dynamically determines the iterations of the firefly algorithm and the sampling of the simulated annealing algorithm according to fitness changes, and increases the diversity of the swarm by expanding the scope of the sudden jump, thereby effectively avoiding premature convergence. Confirmatory experiments on CM medical cases suggest that, compared with traditional K-means clustering algorithms, this method greatly improves individual diversity and the clustering results obtained, and that its results have a certain reference value for cluster analysis of CM prescriptions.
Directory of Open Access Journals (Sweden)
Mingwei Leng
2013-01-01
The accuracy of most existing semisupervised clustering algorithms that rely on a small labeled dataset is low when dealing with multidensity and imbalanced datasets, and labeling data is quite expensive and time consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering in multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it utilizes multiple thresholds to expand labeled datasets on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to demonstrate the proposed algorithm, and the experimental results show that the proposed semisupervised clustering algorithm has a higher accuracy and a more stable performance in comparison to other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.
Institute of Scientific and Technical Information of China (English)
(no author listed)
2007-01-01
A global optimization approach to turbine blade design based on hierarchical fair competition genetic algorithms with dynamic niche (HFCDN-GAs) coupled with Reynolds-averaged Navier-Stokes (RANS) equation is presented. In order to meet the search theory of GAs and the aerodynamic performances of turbine, Bezier curve is adopted to parameterize the turbine blade profile, and a fitness function pertaining to optimization is designed. The design variables are the control points' ordinates of characteristic polygon of Bezier curve representing the turbine blade profile. The object function is the maximum lift-drag ratio of the turbine blade. The constraint conditions take into account the leading and trailing edge metal angle, and the strength and aerodynamic performances of turbine blade. And the treatment method of the constraint conditions is the flexible penalty function. The convergence history of test function indicates that HFCDN-GAs can locate the global optimum within a few search steps and have high robustness. The lift-drag ratio of the optimized blade is 8.3% higher than that of the original one. The results show that the proposed global optimization approach is effective for turbine blade.
A Hierarchical Optimization Algorithm Based on GPU for Real-Time 3D Reconstruction
Lin, Jin-hua; Wang, Lu; Wang, Yan-jie
2017-06-01
In machine vision sensing systems, it is important to achieve high-quality real-time 3D reconstruction of large-scale scenes. Recent online approaches perform well, but as the reconstruction scales up, pose estimation drifts and error accumulates; correcting the error completely usually requires a large number of off-line operations, which reduces reconstruction performance. To optimize the traditional volume fusion method and improve the old frame-to-frame pose estimation strategy, this paper presents a real-time CPU to Graphic Processing Unit reconstruction system. Based on a robust camera pose estimation strategy, the algorithm fuses all the RGB-D input values into an effective hierarchical optimization framework and optimizes each frame according to the global camera pose, eliminating the heavy dependence on tracking timeliness and continuously tracking globally optimized frames. The system estimates the globally optimized poses (bundling) in real time, and supports robust tracking recovery (re-positioning) and re-estimation of large-scale 3D scenes to ensure global consistency. It uses a set of sparse corresponding features together with geometric and ray matching functions in a single parallel optimization system. The experimental results show that the average reconstruction time is 415 ms per frame and that the ICP pose is estimated 20 times in 100.0 ms. For large-scale 3D scenes, the system performs well in online reconstruction while maintaining reconstruction accuracy.
Automatic Curve Fitting Based on Radial Basis Functions and a Hierarchical Genetic Algorithm
Directory of Open Access Journals (Sweden)
G. Trejo-Caballero
2015-01-01
Curve fitting is a very challenging problem that arises in a wide variety of scientific and engineering applications. Given a set of data points, possibly noisy, the goal is to build a compact representation of the curve that corresponds to the best estimate of the unknown underlying relationship between two variables. Despite the large number of methods available to tackle this problem, it remains challenging and elusive. In this paper, a new method to tackle this problem using strictly a linear combination of radial basis functions (RBFs) is proposed. To be more specific, we divide the parameter search space into linear and nonlinear parameter subspaces. We use a hierarchical genetic algorithm (HGA) to minimize a model selection criterion, which allows us to determine the nonlinear parameters automatically and simultaneously, and then compute the linear parameters by the least-squares method through Singular Value Decomposition. The method is fully automatic and does not require subjective parameters, for example, a smoothing factor or centre locations, to obtain the solution. In order to validate the efficacy of our approach, we perform an experimental study with several tests on benchmark smooth functions. A comparative analysis with two successful methods based on RBF networks is included.
Clustering Algorithm Based on Crowding Niche
Institute of Scientific and Technical Information of China (English)
业宁; 董逸生
2003-01-01
A new clustering algorithm based on crowding niche is proposed in this paper. As organisms evolve, homogeneous individuals spontaneously resist heterogeneous ones. At the same time, individuals in the same class compete with each other for limited resources, and individuals with poor fitness are eliminated. We propose a clustering algorithm based on this idea. Experimental evaluation has demonstrated its efficiency.
A Heuristic Task Scheduling Algorithm for Heterogeneous Virtual Clusters
Directory of Open Access Journals (Sweden)
Weiwei Lin
2016-01-01
Cloud computing provides on-demand computing and storage services with high performance and high scalability. However, the rising energy consumption of cloud data centers has become a prominent problem. In this paper, we first introduce an energy-aware framework for task scheduling in virtual clusters. The framework consists of a task resource requirements prediction module, an energy estimate module, and a scheduler with a task buffer. Second, based on this framework, we propose a virtual machine power efficiency-aware greedy scheduling algorithm (VPEGS). As a heuristic algorithm, VPEGS estimates task energy by considering factors including task resource demands, VM power efficiency, and server workload before scheduling tasks in a greedy manner. We simulated a heterogeneous VM cluster and conducted experiments to evaluate the effectiveness of VPEGS. Simulation results show that VPEGS reduced total energy consumption by more than 20% without producing large scheduling overheads. Compared with Min-Min and RASA, which follow a similar heuristic approach, it saved about 29% and 28% more energy, respectively.
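The greedy step described in this abstract can be illustrated with a small sketch. It is not the VPEGS implementation: the energy estimate (task demand divided by VM power efficiency, scaled up by host load), the `load_per_unit` parameter, and all names below are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    power_efficiency: float  # hypothetical unit: work done per joule
    host_load: float         # hypothetical: current server workload, 0.0 = idle


def estimate_energy(demand, vm):
    """Toy energy estimate (an assumption, not the paper's model)."""
    return demand / vm.power_efficiency * (1.0 + vm.host_load)


def greedy_schedule(tasks, vms, load_per_unit=0.05):
    """Assign each task, in order, to the VM with the lowest estimated energy."""
    assignment = {}
    for task, demand in tasks.items():
        best = min(vms, key=lambda vm: estimate_energy(demand, vm))
        assignment[task] = best.name
        # Placing a task raises the load seen by later tasks on that VM.
        best.host_load += demand * load_per_unit
    return assignment
```

In this toy model, a slightly less efficient VM wins the second task once the most efficient VM has become loaded, which is the kind of trade-off between power efficiency and server workload the abstract mentions.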
Ternary alloy material prediction using genetic algorithm and cluster expansion
Energy Technology Data Exchange (ETDEWEB)
Chen, Chong [Iowa State Univ., Ames, IA (United States)
2015-12-01
This thesis summarizes our study of crystal structure prediction for the Fe-V-Si system using a genetic algorithm and cluster expansion. Our goal is to explore and look for new stable compounds. We started from the ten currently known experimental phases and calculated the formation energies of those compounds using a density functional theory (DFT) package, namely VASP. The convex hull was generated from the DFT calculations of the experimentally known phases. We then performed a random search on some metal-rich (Fe and V) compositions and found that the lowest-energy structures had a body-centered cubic (bcc) underlying lattice, on which we carried out systematic computational searches using the genetic algorithm and cluster expansion. Among hundreds of searched compositions, thirteen were selected, and their DFT formation energies were obtained with VASP. The stability of those thirteen compounds was checked against the experimental convex hull. We found that the composition 24-8-16, i.e., Fe_{3}VSi_{2}, is a new stable phase, which may inspire future experiments.
Thermodynamic Casimir effect in films: the exchange cluster algorithm.
Hasenbusch, Martin
2015-02-01
We study the thermodynamic Casimir force for films with various types of boundary conditions and the bulk universality class of the three-dimensional Ising model. To this end, we perform Monte Carlo simulations of the improved Blume-Capel model on the simple cubic lattice. In particular, we employ the exchange or geometric cluster algorithm [Heringa and Blöte, Phys. Rev. E 57, 4976 (1998)]. In a previous work, we demonstrated that this algorithm allows us to compute the thermodynamic Casimir force for the plate-sphere geometry efficiently. It turns out that a substantial reduction of the statistical error can also be achieved for the film geometry. Concerning physics, we focus on (O,O) boundary conditions, where O denotes the ordinary surface transition. These are implemented by free boundary conditions on both sides of the film. Films with such boundary conditions undergo a phase transition in the universality class of the two-dimensional Ising model. We determine the inverse transition temperature for a large range of thicknesses L(0) of the film and study the scaling of this temperature with L(0). In the neighborhood of the transition, the thermodynamic Casimir force is affected by finite size effects, where finite size refers to a finite transversal extension L of the film. We demonstrate that these finite size effects can be computed by using the universal finite size scaling function of the free energy of the two-dimensional Ising model.
jClustering, an Open Framework for the Development of 4D Clustering Algorithms
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J.
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913
Maximum-entropy clustering algorithm and its global convergence analysis
Institute of Scientific and Technical Information of China (English)
ZHANG; Zhihua
2001-01-01
A Request Distribution Algorithm for Web Server Cluster
Directory of Open Access Journals (Sweden)
Wei Zhang
2011-12-01
With the explosive growth of web-based application workloads, web server clusters face a challenge in request response time. Request distribution among the servers of a web server cluster is the key to addressing this challenge, especially under heavy workloads. In this paper, we propose a new request distribution algorithm named llac (least load active cache) for the load balancing switch in a web server cluster. The goal of llac is to improve the cache hit rate and reduce response time. Packets are parsed at the IP level, and back-end servers are notified to cache hot files using link change technology, without changing URL information or modifying the service program. This avoids the switching overhead between user mode and kernel mode. The load balancing switch creates a connection directly with the selected server, avoiding connection migration overhead. The policy estimates the current composite load of each server and selects the server with the least load to serve the request. It also improves the resource utilization of web servers. Experimental results show that llac achieves better performance for web applications than wrr (weighted round robin), a popular request distribution algorithm.
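The selection policy in this abstract (estimate a composite load per server, pick the least loaded) can be sketched as follows. This is an illustrative sketch, not the llac implementation: the three load metrics, their weights, and the rule of preferring a server that already caches the requested file are all assumptions, and `composite_load`, `pick_server`, and `cache_index` are hypothetical names.

```python
def composite_load(cpu, mem, conns, weights=(0.5, 0.3, 0.2)):
    """Hypothetical composite load: weighted sum of metrics normalized to [0, 1]."""
    w_cpu, w_mem, w_conn = weights
    return w_cpu * cpu + w_mem * mem + w_conn * conns


def pick_server(servers, url, cache_index):
    """Pick a back-end server for a request.

    servers:     {name: (cpu, mem, conns)} with metrics in [0, 1]
    cache_index: {url: name} of servers known to cache a hot file (assumed)
    """
    cached = cache_index.get(url)
    if cached in servers:
        # Assumption: route to the caching server to raise the cache hit rate.
        return cached
    # Otherwise fall back to the server with the least composite load.
    return min(servers, key=lambda name: composite_load(*servers[name]))
```

For example, with `servers = {"s1": (0.9, 0.5, 0.2), "s2": (0.2, 0.3, 0.1)}` and an empty cache index, `pick_server` selects `"s2"`, whose composite load is lower.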
Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen
2017-03-01
Traditional microblog recommendation algorithms suffer from low efficiency and modest effectiveness in the era of big data. To address these issues, this paper proposes a mixed recommendation algorithm with user clustering. The paper first introduces the state of the microblog marketing industry. It then elaborates the user interest modeling process and the detailed advertisement recommendation methods. Finally, it compares the mixed recommendation algorithm with user clustering against the traditional classification algorithm and the mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering achieves good accuracy and recall in microblog advertisement promotion.
Routing Algorithm of Hierarchical Wireless Sensor Network
Institute of Scientific and Technical Information of China (English)
邹瑜; 彭舰; 黎红友
2012-01-01
In multi-hop wireless sensor networks, nodes closer to the sink have to relay data from the outer nodes, so they consume energy faster than nodes farther from the sink and tend to die early. This phenomenon is known as the "energy hole". A hierarchical network architecture can effectively delay the appearance of the energy hole. Building on an analysis of existing routing algorithms and the idea of hierarchy, this paper improves the existing routing algorithms and proposes a method for computing the optimal number of cluster-heads and the probability of each node becoming a cluster-head in every ring of the hierarchical network. In the route discovery phase, a cluster-head routing quota (CRQ) is introduced to control the number of routes accepted by each routing cluster-head, which balances the energy consumption of the routing cluster-heads located in the same ring. Simulation results demonstrate that the new routing algorithm outperforms existing algorithms in network lifetime and in the uniformity of energy consumption.
3D NEAREST NEIGHBOUR SEARCH USING A CLUSTERED HIERARCHICAL TREE STRUCTURE
Directory of Open Access Journals (Sweden)
A. Suhaibah
2016-06-01
Locating and analysing sites for new stores or outlets is a common issue facing retailers and franchisers, who want to ensure that newly opened stores are at strategic locations that attract the highest possible number of customers. Spatial information is used to manage, maintain and analyse these store locations. However, since franchises and chain stores in urban areas operate within high-rise multi-level buildings, a three-dimensional (3D) method is required to locate and identify surrounding information, such as the level at which a franchise unit will be located or whether a franchise unit is on the best level for visibility. One of the analyses commonly used to retrieve surrounding information is Nearest Neighbour (NN) analysis, which takes a point location and identifies its surrounding neighbours. However, with the immense size of urban datasets, retrieving and analysing nearest neighbour information efficiently becomes more complex and more crucial. In this paper, we present a technique to retrieve nearest neighbour information in 3D space using a clustered hierarchical tree structure. Based on our findings, the proposed approach substantially improved response time compared with existing spatial access methods in databases. Query performance was tested using a dataset of 500,000 point locations of buildings and franchising units. The results are presented in this paper. Another advantage of this structure is that it offers minimal overlap and coverage among nodes, which can reduce repetitive data entry.
Textural defect detect using a revised ant colony clustering algorithm
Zou, Chao; Xiao, Li; Wang, Bingwen
2007-11-01
We propose a novel method based on a revised ant colony clustering algorithm (ACCA) for textural defect detection. Our efforts focus mainly on the definition of a local irregularity measurement and the implementation of the revised ACCA. The local irregularity measurement evaluates the local textural inconsistency of each pixel against its mini-environment. In our revised ACCA, the behavior of each ant is divided into two steps: release pheromone and act. The quantity of pheromone released is proportional to the irregularity measurement; the next actions of the ants are chosen independently of each other in a stochastic way according to some evaluated heuristic knowledge. The independence of the ants implies an inherently parallel computation architecture for this algorithm. We apply the proposed method to some typical textural images with defects. From the series of pheromone distribution maps (PDMs), it can be clearly seen that the pheromone distribution gradually approaches the textural defects. After some post-processing, the final distribution of pheromone demonstrates the shape and area of the defects well.
Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section
Institute of Scientific and Technical Information of China (English)
YU Yongqian; ZHAO Xiangguo; CHEN Hengyue; WANG Bin; YU Ge; WANG Guoren
2006-01-01
This paper presents an effective clustering mode and a novel mode for evaluating clustering results. The clustering mode has two bounded integer parameters. The evaluating mode evaluates clustering results and gives each a mark: the higher the mark a clustering result gains, the higher its quality. By organizing the two modes in different ways, we build two clustering algorithms: SECDU (Self-Expanded Clustering Algorithm Based on Density Units) and SECDUF (Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section). SECDU enumerates all value pairs of the two parameters of the clustering mode to process the data set repeatedly and evaluates every clustering result with the evaluating mode; it then outputs the clustering result with the highest mark. By applying a hill-climbing algorithm, SECDUF greatly improves clustering efficiency. Both algorithms adapt well to data sets with different distribution features and output high-quality clustering results. SECDUF tunes the parameters of the clustering mode automatically, with no human intervention during the whole process, and achieves high clustering performance.
An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering
Institute of Scientific and Technical Information of China (English)
Taher NIKNAM; Babak AMIRI; Javad OLAMAEI; Ali AREFI
2009-01-01
The K-means algorithm is one of the most popular techniques in clustering. Nevertheless, the performance of the K-means algorithm depends highly on the initial cluster centers, and the algorithm converges to local minima. This paper proposes a hybrid evolutionary programming based clustering algorithm, called PSO-SA, by combining particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution by SA and to increase the information exchange among particles using a mutation operator to escape local optima. Three datasets, Iris, Wisconsin Breast Cancer, and Ripley's Glass, have been considered to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only has a better response but also converges more quickly than the K-means, PSO, and SA algorithms.
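The sensitivity to initial centers that motivates PSO-SA can be seen in a few lines. The sketch below is plain Lloyd-style K-means on one-dimensional data, not the PSO-SA algorithm; the data set and the two initializations are made up for illustration.

```python
def kmeans_1d(points, centers, iterations=20):
    """Plain K-means on 1-D data, starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)


points = [0, 1, 10, 11, 20, 21]
good = kmeans_1d(points, [0, 10, 20])  # one initial center per natural group
bad = kmeans_1d(points, [0, 1, 15])    # two initial centers inside the same group
```

With the second initialization the algorithm settles in a local minimum whose error is far larger than that of the first run, which is the behavior that global search strategies such as PSO or SA are meant to escape.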
An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets
Directory of Open Access Journals (Sweden)
Kang Zhang
2014-01-01
Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle mixed data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method that has demonstrated good performance on a wide variety of datasets. However, it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm to cluster them. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.
Directory of Open Access Journals (Sweden)
G. Abel Thangaraja
2014-11-01
Data mining is needed because of the explosive growth of data from terabytes to petabytes. Data mining preprocessing aims to produce quality mining results in descriptive and predictive analysis. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. A straightforward way to combine structural and attribute similarities is to use a weighted distance function. Clustering results are obtained based on attribute similarities, and the clusters balance the attribute and structural similarities. The existing structural and attribute clustering algorithm is analyzed and a new algorithm is proposed. The two algorithms are compared and the results are analyzed. It is found that the modified algorithm gives better quality clusters.
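The weighted distance function mentioned in this abstract can be written down in its simplest form as a convex combination. The sketch below assumes both distances are already normalized to comparable scales; the weight `alpha` and the function name are hypothetical and not taken from the paper.

```python
def combined_distance(structural_dist, attribute_dist, alpha=0.5):
    """Convex combination of a structural and an attribute distance.

    alpha = 1.0 uses structure only; alpha = 0.0 uses attributes only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * structural_dist + (1.0 - alpha) * attribute_dist
```

Choosing `alpha` is the hard part in practice, since it fixes the balance between structural and attribute similarity in advance.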
Robust K-Median and K-Means Clustering Algorithms for Incomplete Data
Directory of Open Access Journals (Sweden)
Jinhua Li
2016-01-01
Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply classical clustering algorithms for complete data, such as K-median and K-means. In practice, however, it is often hard to obtain accurate estimates of the missing values, which degrades clustering performance. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of a robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results that are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
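One way to read the minimax idea for interval data is that each point is assigned to the center that is closest in the worst case over its intervals. The sketch below illustrates only that inner step under this assumption; it is not the paper's algorithm, and the function names are hypothetical. For a squared Euclidean distance, the worst case over an interval is attained at one of its endpoints, so the maximum can be taken per feature.

```python
def worst_case_sq_dist(lower, upper, center):
    """Largest squared Euclidean distance from center to any point in the box.

    lower/upper give per-feature interval bounds; an observed feature has
    lower == upper, a missing feature has a wider interval.
    """
    return sum(
        max((lo - c) ** 2, (hi - c) ** 2)
        for lo, hi, c in zip(lower, upper, center)
    )


def robust_assign(lower, upper, centers):
    """Index of the center with the smallest worst-case distance."""
    return min(
        range(len(centers)),
        key=lambda k: worst_case_sq_dist(lower, upper, centers[k]),
    )
```

For a point whose first feature is observed as 1.0 and whose second feature is only known to lie in [0, 4], a center at (1, 2) is preferred over a center at (1, 0), because the latter could be up to 4 units away in the missing feature.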
Directory of Open Access Journals (Sweden)
Xiaowei Li
2017-01-01
A large number of studies have demonstrated that major depressive disorder (MDD) is characterized by alterations in brain functional connections, which are also identifiable during the brain's "resting state." However, in current studies, the construction of functional connectivity is often biased by the choice of threshold. In addition, more attention has been paid to the number and length of links in brain networks, while the clustering partitioning of nodes has remained unclear. Therefore, minimum spanning tree (MST) analysis and hierarchical clustering were applied to depression for the first time in this study. Resting-state electroencephalogram (EEG) sources were assessed from 15 healthy and 23 major depressive subjects. Then the coherence, MST, and hierarchical clustering were obtained. In the theta band, coherence analysis showed that the EEG coherence of the MDD patients was significantly higher than that of the healthy controls, especially in the left temporal region. The MST results indicated a higher leaf fraction in the depressed group. Compared with the normal group, the major depressive patients lost clustering in frontal regions. Our findings suggested that there was a stronger brain interaction in the MDD group and a left-right functional imbalance in the frontal regions for MDD controls.
Directory of Open Access Journals (Sweden)
Guohua Zou
2016-12-01
New medical imaging technology, such as Computed Tomography and Magnetic Resonance Imaging (MRI), has been widely used in all aspects of medical diagnosis. The purpose of these imaging techniques is to obtain various qualitative and quantitative data of the patient comprehensively and accurately, and to provide correct digital information for diagnosis, treatment planning and evaluation after surgery. MR imaging has a clear diagnostic advantage for brain diseases. However, as the requirements for brain image definition and quantitative analysis keep increasing, better segmentation of MR brain images is needed. The FCM (Fuzzy C-means) algorithm is widely applied in image segmentation, but it has some shortcomings, such as long computation time and poor anti-noise capability. In this paper, the Ant Colony algorithm is first used to determine the cluster centers and the number of clusters for the FCM algorithm so as to improve its running speed. Then an improved Markov random field model is used to improve the algorithm's anti-noise ability. Experimental results show that the proposed algorithm has obvious advantages in image segmentation speed and segmentation quality.
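For orientation, the core of plain FCM is its membership update, sketched below in Python for a single grey-level intensity. This shows only the standard FCM rule u_i = 1 / Σ_k (d_i / d_k)^(2/(m−1)); the Ant Colony initialisation and the Markov random field step described in the abstract are not reproduced, and the function name and the default fuzzifier m = 2 are assumptions for this example.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy C-means membership of scalar `x` in each cluster."""
    dists = [abs(x - c) for c in centers]
    if 0.0 in dists:
        # x coincides with one or more centres: share full membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]


print(fcm_memberships(5.0, [0.0, 10.0]))  # [0.5, 0.5]
```

Every pixel needs this computation against every centre in every iteration, which is one reason a good initial choice of centres can shorten the running time.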
Directory of Open Access Journals (Sweden)
Katrien Moens
Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected, especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, χ² tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217), the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, p<0.001); being on ART (highest proportions for clusters two and three, p=0.012); global distress (F=26.8, p<0.001), physical distress (F=36.3, p<0.001) and psychological distress subscale (F=21.8, p<0.001) (all subscales worst for cluster one, best for cluster four). The greatest burden is associated with cluster one, and should be prioritised in clinical management. Further symptom cluster research in people living with HIV with longitudinally collected symptom data to
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
Extension of K-Means Algorithm for clustering mixed data | Onuodu ...
African Journals Online (AJOL)
In this work, a new hybrid method has been proposed which extends K-means algorithm to categorical domain and mixed-type ...
Institute of Scientific and Technical Information of China (English)
祝永志; 张丹丹; 曹宝香; 禹继国
2012-01-01
For multi-core SMP cluster systems, this paper discusses hybrid parallel programming techniques based on MPI and OpenMP. We propose a new hybrid parallel programming method that is aware of the architecture hierarchy of SMP cluster systems. We design a hierarchically parallel algorithm for the N-body problem and compare its performance with traditional hybrid parallel algorithms on the Dawning 5000A cluster. The results indicate that our hierarchically hybrid parallel algorithm has better scalability and speedup than the others.
Vinitsky, Sergue; Chuluunbaatar, Ochbadrakh; Rostovtsev, Vitaly; Hai, Luong Le; Derbov, Vladimir; Krassovitskiy, Pavel
2013-01-01
A model for quantum tunnelling of a cluster comprising A identical particles, coupled by oscillator-type potential, through short-range repulsive potential barriers is introduced for the first time in the new symmetrized-coordinate representation and studied within the s-wave approximation. The symbolic-numerical algorithms for calculating the effective potentials of the close-coupling equations in terms of the cluster wave functions and the energy of the barrier quasistationary states are formulated and implemented using the Maple computer algebra system. The effect of quantum transparency, manifesting itself in nonmonotonic resonance-type dependence of the transmission coefficient upon the energy of the particles, the number of the particles A=2,3,4, and their symmetry type, is analyzed. It is shown that the resonance behavior of the total transmission coefficient is due to the existence of barrier quasistationary states imbedded in the continuum.
Identifying prototypical components in behaviour using clustering algorithms.
Directory of Open Access Journals (Sweden)
Elke Braun
Quantitative analysis of animal behaviour is a requirement to understand the task solving strategies of animals and the underlying control mechanisms. The identification of repeatedly occurring behavioural components is thereby a key element of a structured quantitative description. However, the complexity of most behaviours makes the identification of such behavioural components a challenging problem. We propose an automatic and objective approach for determining and evaluating prototypical behavioural components. Behavioural prototypes are identified using clustering algorithms and finally evaluated with respect to their ability to represent the whole behavioural data set. The prototypes allow for a meaningful segmentation of behavioural sequences. We applied our clustering approach to identify prototypical movements of the head of blowflies during cruising flight. The results confirm the previously established saccadic gaze strategy by the set of prototypes being divided into either predominantly translational or rotational movements, respectively. The prototypes reveal additional details about the saccadic and intersaccadic flight sections that could not be unravelled so far. Successful application of the proposed approach to behavioural data shows its ability to automatically identify prototypical behavioural components within a large and noisy database and to evaluate these with respect to their quality and stability. Hence, this approach might be applied to a broad range of behavioural and neural data obtained from different animals and in different contexts.
Hierarchical Affinity Propagation
Givoni, Inmar; Frey, Brendan J
2012-01-01
Affinity propagation is an exemplar-based clustering algorithm that finds a set of data-points that best exemplify the data, and associates each datapoint with one exemplar. We extend affinity propagation in a principled way to solve the hierarchical clustering problem, which arises in a variety of domains including biology, sensor networks and decision making in operational research. We derive an inference algorithm that operates by propagating information up and down the hierarchy, and is efficient despite the high-order potentials required for the graphical model formulation. We demonstrate that our method outperforms greedy techniques that cluster one layer at a time. We show that on an artificial dataset designed to mimic the HIV-strain mutation dynamics, our method outperforms related methods. For real HIV sequences, where the ground truth is not available, we show our method achieves better results, in terms of the underlying objective function, and show the results correspond meaningfully to geographi...
Parallel Genetic Algorithms with Dynamic Topology using Cluster Computing
Directory of Open Access Journals (Sweden)
ADAR, N.
2016-08-01
A parallel genetic algorithm (PGA) conducts a distributed meta-heuristic search by employing genetic algorithms on more than one subpopulation simultaneously. PGAs migrate a number of individuals between subpopulations over generations. The layout that facilitates the interactions of the subpopulations is called the topology. Static migration topologies have been widely incorporated into PGAs. In this article, a PGA with a dynamic migration topology (D-PGA) is proposed. D-PGA generates a new migration topology in every epoch based on the average fitness values of the subpopulations. The D-PGA has been tested against ring and fully connected migration topologies in a Beowulf Cluster. The D-PGA has outperformed the ring migration topology with comparable communication cost and has provided competitive or better results than a fully connected migration topology with significantly lower communication cost. PGA convergence behaviors have been analyzed in terms of the diversities within and between subpopulations. Conventional diversity can be considered as the diversity within a subpopulation. A new concept of permeability has been introduced to measure the diversity between subpopulations. It is shown that the success of the proposed D-PGA can be attributed to maintaining a high level of permeability while preserving diversity within subpopulations.
A Heuristic Clustering Algorithm for Mining Communities in Signed Networks
Institute of Scientific and Technical Information of China (English)
Bo Yang; Da-You Liu
2007-01-01
A signed network is an important kind of complex network, which includes both positive relations and negative relations. Communities of a signed network are defined as the groups of vertices within which positive relations are dense and between which negative relations are also dense. Being able to identify communities of signed networks is helpful for analysis of such networks. Many algorithms for detecting network communities have been developed. However, most of them are designed exclusively for networks including only positive relations and are not suitable for signed networks. So the problem of mining communities of signed networks quickly and correctly has not been solved satisfactorily. In this paper, we propose a heuristic algorithm to address this issue. Compared with major existing methods, our approach has three distinct features. First, it is very fast, with running time roughly linear in the network size. Second, it exhibits a good clustering capability and in particular works well with complex networks without well-defined community structures. Finally, it is insensitive to its built-in parameters and requires no prior knowledge.
Energy Technology Data Exchange (ETDEWEB)
Baldwin, C; Eliassi-Rad, T; Abdulla, G; Critchlow, T
2003-04-16
As scientific data sets grow exponentially in size, the need for scalable algorithms that heuristically partition the data increases. In this paper, we describe the three-step evolution of a hierarchical partitioning algorithm for large-scale spatio-temporal scientific data sets generated by massive simulations. The first version of our algorithm uses a simple top-down partitioning technique, which divides the data by using a four-way bisection of the spatio-temporal space. The shortcomings of this algorithm lead to the second version of our partitioning algorithm, which uses a bottom-up approach. In this version, a partition hierarchy is constructed by systematically agglomerating the underlying Cartesian grid that is placed on the data. Finally, the third version of our algorithm utilizes the intrinsic topology of the data given in the original scientific problem to build the partition hierarchy in a bottom-up fashion. Specifically, the topology is used to heuristically agglomerate the data at each level of the partition hierarchy. Despite the growing complexity in our algorithms, the third version of our algorithm builds partition hierarchies in less time and is able to build trees for larger size data sets as compared to the previous two versions.
IMPROVING THE CLUSTER PERFORMANCE BY COMBINING PSO AND K-MEANS ALGORITHM
Directory of Open Access Journals (Sweden)
G. Komarasamy
2011-04-01
Clustering is a technique that divides data objects into groups based on information found in the data that describes the objects and their relationships. This paper describes how to improve clustering performance by combining Particle Swarm Optimization (PSO) and the K-means algorithm. The PSO algorithm converges successfully during the initial stages of a global search, but around the global optimum the search process becomes very slow. In contrast, the K-means algorithm can achieve faster convergence to the optimum solution. Unlike the K-means method, the new algorithm does not require the number of clusters to be specified before clustering, and it is able to find the locally optimal number of clusters during the clustering process. In each iteration, the inertia weight is changed based on the current iteration and the best fitness. Experimental results on different data sets show that the new algorithm performs better.
A new-style clustering algorithm based on swarm intelligent theory
Institute of Scientific and Technical Information of China (English)
CHEN Zhuo; LIU Xiang-shuang
2007-01-01
Traditional clustering algorithms generally have some problems, such as sensitivity to initial parameters, difficulty in finding the optimal clustering result, and questionable validity of the clustering. In this paper, an FSM and a mathematical model of a new-style clustering algorithm based on swarm intelligence are provided. In this algorithm, the clustering main body moves in a three-dimensional space and has the abilities of memory, communication, analysis, judgment and coordinating information. Experimental results confirm that this algorithm has many merits, such as being insensitive to the order of the data and capable of dealing with exceptional, high-dimensional or complicated data. The algorithm can be used in the fields of Web mining, incremental clustering, economic analysis, pattern recognition, document classification and so on.
Ackerman, Margareta; Branzei, Simina; Loker, David
2011-01-01
In this paper we investigate clustering in the weighted setting, in which every data point is assigned a real valued weight. We conduct a theoretical analysis on the influence of weighted data on standard clustering algorithms in each of the partitional and hierarchical settings, characterising the precise conditions under which such algorithms react to weights, and classifying clustering methods into three broad categories: weight-responsive, weight-considering, and weight-robust. Our analysis raises several interesting questions and can be directly mapped to the classical unweighted setting.
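Why point weights can matter for some clustering criteria and not for others can be seen in a small Python sketch. It does not reproduce the paper's definitions or its classification of specific algorithms; it only contrasts two textbook quantities, a weighted centroid and a single-linkage distance, and the function names are chosen for this example.

```python
def weighted_centroid(points, weights):
    """Weighted mean of the points; it moves when the weights change."""
    total = sum(weights)
    dim = len(points[0])
    return tuple(sum(w * p[k] for p, w in zip(points, weights)) / total
                 for k in range(dim))


def single_linkage_distance(cluster_a, cluster_b):
    """Smallest pairwise Euclidean distance; point weights never enter."""
    return min(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
               for p in cluster_a for q in cluster_b)


pts = [(0.0, 0.0), (4.0, 0.0)]
print(weighted_centroid(pts, [1, 1]))  # (2.0, 0.0)
print(weighted_centroid(pts, [3, 1]))  # (1.0, 0.0)
```

A criterion built on centroids can therefore change its output when weights change, whereas one built only on minimum pairwise distances has no way to react to them.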
Directory of Open Access Journals (Sweden)
Noha Negm
2013-06-01
Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and reliability of text mining applications. Some recent algorithms address the problem of the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). It introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT) instead of the Apriori algorithm. The algorithm uses a new methodology for generating frequent termsets by building the multi-tire hash table while scanning the documents only once. To avoid hash collisions, the Multi Tire technique is utilized in the proposed hashing algorithm. Based on the generated frequent termsets, the documents are partitioned, and clustering occurs by grouping the partitions through the descriptive keywords. Using the MTHFT algorithm reduces the scanning and computational costs, considerably increases performance, and speeds up the clustering process. The CWDHFT approach improved accuracy, scalability and efficiency when compared with existing clustering algorithms such as Bisecting K-means and FIHC.
HYBRID APPROACH FOR OPTIMAL CLUSTER HEAD SELECTION IN WSN USING LEACH AND MONKEY SEARCH ALGORITHMS
Directory of Open Access Journals (Sweden)
T. SHANKAR
2017-02-01
Wireless Sensor Networks (WSNs) are being widely used with low-cost, low-power, multifunction sensors based on the development of wireless communication, which has enabled a wide variety of new applications. The main concern in a WSN is that each node has a limited battery and is constrained in energy consumption; hence energy and lifetime are of paramount importance. To achieve high energy efficiency and prolong network lifetime in WSNs, clustering techniques have been widely adopted. The proposed algorithm is a hybridization of the well-known Low-Energy Adaptive Clustering Hierarchy (LEACH) algorithm with a distinctive Monkey Search (MS) algorithm, which is an optimization algorithm used for optimal cluster head selection. The proposed hybrid algorithm exhibits high throughput, high residual energy and improved lifetime. The proposed hybrid algorithm is compared with the well-known cluster-based protocols for WSNs, namely LEACH and the monkey search algorithm, individually.
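The baseline that such hybrids build on is LEACH's randomised cluster head election, which can be sketched in Python as follows. The sketch covers only the standard LEACH threshold T(n) = p / (1 − p · (r mod 1/p)); the Monkey Search optimisation described in the abstract is not shown, and the function names and the externally supplied random draw are assumptions for this example.

```python
def leach_threshold(p, r, was_recent_head):
    """Standard LEACH election threshold.
    p: desired fraction of cluster heads, r: current round number,
    was_recent_head: True if the node already served in the current cycle."""
    if was_recent_head:
        return 0.0               # node sits out until the cycle ends
    cycle = round(1 / p)         # a cycle lasts 1/p rounds
    return p / (1 - p * (r % cycle))


def becomes_cluster_head(p, r, was_recent_head, draw):
    """`draw` is a uniform random number in [0, 1) generated by the node."""
    return draw < leach_threshold(p, r, was_recent_head)


print(leach_threshold(0.1, 0, False))        # 0.1
print(becomes_cluster_head(0.1, 0, False, 0.05))  # True
```

The threshold rises towards 1 as the cycle progresses, so every node serves once per cycle, but it ignores residual energy and position, which is the gap that optimisation-based head selection tries to close.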
Institute of Scientific and Technical Information of China (English)
齐华; 马岚; 刘军
2015-01-01
To address the long monitoring cycle and low degree of automation of traditional water resources monitoring systems, this paper proposes a WSN-based urban water resources monitoring system. In LEACH, an existing hierarchical topology control algorithm for WSNs, cluster head nodes are distributed randomly in the network, which easily leads to unbalanced communication distances between cluster member nodes and cluster heads, and between cluster head nodes and gateway nodes; the algorithm is improved to remedy this shortcoming. The performance of the algorithm is simulated in Matlab, and the results show that the improved algorithm, LEACH E, balances node energy consumption better. While maintaining network connectivity, the lifetime of the monitoring network is about 26% longer than with the LEACH algorithm, thereby saving operating costs of the monitoring network.
A Novel Distributed Clustering Algorithm for Mobile Ad-hoc Networks
Directory of Open Access Journals (Sweden)
Sahar Adabi
2008-01-01
This paper proposes a new Distributed Score Based Clustering Algorithm (DSBCA) for Mobile Ad-hoc Networks (MANETs). In MANETs, selecting suitable nodes in clusters as cluster heads is very important. The proposed clustering algorithm considers the Battery Remaining, Number of Neighbors, Number of Members, and Stability in order to calculate the node's score with a linear algorithm. After each node calculates its score independently, the neighbors of the node must be notified about it. Each node also selects the neighbor with the highest score to be its cluster head; the selection of cluster heads is therefore performed in a distributed manner with the most recent information about the current status of neighbor nodes. The proposed algorithm was compared with the Weighted Clustering Algorithm and the Distributed Weighted Clustering Algorithm in terms of number of clusters, number of re-affiliations, lifespan of nodes in the system, end-to-end throughput and overhead. The simulation results showed that the proposed algorithm achieved its goals.
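A score-based election of this kind can be sketched in Python as below. The abstract names the four metrics but gives neither the weights nor the normalisation, so the weights here are hypothetical, the metrics are assumed to be pre-scaled to [0, 1], and letting a node consider itself as a candidate is also an assumption of this example.

```python
# Hypothetical weights; the abstract does not state the actual values.
HYPOTHETICAL_WEIGHTS = {
    "battery": 0.4,
    "neighbors": 0.2,
    "members": 0.2,
    "stability": 0.2,
}


def node_score(metrics, weights=HYPOTHETICAL_WEIGHTS):
    """Linear combination of metrics assumed to be normalised to [0, 1]."""
    return sum(weights[name] * metrics[name] for name in weights)


def choose_cluster_head(node, neighbors, scores):
    """Pick the highest-scoring candidate among the node and its neighbours,
    using only scores the node has received locally."""
    candidates = [node] + list(neighbors)
    return max(candidates, key=lambda n: scores[n])


scores = {"a": 0.5, "b": 0.9, "c": 0.7}
print(choose_cluster_head("a", ["b", "c"], scores))  # b
```

Each node needs only its one-hop neighbours' scores, so the election runs without any central coordinator.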
User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.
Gordon, Michael D.
1991-01-01
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
A Cluster Algorithm for the 2-D SU(3) × SU(3) Chiral Model
Ji, Da-ren; Zhang, Jian-bo
1996-07-01
To extend the cluster algorithm to SU(N) × SU(N) chiral models, a variant version of Wolff's cluster algorithm is proposed and tested for the 2-dimensional SU(3) × SU(3) chiral model. The results show that the new method can reduce the critical slowing down in SU(3) × SU(3) chiral model.
Lowest-ID with Adaptive ID Reassignment: A Novel Mobile Ad-Hoc Networks Clustering Algorithm
Gavalas, Damianos; Konstantopoulos, Charalampos; Mamalis, Basilis
2011-01-01
Clustering is a promising approach for building hierarchies and simplifying the routing process in mobile ad-hoc network environments. The main objective of clustering is to identify suitable node representatives, i.e. cluster heads (CHs), to store routing and topology information and maximize clusters stability. Traditional clustering algorithms suggest CH election exclusively based on node IDs or location information and involve frequent broadcasting of control packets, even when network topology remains unchanged. More recent works take into account additional metrics (such as energy and mobility) and optimize initial clustering. However, in many situations (e.g. in relatively static topologies) re-clustering procedure is hardly ever invoked; hence initially elected CHs soon reach battery exhaustion. Herein, we introduce an efficient distributed clustering algorithm that uses both mobility and energy metrics to provide stable cluster formations. CHs are initially elected based on the time and cost-efficien...
Generation of hierarchically correlated multivariate symbolic sequences
Tumminello, Mi; Mantegna, R N
2008-01-01
We introduce an algorithm to generate multivariate series of symbols from a finite alphabet with a given hierarchical structure of similarities. The target hierarchical structure of similarities is arbitrary, for instance the one obtained by some hierarchical clustering procedure as applied to an empirical matrix of Hamming distances. The algorithm can be interpreted as the finite alphabet equivalent of the recently introduced hierarchically nested factor model (M. Tumminello et al. EPL 78 (3) 30006 (2007)). The algorithm is based on a generating mechanism that is different from the one used in the mutation rate approach. We apply the proposed methodology for investigating the relationship between the bootstrap value associated with a node of a phylogeny and the probability of finding that node in the true phylogeny.
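The empirical input mentioned above, a matrix of Hamming distances between symbolic sequences, can be computed with a short Python sketch. It shows only this preprocessing step, not the generating algorithm itself, and the function names are chosen for this example.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def hamming_matrix(sequences):
    """Symmetric matrix of pairwise Hamming distances."""
    return [[hamming(a, b) for b in sequences] for a in sequences]


print(hamming_matrix(["AAAA", "AAAT", "TTTT"]))
# [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
```

A matrix like this can be passed to any hierarchical clustering procedure to obtain a target hierarchy of similarities.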
BoCluSt: Bootstrap Clustering Stability Algorithm for Community Detection.
Garcia, Carlos
2016-01-01
The identification of modules or communities in sets of related variables is a key step in the analysis and modeling of biological systems. Procedures for this identification are usually designed to allow fast analyses of very large datasets and may produce suboptimal results when these sets are of a small to moderate size. This article introduces BoCluSt, a new, somewhat more computationally intensive, community detection procedure that is based on combining a clustering algorithm with a measure of stability under bootstrap resampling. Both computer simulation and analyses of experimental data showed that BoCluSt can outperform current procedures in the identification of multiple modules in data sets with a moderate number of variables. In addition, the procedure provides users with a null distribution of results to evaluate the support for the existence of community structure in the data. BoCluSt takes individual measures for a set of variables as input, and may be a valuable and robust exploratory tool of network analysis, as it provides 1) an estimation of the best partition of variables into modules, 2) a measure of the support for the existence of modular structures, and 3) an overall description of the whole structure, which may reveal hierarchical modular situations, in which modules are composed of smaller sub-modules.
Combinatorial Clustering Algorithm of Quantum-Behaved Particle Swarm Optimization and Cloud Model
Directory of Open Access Journals (Sweden)
Mi-Yuan Shan
2013-01-01
We propose a combinatorial clustering algorithm of cloud model and quantum-behaved particle swarm optimization (COCQPSO) to solve the stochastic problem. The algorithm employs a novel probability model as well as a permutation-based local search method. We set the parameters of COCQPSO based on design of experiments. In a comprehensive computational study, we scrutinize the performance of COCQPSO on a set of widely used benchmark instances. By benchmarking the combinatorial clustering algorithm against state-of-the-art algorithms, we show that its performance compares very favorably. The fuzzy combinatorial optimization algorithm of cloud model and quantum-behaved particle swarm optimization (FCOCQPSO) in vague sets (IVSs) is more expressive than the other fuzzy sets. Finally, numerical examples demonstrate the remarkable clustering effectiveness of the COCQPSO and FCOCQPSO algorithms.