Functional Clustering Algorithm for High-Dimensional Proteomics Data
Directory of Open Access Journals (Sweden)
Halima Bensmail
2005-01-01
Clustering proteomics data is a challenging problem for any traditional clustering algorithm. Usually, the number of samples is much smaller than the number of protein peaks. A clustering algorithm that does not take into consideration the number of features or variables (here, the number of peaks) is needed. An innovative hierarchical clustering algorithm may be a good approach. We propose here a new dissimilarity measure for hierarchical clustering combined with functional data analysis. We present a specific application of functional data analysis (FDA) to a high-throughput proteomics study. The performance of the proposed algorithm is compared with that of two popular dissimilarity measures in the clustering of samples from normal and human T-cell leukemia virus type 1 (HTLV-1)-infected patients.
A functional clustering algorithm for the analysis of neural relationships
Feldt, S; Hetrick, V L; Berke, J D; Zochowski, M
2008-01-01
We formulate a novel technique for the detection of functional clusters in neural data. In contrast to prior network clustering algorithms, our procedure progressively combines spike trains and derives the optimal clustering cutoff in a simple and intuitive manner. To demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. We observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Functional clustering algorithm for the analysis of dynamic network data
Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.
2009-05-01
We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to the advancement in sensor technology, the growing volume of medical image data makes it possible to visualize the anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms that use a lasso-type penalty to select the features. However, the accuracy of clustering with a lasso-type penalty depends on the selection of the penalty parameters and the threshold value, which in practice are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
Partitional clustering algorithms
2015-01-01
This book summarizes the state-of-the-art in partitional clustering. Clustering, the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis. Primary goals of clustering include gaining insight into, classifying, and compressing data. Clustering has a long and rich history that spans a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science. As a result, numerous clustering algorithms have been proposed since the early 1950s. Among these algorithms, partitional (nonhierarchical) ones have found many applications, especially in engineering and computer science. This book provides coverage of consensus clustering, constrained clustering, large scale and/or high dimensional clustering, cluster validity, cluster visualization, and applications of clustering. Examines clustering as it applies to large and/or high-dimensional data sets commonly encountered in reali...
Parallel Wolff Cluster Algorithms
Bae, S.; Ko, S. H.; Coddington, P. D.
The Wolff single-cluster algorithm is the most efficient method known for Monte Carlo simulation of many spin models. Due to the irregular size, shape and position of the Wolff clusters, this method does not easily lend itself to efficient parallel implementation, so that simulations using this method have thus far been confined to workstations and vector machines. Here we present two parallel implementations of this algorithm, and show that one gives fairly good performance on a MIMD parallel computer.
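The serial update that the abstract describes as hard to parallelize can be sketched as follows. This is a minimal illustration for a two-dimensional Ising model with periodic boundaries, not the authors' parallel implementation; the function name `wolff_step` and the list-of-lists lattice layout are assumptions made for the example.

```python
import math
import random


def wolff_step(spins, beta, J=1.0, rng=random):
    """Perform one Wolff single-cluster update on a square Ising lattice.

    `spins` is a list of lists holding +1/-1 values; boundaries are periodic.
    The lattice is modified in place and the flipped cluster size is returned.
    (Hypothetical helper written for illustration.)
    """
    n = len(spins)
    p_add = 1.0 - math.exp(-2.0 * beta * J)  # bond activation probability
    i, j = rng.randrange(n), rng.randrange(n)
    seed_spin = spins[i][j]
    spins[i][j] = -seed_spin  # flipping on visit also marks the site as used
    stack = [(i, j)]
    size = 1
    while stack:
        x, y = stack.pop()
        neighbours = (((x + 1) % n, y), ((x - 1) % n, y),
                      (x, (y + 1) % n), (x, (y - 1) % n))
        for nx, ny in neighbours:
            # Only neighbours still aligned with the seed may join the cluster.
            if spins[nx][ny] == seed_spin and rng.random() < p_add:
                spins[nx][ny] = -seed_spin
                stack.append((nx, ny))
                size += 1
    return size
```

Because the cluster grows from a random seed through a data-dependent stack, its size and shape are unknown in advance, which is the irregularity the abstract refers to.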
Kernel Generalized Noise Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
WU Xiao-hong; ZHOU Jian-jiang
2007-01-01
To deal with the nonlinearly separable problem, the generalized noise clustering (GNC) algorithm is extended to a kernel generalized noise clustering (KGNC) model. Different from the fuzzy c-means (FCM) model and the GNC model, which are based on Euclidean distance, the presented model is based on a kernel-induced distance obtained by the kernel method. By the kernel method, the input data are nonlinearly and implicitly mapped into a high-dimensional feature space, where the nonlinear pattern appears linear and the GNC algorithm is performed. It is unnecessary to calculate in the high-dimensional feature space because the kernel function can do it just in input space. The effectiveness of the proposed algorithm is verified by experiments on three data sets. It is concluded that the KGNC algorithm has better clustering accuracy than FCM and GNC in clustering data sets containing noisy data.
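The remark that no computation is needed in the feature space follows from expanding the squared distance between mapped points into kernel evaluations: ||φ(x) − φ(v)||² = K(x, x) − 2K(x, v) + K(v, v). A small sketch of that identity is given below; the Gaussian kernel and the names `gaussian_kernel` and `kernel_distance_sq` are assumptions chosen for illustration, since the abstract does not state which kernel the authors used.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel; an assumed choice, not taken from the paper."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance_sq(x, v, kernel=gaussian_kernel):
    """Squared feature-space distance computed only from kernel values:
    ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v).
    """
    return kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v)
```

For the Gaussian kernel, K(x, x) = 1, so the distance reduces to 2(1 − K(x, v)), which is bounded by 2 however far apart the inputs are.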
Maximum-entropy clustering algorithm and its global convergence analysis
Institute of Scientific and Technical Information of China (English)
(no author listed)
2001-01-01
Constructing a batch of differentiable entropy functions to uniformly approximate an objective function by means of the maximum-entropy principle, a new clustering algorithm, called maximum-entropy clustering algorithm, is proposed based on optimization theory. This algorithm is a soft generalization of the hard C-means algorithm and possesses global convergence. Its relations with other clustering algorithms are discussed.
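To illustrate what "soft generalization of the hard C-means algorithm" can mean, the sketch below uses the Gibbs (softmax) membership form commonly associated with maximum-entropy clustering, where an inverse-temperature parameter `beta` controls softness. The exact entropy functions of the paper are not given in the abstract, so this form, and the names `soft_memberships` and `update_centers`, are assumptions.

```python
import math


def soft_memberships(points, centers, beta):
    """Softmax memberships over negative squared distances; rows sum to 1."""
    memberships = []
    for x in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        shift = min(d2)  # subtract the minimum for numerical stability
        w = [math.exp(-beta * (d - shift)) for d in d2]
        total = sum(w)
        memberships.append([wi / total for wi in w])
    return memberships


def update_centers(points, memberships, k):
    """Each center becomes the membership-weighted mean of all points."""
    dim = len(points[0])
    centers = []
    for j in range(k):
        weight = sum(u[j] for u in memberships)
        centers.append(tuple(
            sum(u[j] * x[d] for u, x in zip(memberships, points)) / weight
            for d in range(dim)))
    return centers
```

With `beta = 0` every point belongs equally to every cluster; as `beta` grows the memberships approach the 0/1 assignments of hard C-means.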
Recovery Rate of Clustering Algorithms
Li, Fajie; Klette, Reinhard; Wada, T; Huang, F; Lin, S
2009-01-01
This article provides a simple and general way for defining the recovery rate of clustering algorithms using a given family of old clusters for evaluating the performance of the algorithm when calculating a family of new clusters. Under the assumption of dealing with simulated data (i.e., known old
Energy Technology Data Exchange (ETDEWEB)
Yin, Jiandong; Yang, Jiawen; Guo, Qiyong [Shengjing Hospital of China Medical University, Department of Radiology, Shenyang (China)
2015-05-01
Arterial input function (AIF) plays an important role in the quantification of cerebral hemodynamics. The purpose of this study was to select the best reproducible clustering method for AIF detection by comparing three algorithms reported previously in terms of detection accuracy and computational complexity. First, three reproducible clustering methods, normalized cut (Ncut), hierarchy (HIER), and fast affine propagation (FastAP), were applied independently to simulated data which contained the true AIF. Next, a clinical verification was performed where 42 subjects participated in dynamic susceptibility contrast MRI (DSC-MRI) scanning. The manual AIF and AIFs based on the different algorithms were obtained. The performance of each algorithm was evaluated based on shape parameters of the estimated AIFs and the true or manual AIF. Moreover, the execution time of each algorithm was recorded to determine the algorithm that operated more rapidly in clinical practice. In terms of the detection accuracy, Ncut and HIER method produced similar AIF detection results, which were closer to the expected AIF and more accurate than those obtained using FastAP method; in terms of the computational efficiency, the Ncut method required the shortest execution time. Ncut clustering appears promising because it facilitates the automatic and robust determination of AIF with high accuracy and efficiency. (orig.)
DEFF Research Database (Denmark)
Thomsen, Bo; Hansen, Mikkel Bo; Seidler, Peter
2012-01-01
We report the theory and implementation of vibrational coupled cluster (VCC) damped response functions. From the imaginary part of the damped VCC response function the absorption as function of frequency can be obtained, requiring formally the solution of the now complex VCC response equations...... with results from the recently reported [P. Seidler, M. B. Hansen, W. Györffy, D. Toffoli, and O. Christiansen, J. Chem. Phys. 132, 164105 (2010)] vibrational configuration interaction damped response function calculated using a symmetric Lanczos algorithm. Calculations of IR spectra of oxazole, cyclopropene...
The Georgi Algorithms of Jet Clustering
Ge, Shao-Feng
2014-01-01
We reveal the direct link between the jet clustering algorithms recently proposed by Howard Georgi and parton shower kinematics, providing firm foundation from the theoretical side. The kinematics of this class of elegant algorithms is explored systematically for partons with arbitrary masses and the jet function is generalized to $J^{(n)}_\beta$ with a jet function index $n$ in order to achieve more degrees of freedom. Based on three basic requirements that, the result of jet clustering is p...
Data clustering algorithms and applications
Aggarwal, Charu C
2013-01-01
Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as fea
Cluster Synchronization Algorithms
Xia, Weiguo; Cao, Ming
2010-01-01
This paper presents two approaches to achieving cluster synchronization in dynamical multi-agent systems. In contrast to the widely studied synchronization behavior, where all the coupled agents converge to the same value asymptotically, in the cluster synchronization problem studied in this paper,
DEFF Research Database (Denmark)
Thomsen, Bo; Hansen, Mikkel Bo; Seidler, Peter
2012-01-01
We report the theory and implementation of vibrational coupled cluster (VCC) damped response functions. From the imaginary part of the damped VCC response function the absorption as function of frequency can be obtained, requiring formally the solution of the now complex VCC response equations. The absorption spectrum can in this formulation be seen as a matrix function of the characteristic VCC Jacobian response matrix. The asymmetric matrix version of the Lanczos method is used to generate a tridiagonal representation of the VCC response Jacobian. Solving the complex response equations in the relevant Lanczos space provides a method for calculating the VCC damped response functions and thereby subsequently the absorption spectra. The convergence behaviour of the algorithm is discussed theoretically and tested for different levels of completeness of the VCC expansion. Comparison is made...
Intuitionistic fuzzy hierarchical clustering algorithms
Institute of Scientific and Technical Information of China (English)
Xu Zeshui
2009-01-01
Intuitionistic fuzzy set (IFS) is a set of 2-tuple arguments, each of which is characterized by a membership degree and a nonmembership degree. The generalized form of IFS is interval-valued intuitionistic fuzzy set (IVIFS), whose components are intervals rather than exact numbers. IFSs and IVIFSs have been found to be very useful to describe vagueness and uncertainty. However, it seems that little attention has been focused on the clustering analysis of IFSs and IVIFSs. An intuitionistic fuzzy hierarchical algorithm is introduced for clustering IFSs, which is based on the traditional hierarchical clustering procedure, the intuitionistic fuzzy aggregation operator, and the basic distance measures between IFSs: the Hamming distance, the normalized Hamming distance, the weighted Hamming distance, the Euclidean distance, the normalized Euclidean distance, and the weighted Euclidean distance. Subsequently, the algorithm is extended for clustering IVIFSs. Finally the algorithm and its extended form are applied to the classifications of building materials and enterprises respectively.
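As an illustration of one of the distance measures listed, the sketch below computes a normalized Hamming distance between two IFSs in the commonly used form that also includes the hesitancy degree π = 1 − μ − ν. Whether the paper uses exactly this form is an assumption, and the function name `ifs_normalized_hamming` is hypothetical.

```python
def ifs_normalized_hamming(a, b):
    """Normalized Hamming distance between two intuitionistic fuzzy sets.

    `a` and `b` are lists of (membership, nonmembership) pairs defined over
    the same universe, in the same element order. The hesitancy degree
    pi = 1 - mu - nu is included, following one common definition.
    """
    if len(a) != len(b) or not a:
        raise ValueError("both sets must be non-empty and of equal length")
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        pi_a = 1.0 - mu_a - nu_a
        pi_b = 1.0 - mu_b - nu_b
        total += abs(mu_a - mu_b) + abs(nu_a - nu_b) + abs(pi_a - pi_b)
    return total / (2.0 * len(a))
```

The factor 1/(2n) keeps the result in [0, 1], with 1 reached when one set is full membership and the other full nonmembership on every element.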
Issues Challenges and Tools of Clustering Algorithms
Directory of Open Access Journals (Sweden)
Parul Agarwal
2011-05-01
Clustering is an unsupervised technique of data mining. It means grouping similar objects together and separating the dissimilar ones. Each object in the data set is assigned a class label in the clustering process using a distance measure. This paper captures the problems that are faced in practice when clustering algorithms are implemented. It also considers the most extensively used tools that are readily available and the support functions that ease the programming. Once algorithms have been implemented, they also need to be tested for their validity. Several validation indexes for testing performance and accuracy exist and are also discussed here.
Extended Fuzzy Clustering Algorithms
U. Kaymak (Uzay); M. Setnes
2000-01-01
Fuzzy clustering is a widely applied method for obtaining fuzzy models from data. It has been applied successfully in various fields including finance and marketing. Despite the successful applications, there are a number of issues that must be dealt with in practical applications of fuz
Polyclonal clustering algorithm and its convergence
Institute of Scientific and Technical Information of China (English)
MA Li; JIAO Li-cheng; BAI Lin; CHEN Chang-guo
2008-01-01
Characterized by unsupervised learning, self-organization, memory, and noise resistance, the artificial immune system is a research focus in the field of intelligent information processing. Based on the basic principles of biological immunity and clonal selection, this article presents a self-adaptive polyclonal clustering algorithm. According to the core idea of the algorithm, various immune operators of the artificial immune system are employed in the clustering process; moreover, the number of clusters is adjusted in accordance with the affinity function. Introducing the recombination operator effectively enhances the diversity of individual antibodies in a generation's population, so that the search scope for solutions is enlarged and premature convergence of the algorithm is avoided. Introducing the inconsistent mutation operator enhances adaptability and improves the performance of local solution seeking, while also accelerating the convergence of the algorithm. In addition, the article proves the convergence of the algorithm by employing a Markov chain. Results of the data simulation experiment show that the algorithm is capable of obtaining reasonable and effective clusters.
A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA
Institute of Scientific and Technical Information of China (English)
Ohn Mar San; Van-Nam Huynh; Yoshiteru Nakamori
2003-01-01
Most of the earlier work on clustering focused on numeric data whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve datasets that consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm which is based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both numeric and categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method to update the "cluster centers" of clustering objects described by mixed numeric and categorical attributes in the clustering process so as to minimize the clustering cost function. The clustering performance of the algorithm is demonstrated on two well-known data sets, namely the credit approval and abalone databases.
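A rough sketch of how a k-means-style measure can span both attribute types is given below. It uses the simple k-prototypes-style combination (squared Euclidean distance on numeric attributes plus a weighted mismatch count on categorical ones, with mean and mode as the center), which is a stand-in and not necessarily the center representation proposed in the paper; the names `mixed_dissimilarity`, `update_center`, and the weight `gamma` are assumptions.

```python
from collections import Counter


def mixed_dissimilarity(obj, center, n_numeric, gamma=1.0):
    """Squared Euclidean distance on the first `n_numeric` attributes plus
    `gamma` times the number of mismatching categorical attributes."""
    numeric = sum((a - b) ** 2
                  for a, b in zip(obj[:n_numeric], center[:n_numeric]))
    mismatches = sum(a != b
                     for a, b in zip(obj[n_numeric:], center[n_numeric:]))
    return numeric + gamma * mismatches


def update_center(members, n_numeric):
    """Center of a cluster: mean of numeric attributes, mode of categorical
    attributes (an assumed, simplified update rule)."""
    count = len(members)
    means = [sum(m[d] for m in members) / count for d in range(n_numeric)]
    modes = [Counter(m[d] for m in members).most_common(1)[0][0]
             for d in range(n_numeric, len(members[0]))]
    return tuple(means) + tuple(modes)
```

When there are no categorical attributes the mismatch term vanishes and the measure is the ordinary k-means squared distance, consistent with the abstract's remark about purely numeric data.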
Parallel algorithms and cluster computing
Hoffmann, Karl Heinz
2007-01-01
This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.
Determination of atomic cluster structure with cluster fusion algorithm
DEFF Research Database (Denmark)
Obolensky, Oleg I.; Solov'yov, Ilia; Solov'yov, Andrey V.
2005-01-01
We report an efficient scheme of global optimization, called cluster fusion algorithm, which has proved its reliability and high efficiency in determination of the structure of various atomic clusters.
Particle identification using clustering algorithms
Wirth, R; Löher, B; Savran, D; Silva, J; Pol, H Álvarez; Gil, D Cortina; Pietras, B; Bloch, T; Kröll, T; Nácher, E; Perea, Á; Tengblad, O; Bendel, M; Dierigl, M; Gernhäuser, R; Bleis, T Le; Winkel, M
2013-01-01
A method that uses fuzzy clustering algorithms to achieve particle identification based on pulse shape analysis is presented. The fuzzy c-means clustering algorithm is used to compute mean (principal) pulse shapes induced by different particle species in an automatic and unsupervised fashion from a mixed set of data. A discrimination amplitude is proposed using these principal pulse shapes to identify the originating particle species of a detector pulse. Since this method does not make any assumptions about the specific features of the pulse shapes, it is very generic and suitable for multiple types of detectors. The method is applied to discriminate between photon- and proton-induced signals in CsI(Tl) scintillator detectors and the results are compared to the well-known integration method.
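The fuzzy c-means iteration that produces the mean (principal) shapes can be sketched generically as below. This is a plain textbook FCM on numeric vectors, not the authors' pulse-shape pipeline; the function names and the fuzzifier default `m = 2.0` are assumptions.

```python
def fcm_memberships(points, centers, m=2.0):
    """Membership of each point in each cluster; every row sums to 1."""
    power = 2.0 / (m - 1.0)
    result = []
    for x in points:
        dists = [sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
                 for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            result.append([h / total for h in hits])
            continue
        result.append([1.0 / sum((di / dj) ** power for dj in dists)
                       for di in dists])
    return result


def fcm_centers(points, memberships, m=2.0):
    """Centers as means of the points weighted by membership ** m."""
    k, dim = len(memberships[0]), len(points[0])
    centers = []
    for i in range(k):
        weights = [u[i] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)))
    return centers
```

Alternating the two functions until the centers stop moving gives the cluster prototypes; in the setting of the abstract each point would be a sampled pulse and each center a mean pulse shape.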
An Improved Weighted Clustering Algorithm in MANET
Institute of Scientific and Technical Information of China (English)
WANG Jin; XU Li; ZHENG Bao-yu
2004-01-01
The original clustering algorithms in Mobile Ad hoc Networks (MANET) are first analyzed in this paper. Based on this analysis, an Improved Weighted Clustering Algorithm (IWCA) is proposed. Then, the principle and steps of our algorithm are explained in detail, and a comparison is made between the original algorithms and our improved method in terms of average cluster number, topology stability, clusterhead load balance and network lifetime. The experimental results show that our improved algorithm has the best performance on average.
Kernel method-based fuzzy clustering algorithm
Institute of Scientific and Technical Information of China (English)
Wu Zhongdong; Gao Xinbo; Xie Weixin; Yu Jianping
2005-01-01
The fuzzy C-means clustering algorithm (FCM) is extended to the fuzzy kernel C-means clustering algorithm (FKCM) to effectively perform cluster analysis on diverse structures, such as non-hyperspherical data, data with noise, data with a mixture of heterogeneous cluster prototypes, asymmetric data, etc. Based on the Mercer kernel, the FKCM clustering algorithm is derived from the FCM algorithm combined with the kernel method. The results of experiments with synthetic and real data show that, in contrast to the FCM algorithm, the FKCM clustering algorithm is general and can effectively analyze datasets with varied structures in an unsupervised manner. Kernel-based clustering can be expected to become an important research direction in fuzzy clustering analysis.
A new cluster algorithm for graphs
Dongen, S. van
1998-01-01
A new cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The graphs may be both weighted (with nonnegative weight) and directed. Let $G$ be such a graph. The MCL algorithm simulates flow in $G$ by first identifying $G$ in a canonical way with
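The abstract is cut off before the flow simulation is described. As commonly presented, MCL turns the graph into a column-stochastic matrix and alternates expansion (matrix squaring) with inflation (entrywise powering followed by column renormalization). The sketch below follows that common description; the self-loop step, the default parameters, and the way clusters are read off the final matrix are simplifying assumptions rather than details taken from this record.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / sums[c] for c in range(n)] for r in range(n)]


def expand(m):
    """Expansion: square the matrix, i.e. take two steps of the random walk."""
    n = len(m)
    return [[sum(m[r][k] * m[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def inflate(m, power):
    """Inflation: raise entries to `power`, then renormalize the columns."""
    return normalize_columns([[v ** power for v in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50):
    """Return clusters (sorted lists of node indices) for a square adjacency
    matrix with nonnegative weights. Self-loops are added before starting."""
    n = len(adjacency)
    m = normalize_columns(
        [[adjacency[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    clusters = set()
    for r in range(n):
        # A row that still carries flow lists the nodes attracted to node r.
        members = frozenset(c for c in range(n) if m[r][c] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)
```

Higher `inflation` values strengthen strong flows relative to weak ones and generally yield finer clusterings.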
A new efficient Cluster Algorithm for the Ising Model
Nyffeler, M; Wiese, U J; Nyfeler, Matthias; Pepe, Michele; Wiese, Uwe-Jens
2005-01-01
Using D-theory we construct a new efficient cluster algorithm for the Ising model. The construction is very different from the standard Swendsen-Wang algorithm and related to worm algorithms. With the new algorithm we have measured the correlation function with high precision over a surprisingly large number of orders of magnitude.
Counterexamples to convergence theorem of maximum-entropy clustering algorithm
Institute of Scientific and Technical Information of China (English)
于剑; 石洪波; 黄厚宽; 孙喜晨; 程乾生
2003-01-01
In this paper, we surveyed the development of maximum-entropy clustering algorithm, pointed out that the maximum-entropy clustering algorithm is not new in essence, and constructed two examples to show that the iterative sequence given by the maximum-entropy clustering algorithm may not converge to a local minimum of its objective function, but a saddle point. Based on these results, our paper shows that the convergence theorem of maximum-entropy clustering algorithm put forward by Kenneth Rose et al. does not hold in general cases.
A Load Balance Routing Algorithm Based on Uneven Clustering
Directory of Open Access Journals (Sweden)
Liang Yuan
2013-10-01
Aiming at the problem of uneven load in clustered Wireless Sensor Networks (WSN), a load balance routing algorithm based on uneven clustering is proposed, which performs uneven clustering and calculates the optimal number of clusters. Through even node clustering, the algorithm prevents the number of common nodes under any one cluster head from becoming so large that the head is overloaded and dies. It constructs an evaluation function that better reflects the residual energy distribution of nodes, together with a routing evaluation function between cluster heads. The performance of the algorithm is simulated in MATLAB. Simulation results show that the routing established by this algorithm effectively improves the network's energy balance and lengthens the life cycle of the network.
Baldauf, Tobias; Seljak, Uros; Mandelbaum, Rachel
2009-01-01
The clustering of matter on cosmological scales is an essential probe for studying the physical origin and composition of our Universe. To date, most of the direct studies have focused on shear-shear weak lensing correlations, but it is also possible to extract the dark matter clustering by combining galaxy-clustering and galaxy-galaxy-lensing measurements. In this study we develop a method that can constrain the dark matter correlation function from galaxy clustering and galaxy-galaxy-lensing measurements, by focusing on the correlation coefficient between the galaxy and matter overdensity fields. To generate a mock galaxy catalogue for testing purposes, we use the Halo Occupation Distribution approach applied to a large ensemble of N-body simulations to model pre-existing SDSS Luminous Red Galaxy sample observations. Using this mock catalogue, we show that a direct comparison between the excess surface mass density measured by lensing and its corresponding galaxy clustering quantity is not optimal. We devel...
Frequent Pattern Mining Algorithms for Data Clustering
DEFF Research Database (Denmark)
Zimek, Arthur; Assent, Ira; Vreeken, Jilles
2014-01-01
Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many frequent pattern mining related influences. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say that frequent pattern mining was at the cradle of subspace clustering; yet, it quickly developed into an independent research field. In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data. In particular, we discuss several example algorithms for subspace clustering or projected clustering as well as point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining.
Introduction to Cluster Monte Carlo Algorithms
Luijten, E.
This chapter provides an introduction to cluster Monte Carlo algorithms for classical statistical-mechanical systems. A brief review of the conventional Metropolis algorithm is given, followed by a detailed discussion of the lattice cluster algorithm developed by Swendsen and Wang and the single-cluster variant introduced by Wolff. For continuum systems, the geometric cluster algorithm of Dress and Krauth is described. It is shown how their geometric approach can be generalized to incorporate particle interactions beyond hardcore repulsions, thus forging a connection between the lattice and continuum approaches. Several illustrative examples are discussed.
A Novel Research on Rough Clustering Algorithm
Directory of Open Access Journals (Sweden)
Tao Qu
2014-01-01
This study addresses the problem that traditional clustering algorithms are influenced by the distribution of the data space; a novel clustering algorithm combined with rough set theory is applied to conventional clustering. The proposed rough clustering algorithm takes the condition attributes and decision attributes displayed in the information table as the consistency principle, and uses the data supercube and information entropy to reduce and discretize the data attributes. On this basis, by applying the assembled feature vector addition principle, a single scan of the information table suffices to cluster the data objects. Experiments show that the proposed algorithm is efficient and feasible.
Mercer Kernel Based Fuzzy Clustering Self-Adaptive Algorithm
Institute of Scientific and Technical Information of China (English)
李侃; 刘玉树
2004-01-01
A novel Mercer kernel based fuzzy clustering self-adaptive algorithm is presented. The Mercer kernel method is introduced into fuzzy c-means clustering; it implicitly maps the input data into a high-dimensional feature space through a nonlinear transformation. In fuzzy c-means and its variants, the number of clusters must be determined in advance. A self-adaptive algorithm is proposed in which the number of clusters, which is not given in advance, is obtained automatically by a validity measure function. Finally, experiments show that the kernel based fuzzy c-means self-adaptive algorithm performs better.
Hesitant fuzzy agglomerative hierarchical clustering algorithms
Zhang, Xiaolu; Xu, Zeshui
2015-02-01
Recently, hesitant fuzzy sets (HFSs) have been studied by many researchers as a powerful tool to describe and deal with uncertain data, but relatively few studies focus on the clustering analysis of HFSs. In this paper, we propose a novel hesitant fuzzy agglomerative hierarchical clustering algorithm for HFSs. The algorithm considers each of the given HFSs as a unique cluster in the first stage, and then compares each pair of the HFSs by utilising the weighted Hamming distance or the weighted Euclidean distance. The two clusters with the smallest distance are merged. The procedure is then repeated until the desired number of clusters is achieved. Moreover, we extend the algorithm to cluster interval-valued hesitant fuzzy sets, and finally illustrate the effectiveness of our clustering algorithms by experimental results.
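The merge loop described in the abstract can be sketched as below. Plain numeric vectors stand in for hesitant fuzzy sets, and a merged cluster is represented by the arithmetic mean of its members; both choices, along with the names `weighted_hamming` and `agglomerate`, are simplifying assumptions and not the paper's hesitant fuzzy operators.

```python
def weighted_hamming(a, b, weights):
    """Weighted Hamming distance between two equal-length numeric vectors."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))


def agglomerate(items, weights, target_clusters):
    """Merge the closest pair of clusters until `target_clusters` remain.

    Returns a list of clusters, each a sorted list of item indices.
    """
    if target_clusters < 1:
        raise ValueError("target_clusters must be at least 1")
    clusters = [[i] for i in range(len(items))]  # every item starts alone
    centers = [tuple(item) for item in items]
    while len(clusters) > target_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = weighted_hamming(centers[p], centers[q], weights)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = clusters[p] + clusters[q]
        # Assumed center rule: arithmetic mean of all merged members.
        center = tuple(sum(items[i][d] for i in merged) / len(merged)
                       for d in range(len(weights)))
        del clusters[q], centers[q]  # q > p, so index p stays valid
        clusters[p], centers[p] = merged, center
    return [sorted(c) for c in clusters]
```

Swapping `weighted_hamming` for a weighted Euclidean distance changes only the pair comparison; the merge loop itself is unchanged.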
Study of the Artificial Fish Swarm Algorithm for Hybrid Clustering
Directory of Open Access Journals (Sweden)
Hongwei Zhao
2015-06-01
Full Text Available The basic Artificial Fish Swarm (AFS) algorithm is a new type of heuristic swarm intelligence algorithm, but it is difficult to optimize to high precision because of the randomness of the artificial fish behavior. This paper presents an extended AFS algorithm, namely the Cooperative Artificial Fish Swarm (CAFS), which significantly improves the original AFS in solving complex optimization problems. The K-medoids clustering algorithm is used to classify data, but the approach is sensitive to the initial selection of the centers, which can lead to low-quality clusters. A novel hybrid clustering method based on CAFS and K-medoids can be used for solving clustering problems. In this work, first, the CAFS algorithm is used to optimize six widely used benchmark functions, giving comparative results for AFS and CAFS; Particle Swarm Optimization (PSO) is then studied. Second, the hybrid algorithm combining K-medoids and CAFS is used for data clustering on several benchmark data sets. The performance of the hybrid algorithm based on K-medoids and CAFS is compared with the AFS and CAFS algorithms on a clustering problem. The simulation results show that the proposed CAFS outperforms the other two algorithms in terms of accuracy and robustness.
Intuitionistic Fuzzy Possibilistic C Means Clustering Algorithms
Directory of Open Access Journals (Sweden)
Arindam Chaudhuri
2015-01-01
Full Text Available Intuitionistic fuzzy sets (IFSs) provide a mathematical framework based on fuzzy sets to describe vagueness in data, and find interesting and promising applications in different domains. Here, we develop an intuitionistic fuzzy possibilistic C means (IFPCM) algorithm to cluster IFSs by hybridizing concepts of FPCM, IFSs, and distance measures. IFPCM resolves inherent problems encountered with information regarding membership values of objects to each cluster by generalizing membership and nonmembership with a hesitancy degree. The algorithm is extended to clustering interval valued intuitionistic fuzzy sets (IVIFSs), leading to interval valued intuitionistic fuzzy possibilistic C means (IVIFPCM). This clustering algorithm has membership and nonmembership degrees as intervals. Information regarding membership and typicality degrees of samples to all clusters is given by the algorithm. The experiments are performed on both real and simulated datasets. The algorithm generates valuable information and produces overlapped clusters with different membership degrees. It takes into account the inherent uncertainty in information captured by IFSs. Some advantages of the algorithms are simplicity, flexibility, and low computational complexity. The algorithm is evaluated through cluster validity measures. Its clustering accuracy is investigated on classification datasets with labeled patterns. The algorithm maintains appreciable performance compared to other methods in terms of pureness ratio.
Simulated annealing spectral clustering algorithm for image segmentation
Institute of Scientific and Technical Information of China (English)
Yifang Yang; Yuping Wang
2014-01-01
The similarity measure is crucial to the performance of spectral clustering. The Gaussian kernel function based on the Euclidean distance is usually adopted as the similarity measure. However, the Euclidean distance measure cannot fully reveal the complex distribution of the data, and the result of spectral clustering is very sensitive to the scaling parameter. To solve these problems, a new manifold distance measure and a novel simulated annealing spectral clustering (SASC) algorithm based on the manifold distance measure are proposed. The simulated annealing based on genetic algorithm (SAGA), characterized by its rapid convergence to the global optimum, is used to cluster the sample points in the spectral mapping space. The proposed algorithm can not only reflect local and global consistency better, but also reduce the sensitivity of spectral clustering to the kernel parameter, which improves the algorithm's clustering performance. To efficiently apply the algorithm to image segmentation, the Nyström method is used to reduce the computation complexity. Experimental results show that, compared with traditional clustering algorithms and popular spectral clustering algorithms, the proposed algorithm can achieve better clustering performance on several synthetic datasets, texture images and real images.
Algorithm for Spatial Clustering with Obstacles
El-Sharkawi, Mohamed E
2009-01-01
In this paper, we propose an efficient clustering technique to solve the problem of clustering in the presence of obstacles. The proposed algorithm divides the spatial area into rectangular cells. Each cell is associated with statistical information that enables us to label the cell as dense or non-dense. We also label each cell as obstructed (i.e. intersects any obstacle) or non-obstructed. Then the algorithm finds the regions (clusters) of connected, dense, non-obstructed cells. Finally, the algorithm finds a center for each such region and returns those centers as centers of the relatively dense regions (clusters) in the spatial area.
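The cell-based procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `grid_cluster`, the use of a plain point count as the per-cell statistic, the density threshold, 4-neighbour connectivity, and axis-aligned rectangular obstacles are all assumptions made for illustration.

```python
from collections import deque


def grid_cluster(points, obstacles, cell_size, density_threshold):
    """Return centers of connected regions of dense, non-obstructed cells.

    points: list of (x, y) tuples.
    obstacles: list of axis-aligned rectangles (xmin, ymin, xmax, ymax).
    Assumption: a cell is "dense" when it holds >= density_threshold points.
    """
    # Per-cell statistics: here just the points that fall into each cell.
    members = {}
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        members.setdefault(cell, []).append((x, y))

    def obstructed(cell):
        # A cell is obstructed if its square overlaps any obstacle rectangle.
        x0, y0 = cell[0] * cell_size, cell[1] * cell_size
        x1, y1 = x0 + cell_size, y0 + cell_size
        return any(x0 < ox1 and ox0 < x1 and y0 < oy1 and oy0 < y1
                   for ox0, oy0, ox1, oy1 in obstacles)

    usable = {c for c, pts in members.items()
              if len(pts) >= density_threshold and not obstructed(c)}

    centers, seen = [], set()
    for start in sorted(usable):
        if start in seen:
            continue
        seen.add(start)
        queue, region = deque([start]), []
        while queue:  # breadth-first search over neighbouring usable cells
            cx, cy = queue.popleft()
            region.extend(members[(cx, cy)])
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in usable and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        # The region center is taken as the mean of the points it contains.
        centers.append((sum(p[0] for p in region) / len(region),
                        sum(p[1] for p in region) / len(region)))
    return centers
```

Because obstructed cells are dropped before the connectivity search, two dense areas separated by an obstacle end up in different regions even when they are adjacent on the grid.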
A new fusion algorithm for fuzzy clustering
Directory of Open Access Journals (Sweden)
Ivan Vidović
2014-12-01
Full Text Available In this paper, we have considered the merging problem of two ellipsoidal clusters in order to construct a new fusion algorithm for fuzzy clustering. We have proposed a criterion for merging two ellipsoidal clusters Π_1, Π_2 with associated main Mahalanobis circles E_j(c_j, σ_j), where c_j is the centroid and σ_j^2 is the Mahalanobis variance of cluster Π_j. Based on the well-known Davies-Bouldin index, we have constructed a new fusion algorithm. The criterion has been tested on several data sets, and the performance of the fusion algorithm has been demonstrated on an illustrative example.
Novel Cluster Validity Index for FCM Algorithm
Institute of Scientific and Technical Information of China (English)
Jian Yu; Cui-Xia Li
2006-01-01
Determining an appropriate number of clusters is very important when implementing a specific clustering algorithm such as c-means or fuzzy c-means (FCM). In the literature, most cluster validity indices originate from partition or geometrical properties of the data set. In this paper, the authors develop a novel cluster validity index for FCM, based on the optimality test of FCM. Unlike previous cluster validity indices, this novel index is inherent in FCM itself. Comparison experiments show that the stability index can be used as a cluster validity index for fuzzy c-means.
The Georgi algorithms of jet clustering
Ge, Shao-Feng
2015-05-01
We reveal the direct link between the jet clustering algorithms recently proposed by Howard Georgi and parton shower kinematics, providing a firm foundation from the theoretical side. The kinematics of this class of elegant algorithms is explored systematically for partons with arbitrary masses, and the jet function is generalized to J_β^(n) with a jet function index n in order to achieve more degrees of freedom. We impose three basic requirements: the result of jet clustering is process-independent and hence logically consistent; for softer subjets the inclusion cone is larger, in line with the fact that the parton shower tends to emit softer partons at an earlier stage with a larger opening angle; and the cone size cannot be too large, in order to avoid mixing up neighboring jets. From these we derive constraints on the jet function parameter β and index n, which are closely related to the cone size cutoff. Finally, we discuss how jet function values can be made invariant under Lorentz boost.
An object-oriented cluster search algorithm
Energy Technology Data Exchange (ETDEWEB)
Silin, Dmitry; Patzek, Tad
2003-01-24
In this work we describe two object-oriented cluster search algorithms, which can be applied to a network of arbitrary structure. The first algorithm calculates all connected clusters, whereas the second one finds a path with the minimal number of connections. We estimate the complexity of the algorithm and infer that the number of operations grows linearly with the size of the network.
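The two searches named in this abstract correspond to well-known graph operations, which can be sketched as follows. This is a generic breadth-first-search illustration, not the authors' object-oriented code; the adjacency-dictionary representation and the function names `connected_clusters` and `fewest_connections_path` are assumptions.

```python
from collections import deque


def connected_clusters(adjacency):
    """Return every connected cluster of an undirected network.

    adjacency maps each node to the list of its neighbours.
    """
    seen, clusters = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:
            node = queue.popleft()
            cluster.append(node)
            for nb in adjacency[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(cluster))
    return clusters


def fewest_connections_path(adjacency, source, target):
    """Return a path from source to target using the fewest connections.

    Returns None when the two nodes lie in different clusters.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back through the parent links
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nb in adjacency[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return None
```

Each node and each connection is visited a bounded number of times, so the work grows linearly with the size of the network, in line with the complexity estimate quoted in the abstract.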
Directory of Open Access Journals (Sweden)
Jiang Ting
2010-01-01
Full Text Available We optimize the cluster structure to solve problems such as the uneven energy consumption of the radar sensor nodes and random cluster head selection in the traditional clustering routing algorithm. According to the defined cost function for clusters, we present a clustering algorithm based on radio free-space path loss. In addition, we propose energy and distance pheromones based on the residual energy and aggregation of the radar sensor nodes. Following a bionic heuristic approach, a new ant colony-based clustering algorithm for radar sensor networks is also proposed. Simulation results show that this algorithm achieves a better balance of energy consumption and thereby remarkably prolongs the lifetime of the radar sensor network.
An extended EM algorithm for subspace clustering
Institute of Scientific and Technical Information of China (English)
Lifei CHEN; Qingshan JIANG
2008-01-01
Clustering high dimensional data has become a challenge in data mining due to the curse of dimensionality. To solve this problem, subspace clustering has been defined as an extension of traditional clustering that seeks to find clusters in subspaces spanned by different combinations of dimensions within a dataset. This paper presents a new subspace clustering algorithm that calculates the local feature weights automatically in an EM-based clustering process. In the algorithm, the features are locally weighted by using a new unsupervised weighting method, as a means to minimize a proposed clustering criterion that takes into account both the average intra-cluster compactness and the average inter-cluster separation for subspace clustering. For the purposes of capturing accurate subspace information, an additional outlier detection process is presented to identify the possible local outliers of subspace clusters, and is embedded between the E-step and M-step of the algorithm. The method has been evaluated in clustering real-world gene expression data and high dimensional artificial data with outliers, and the experimental results have shown its effectiveness.
Improved K-means algorithm based on clustering criterion function
Institute of Scientific and Technical Information of China (English)
张雪凤; 张桂珍; 刘鹏
2011-01-01
The criterion function used in the K-means algorithm is the sum of the squared errors, which may not work well for datasets containing clusters with different sizes and densities. In this study, the criterion function is improved by defining it as the sum of the weighted standard deviations, where the weight is the ratio of the number of points in each cluster to the total number of points. The way each point is assigned to a centroid in the K-means algorithm is also modified: instead of being assigned to the closest centroid, each point is assigned to the centroid with the minimum weighted distance. Experiments on simulated datasets show that the improved K-means algorithm significantly enhances the clustering quality by reducing the probability of misclassifying the points of big sparse clusters into neighboring compact clusters. Experiments on UCI datasets show that the improved algorithm obtains more compact clusters. Therefore, the improved K-means algorithm is effective.
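To make the difference between the two criterion functions concrete, here is a small sketch that evaluates both for a given partition. The abstract does not define the within-cluster standard deviation or the weighted distance precisely, so the sketch assumes the root-mean-square Euclidean distance to the centroid as the standard deviation and leaves the modified assignment step out; all function names are illustrative.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of a list of equal-length points."""
    dim = len(cluster[0])
    return [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]


def cluster_std(cluster):
    # Assumed definition: root-mean-square distance of points to the centroid.
    c = centroid(cluster)
    return math.sqrt(sum(math.dist(p, c) ** 2 for p in cluster) / len(cluster))


def weighted_std_criterion(clusters):
    """Sum of within-cluster standard deviations, weighted by n_k / N."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_std(c) for c in clusters)


def sse_criterion(clusters):
    """Classical K-means criterion: sum of squared errors over all clusters."""
    result = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        result += sum(math.dist(p, c) ** 2 for p in cluster)
    return result
```

For example, with one spread-out pair `[(0, 0), (2, 0)]` and one tight cluster of four identical points, `sse_criterion` is driven entirely by the pair, while `weighted_std_criterion` scales that contribution by the pair's share of the data.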
Data clustering theory, algorithms, and applications
Gan, Guojun; Wu, Jianhong
2007-01-01
Cluster analysis is an unsupervised process that divides a set of objects into homogeneous groups. This book starts with basic information on cluster analysis, including the classification of data and the corresponding similarity measures, followed by the presentation of over 50 clustering algorithms in groups according to some specific baseline methodologies such as hierarchical, center-based, and search-based methods. As a result, readers and users can easily identify an appropriate algorithm for their applications and compare novel ideas with existing results. The book also provides examples of clustering applications to illustrate the advantages and shortcomings of different clustering architectures and algorithms. Application areas include pattern recognition, artificial intelligence, information technology, image processing, biology, psychology, and marketing. Readers also learn how to perform cluster analysis with the C/C++ and MATLAB® programming languages.
Load Balancing Algorithm for Cache Cluster
Institute of Scientific and Technical Information of China (English)
刘美华; 古志民; 曹元大
2003-01-01
Using the load definition of a cluster, the request is taken as the granularity for computing load and implementing load balancing in a cache cluster. First, the processing power of a cache node is studied from four aspects: network bandwidth, memory capacity, disk access rate and CPU usage. Then, the weighted load of a cache node is defined. Based on this, a load-balancing algorithm that can be applied to the cache cluster is proposed. Finally, Polygraph is used as a benchmarking tool to test the cache cluster with the load-balancing algorithm and the cache cluster with the cache array routing protocol, respectively. The results show that the load-balancing algorithm can improve the performance of the cache cluster.
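A request-granularity balancer of this kind can be sketched as below. The abstract gives neither the weights nor the exact load formula, so everything here is hypothetical: node capacities are assumed to be normalized to comparable scales, the processing power is a weighted sum of the four aspects, and the weighted load is taken as outstanding requests per unit of processing power.

```python
def capacity_score(node, weights):
    """Weighted sum of a node's normalized capacities (hypothetical formula)."""
    return sum(weights[key] * node[key] for key in weights)


def dispatch(n_requests, nodes, weights):
    """Send each request to the node whose weighted load would be smallest.

    nodes maps a node name to its normalized capacities; returns the number
    of requests assigned to each node.
    """
    scores = {name: capacity_score(spec, weights) for name, spec in nodes.items()}
    active = {name: 0 for name in nodes}
    for _ in range(n_requests):
        # Weighted load after accepting the request: requests per unit of power.
        target = min(nodes, key=lambda name: (active[name] + 1) / scores[name])
        active[target] += 1
    return active


# Hypothetical example: node "a" has twice the processing power of node "b".
example_weights = {"bandwidth": 0.25, "memory": 0.25, "disk": 0.25, "cpu": 0.25}
example_nodes = {
    "a": {"bandwidth": 1.0, "memory": 1.0, "disk": 1.0, "cpu": 1.0},
    "b": {"bandwidth": 0.5, "memory": 0.5, "disk": 0.5, "cpu": 0.5},
}
```

With these example values, requests are shared in proportion to processing power, so the stronger node receives about twice as many requests as the weaker one.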
Semantic Based Cluster Content Discovery in Description First Clustering Algorithm
Directory of Open Access Journals (Sweden)
MUHAMMAD WASEEM KHAN
2017-01-01
Full Text Available In the field of data analytics, grouping similar documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have been proposed. One category of algorithms first groups the documents on the basis of similarity and then assigns meaningful labels to those groups. Description-first clustering algorithms belong to the category in which a meaningful description is deduced first and the relevant documents are then assigned to that description. LINGO (Label Induction Grouping Algorithm) is a description-first clustering algorithm used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing), an IR (Information Retrieval) technique, for induction of meaningful labels for clusters and VSM (Vector Space Model) for cluster content discovery. In this paper we present LINGO using LSI during both the cluster label induction and the cluster content discovery phases. Finally, we compare the results obtained from the algorithm when it uses VSM and when it uses latent semantic analysis during the cluster content discovery phase.
The Effective Clustering Partition Algorithm Based on the Genetic Evolution
Institute of Scientific and Technical Information of China (English)
LIAO Qin; LI Xi-wen
2006-01-01
To the problem that it is hard to determine the clustering number and the abnormal points by using the clustering validity function, an effective clustering partition model based on the genetic algorithm is built in this paper. The solution to the problem is formed by the combination of the clustering partition and the encoding samples, and the fitness function is defined by the distances among and within clusters. The clustering number and the samples in each cluster are determined and the abnormal points are distinguished by implementing the triple random crossover operator and the mutation. Based on the known sample data, the results of the novel method and the clustering validity function are compared. Numerical experiments are given and the results show that the novel method is more effective.
Distance function selection in several clustering algorithms
Institute of Scientific and Technical Information of China (English)
LU Yu
2004-01-01
Most clustering algorithms need to describe the similarity of objects by a predefined distance function. Three distance functions that are widely used in two traditional clustering algorithms, k-means and hierarchical clustering, were investigated. Both theoretical analysis and detailed experimental results are given. It is shown that the distance function greatly affects clustering results, and that comparing the results obtained with different distance functions can be used to detect the outliers of a cluster and to give information about the shape of clusters. In practice, it is suggested to use the different distance functions separately, compare the clustering results and pick out the "swing points". Such points may reveal more information to data analysts.
Self-organization and clustering algorithms
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
Non-convex polygons clustering algorithm
Directory of Open Access Journals (Sweden)
Kruglikov Alexey
2016-01-01
Full Text Available A clustering algorithm is proposed, to be used as a preliminary step in motion planning. It is tightly coupled to the applied problem statement, i.e. uses parameters meaningful only with respect to it. Use of geometrical properties for polygons clustering allows for a better calculation time as opposed to general-purpose algorithms. A special form of map optimized for quick motion planning is constructed as a result.
Pixel Intensity Clustering Algorithm for Multilevel Image Segmentation
Directory of Open Access Journals (Sweden)
Oludayo O. Olugbara
2015-01-01
Full Text Available Image segmentation is an important problem that has received significant attention in the literature. Over the last few decades, many algorithms have been developed to solve the image segmentation problem; prominent amongst these are the thresholding algorithms. However, the computational time complexity of thresholding increases exponentially with the number of desired thresholds. A wealth of alternative algorithms, notably those based on particle swarm optimization and evolutionary metaheuristics, were proposed to tackle the intrinsic challenges of thresholding. In addition, clustering based algorithms were developed as multidimensional extensions of thresholding. While these algorithms have demonstrated successful results for fewer thresholds, their computational costs for a large number of thresholds are still a limiting factor. We propose a new clustering algorithm based on linear partitioning of the pixel intensity set and a between-cluster variance criterion function for multilevel image segmentation. The results of testing the proposed algorithm on real images from the Berkeley Segmentation Dataset and Benchmark show that the algorithm is comparable with state-of-the-art multilevel segmentation algorithms and consistently produces high quality results. The attractive properties of the algorithm are its simplicity, generalization to a large number of clusters, and computational cost effectiveness.
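The between-cluster variance criterion mentioned in this abstract can be written down compactly for a grey-level histogram. The sketch below uses the standard Otsu-style definition and an exhaustive search over a single threshold; it does not reproduce the paper's linear partitioning procedure, which the abstract does not describe, and the function names are illustrative.

```python
def between_cluster_variance(histogram, thresholds):
    """Between-cluster variance of a histogram split at the given thresholds.

    histogram[i] is the number of pixels with intensity i. Intensity i falls
    into the cluster whose half-open range [lo, hi) contains it.
    """
    total = sum(histogram)
    global_mean = sum(i * h for i, h in enumerate(histogram)) / total
    bounds = [0] + sorted(thresholds) + [len(histogram)]
    variance = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        weight = sum(histogram[lo:hi])
        if weight == 0:
            continue  # an empty cluster contributes nothing
        mean = sum(i * histogram[i] for i in range(lo, hi)) / weight
        variance += (weight / total) * (mean - global_mean) ** 2
    return variance


def best_single_threshold(histogram):
    """Exhaustively pick the one threshold that maximizes the criterion."""
    return max(range(1, len(histogram)),
               key=lambda t: between_cluster_variance(histogram, [t]))
```

Exhaustive search over k thresholds grows combinatorially with k, which is the cost problem the abstract attributes to multilevel thresholding.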
A High-Order CFS Algorithm for Clustering Big Data
Directory of Open Access Journals (Sweden)
Fanyu Bu
2016-01-01
Full Text Available With the development of the Internet of Everything, such as the Internet of Things, the Internet of People, and the Industrial Internet, big data is being generated. Clustering is a widely used technique for big data analytics and mining. However, most current algorithms are not effective at clustering heterogeneous data, which is prevalent in big data. In this paper, we propose a high-order CFS algorithm (HOCFS) to cluster heterogeneous data by combining the CFS clustering algorithm and the dropout deep learning model, whose functionality rests on three pillars: (i) an adaptive dropout deep learning model to learn features from each type of data, (ii) a feature tensor model to capture the correlations of heterogeneous data, and (iii) a tensor distance-based high-order CFS algorithm to cluster heterogeneous data. Furthermore, we verify our proposed algorithm on different datasets, by comparison with two other clustering schemes, namely HOPCM and CFS. Results confirm the effectiveness of the proposed algorithm in clustering heterogeneous data.
Optimal Hops-Based Adaptive Clustering Algorithm
Xuan, Xin; Chen, Jian; Zhen, Shanshan; Kuo, Yonghong
This paper proposes an optimal hops-based adaptive clustering algorithm (OHACA). The algorithm sets an energy selection threshold before the cluster forms so that the nodes with less energy are more likely to go to sleep immediately. In the setup phase, OHACA introduces an adaptive mechanism to adjust the cluster head and balance the load. The optimal distance theory is applied to discover the practical optimal routing path that minimizes the total energy for transmission. Simulation results show that, because of its energy balance, OHACA prolongs the life of the network, improves the utilization rate and transmits more data.
Blockspin Cluster Algorithms for Quantum Spin Systems
Wiese, U J
1992-01-01
Cluster algorithms are developed for simulating quantum spin systems like the one- and two-dimensional Heisenberg ferro- and anti-ferromagnets. The corresponding two- and three-dimensional classical spin models with four-spin couplings are mapped to blockspin models with two-blockspin interactions. Clusters of blockspins are updated collectively. The efficiency of the method is investigated in detail for one-dimensional spin chains. In most cases the new algorithms solve the problems of slowing down from which standard algorithms suffer.
A New Clustering Algorithm for Face Classification
Directory of Open Access Journals (Sweden)
Shaker K. Ali
2016-06-01
Full Text Available In this paper, we propose a new clustering algorithm that builds on ideas from other clustering algorithms. The algorithm first computes the distance matrix, then excludes from the matrix the points that have been clustered, saving the location (row, column) of these points, determines the minimum distance that assigns these points to a group (class), and keeps the remaining points, which are not yet clustered. The proposed algorithm is applied to an image database of human faces under different conditions (direction, angles, etc.). The data were collected from different sources (the ORL site and real images collected from a random sample of the population of Thi-Qar city in Iraq). The algorithm was implemented with three types of distance for calculating the minimum distance between points (Euclidean, correlation and Minkowski distance). The efficiency of the proposed algorithm varied with the database and the threshold, and exceeded 96%. Matlab (2014) was used in this work.
A Survey of Grid Based Clustering Algorithms
Directory of Open Access Journals (Sweden)
MR ILANGO
2010-08-01
Full Text Available Cluster analysis, an automatic process to find similar objects in a database, is a fundamental operation in data mining. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering techniques have been discussed extensively in similarity search, segmentation, statistics, machine learning, trend analysis, pattern recognition and classification [1]. Clustering methods can be classified into (i) partitioning methods, (ii) hierarchical methods, (iii) density-based methods, (iv) grid-based methods and (v) model-based methods. Grid-based methods quantize the object space into a finite number of cells (hyper-rectangles) and then perform the required operations on the quantized space. The main advantage of grid-based methods is their fast processing time, which depends on the number of cells in each dimension of the quantized space. In this research paper, we survey some of the grid-based methods, such as CLIQUE (CLustering In QUEst) [2], STING (STatistical INformation Grid) [3], MAFIA (Merging of Adaptive Intervals Approach to Spatial Data Mining) [4], WaveCluster [5] and O-CLUSTER (Orthogonal partitioning CLUSTERing) [6], and compare their effectiveness in clustering data objects. We also present some of the latest developments in grid-based methods, such as the Axis Shifted Grid Clustering Algorithm [7] and Adaptive Mesh Refinement (Wei-Keng Liao et al.) [8], which improve the processing time of objects.
Cluster hybrid Monte Carlo simulation algorithms
Plascak, J. A.; Ferrenberg, Alan M.; Landau, D. P.
2002-06-01
We show that addition of Metropolis single spin flips to the Wolff cluster-flipping Monte Carlo procedure leads to a dramatic increase in performance for the spin-1/2 Ising model. We also show that adding Wolff cluster flipping to the Metropolis or heat bath algorithms in systems where just cluster flipping is not immediately obvious (such as the spin-3/2 Ising model) can substantially reduce the statistical errors of the simulations. A further advantage of these methods is that systematic errors introduced by the use of imperfect random-number generation may be largely healed by hybridizing single spin flips with cluster flipping.
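The hybrid scheme in this abstract, Wolff cluster flips interleaved with Metropolis single spin flips, can be sketched for the two-dimensional spin-1/2 Ising model as follows. This is a generic textbook-style illustration, not the authors' code: the square lattice with periodic boundaries, coupling J = 1, the ratio of one cluster flip to one Metropolis sweep, and all function names are assumptions.

```python
import math
import random


def neighbours(x, y, size):
    """Four nearest neighbours on a periodic size x size square lattice."""
    return (((x + 1) % size, y), ((x - 1) % size, y),
            (x, (y + 1) % size), (x, (y - 1) % size))


def metropolis_sweep(spins, size, beta, rng):
    """size*size single-spin-flip attempts with the Metropolis rule."""
    for _ in range(size * size):
        x, y = rng.randrange(size), rng.randrange(size)
        field = sum(spins[nx][ny] for nx, ny in neighbours(x, y, size))
        delta_e = 2 * spins[x][y] * field  # energy change of flipping (J = 1)
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            spins[x][y] = -spins[x][y]


def wolff_update(spins, size, beta, rng):
    """Grow one Wolff cluster from a random seed and flip it; return its size."""
    p_add = 1.0 - math.exp(-2.0 * beta)
    x, y = rng.randrange(size), rng.randrange(size)
    old = spins[x][y]
    spins[x][y] = -old  # flip sites as they join the cluster
    stack, flipped = [(x, y)], 1
    while stack:
        cx, cy = stack.pop()
        for nx, ny in neighbours(cx, cy, size):
            if spins[nx][ny] == old and rng.random() < p_add:
                spins[nx][ny] = -old
                stack.append((nx, ny))
                flipped += 1
    return flipped


def hybrid_run(size, beta, steps, seed=0):
    """Alternate one Wolff update with one Metropolis sweep; track |m|."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]
    magnetisation = []
    for _ in range(steps):
        wolff_update(spins, size, beta, rng)
        metropolis_sweep(spins, size, beta, rng)
        magnetisation.append(abs(sum(map(sum, spins))) / (size * size))
    return spins, magnetisation
```

The mixing ratio between cluster flips and single-spin sweeps is a free choice in this sketch; the abstract does not state which ratio the authors used.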
Cluster functional renormalization group
Reuther, Johannes; Thomale, Ronny
2014-01-01
Functional renormalization group (FRG) has become a diverse and powerful tool to derive effective low-energy scattering vertices of interacting many-body systems. Starting from a free expansion point of the action, the flow of the RG parameter Λ allows us to trace the evolution of the effective one- and two-particle vertices towards low energies by taking into account the vertex corrections between all parquet channels in an unbiased fashion. In this work, we generalize the expansion point at which the diagrammatic resummation procedure is initiated from a free UV limit to a cluster product state. We formulate a cluster FRG scheme where the noninteracting building blocks (i.e., decoupled spin clusters) are treated exactly, and the intercluster couplings are addressed via RG. As a benchmark study, we apply our cluster FRG scheme to the spin-1/2 bilayer Heisenberg model (BHM) on a square lattice where the neighboring sites in the two layers form the individual two-site clusters. Comparing with existing numerical evidence for the BHM, we obtain reasonable findings for the spin susceptibility, the spin-triplet excitation energy, and quasiparticle weight even in coupling regimes close to antiferromagnetic order. The concept of cluster FRG promises applications to a large class of interacting electron systems.
An Adaptive Clustering Algorithm for Intrusion Detection
Institute of Scientific and Technical Information of China (English)
QIU Juli
2007-01-01
In this paper, we introduce an adaptive clustering algorithm for intrusion detection based on WaveCluster, which was introduced by Gholamhosein in 1999 and used with success in image processing. Because of the non-stationary characteristic of network traffic, we extend WaveCluster and develop an adaptive WaveCluster algorithm for intrusion detection. Using the multiresolution property of wavelet transforms, we can effectively identify arbitrarily shaped clusters at different scales and degrees of detail. Moreover, applying the wavelet transform removes noise from the original feature space, so that more accurate clusters are found. Experimental results on the KDD-99 intrusion detection dataset show the efficiency and accuracy of this algorithm: a detection rate above 96% and a false alarm rate below 3% are achieved.
Efficient Cluster Head Selection Algorithm for MANET
Directory of Open Access Journals (Sweden)
Khalid Hussain
2013-01-01
Full Text Available In mobile ad hoc networks (MANETs), cluster head selection is considered a major challenge. In wireless sensor networks the LEACH protocol can be used to select the cluster head on the basis of energy, but cluster head selection is still an open issue in mobile ad hoc networks, especially when nodes are mobile. In this paper we propose an efficient cluster head selection algorithm (ECHSA) for selecting the cluster head efficiently in mobile ad hoc networks. We evaluate our proposed algorithm through simulation in OMNet++ as well as on a test bed; the results agree with our assumptions. For further evaluation we also compare our proposed protocol with several other protocols such as LEACH-C, and the results show an improvement.
Performance Analysis of Hierarchical Clustering Algorithm
Directory of Open Access Journals (Sweden)
K.Ranjini
2011-07-01
Full Text Available Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. This paper explains the implementation of agglomerative and divisive clustering algorithms applied to various types of data. The details of the victims of the 2004 tsunami in Thailand were taken as the test data. Visual programming is used for the implementation, and the running times of the algorithms using different linkages (agglomerative) on different types of data are taken for analysis.
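The role of the linkage in agglomerative clustering can be seen in a short sketch. It is a naive illustration using Euclidean distance, not the visual-programming implementation the abstract refers to; the function name `agglomerative` and the three linkage options shown are choices made for this example, and the divisive variant is not covered.

```python
import math


def agglomerative(points, n_clusters, linkage="single"):
    """Merge the closest pair of clusters until n_clusters remain.

    Returns clusters as sorted lists of point indices.
    linkage: "single" (closest pair), "complete" (farthest pair) or
    "average" (mean of all pairwise distances).
    """
    combine = {
        "single": min,
        "complete": max,
        "average": lambda dists: sum(dists) / len(dists),
    }[linkage]

    def cluster_distance(a, b):
        return combine([math.dist(points[i], points[j]) for i in a for j in b])

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            ((x, y) for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda pair: cluster_distance(clusters[pair[0]],
                                              clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

This version recomputes all pairwise distances on every merge, so its running time grows quickly with the number of points; a running-time comparison across linkages, as in the abstract, would normally use a cached distance matrix.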
Parallel Clustering Algorithms for Structured AMR
Energy Technology Data Exchange (ETDEWEB)
Gunney, B T; Wissink, A M; Hysom, D A
2005-10-26
We compare several different parallel implementation approaches for the clustering operations performed during adaptive gridding operations in patch-based structured adaptive mesh refinement (SAMR) applications. Specifically, we target the clustering algorithm of Berger and Rigoutsos (BR91), which is commonly used in many SAMR applications. The baseline for comparison is a simplistic parallel extension of the original algorithm that works well for up to O(10^2) processors. Our goal is a clustering algorithm for machines of up to O(10^5) processors, such as the 64K-processor IBM BlueGene/Light system. We first present an algorithm that avoids the unneeded communications of the simplistic approach to improve the clustering speed by up to an order of magnitude. We then present a new task-parallel implementation to further reduce communication wait time, adding another order of magnitude of improvement. The new algorithms also exhibit more favorable scaling behavior for our test problems. Performance is evaluated on a number of large scale parallel computer systems, including a 16K-processor BlueGene/Light system.
Lyoo, Chul Hyoung; Zanotti-Fregonara, Paolo; Zoghbi, Sami S; Liow, Jeih-San; Xu, Rong; Pike, Victor W; Zarate, Carlos A; Fujita, Masahiro; Innis, Robert B
2014-01-01
Image-derived input function (IDIF) obtained by manually drawing carotid arteries (manual-IDIF) can be reliably used in [(11)C](R)-rolipram positron emission tomography (PET) scans. However, manual-IDIF is time consuming and subject to inter- and intra-operator variability. To overcome this limitation, we developed a fully automated technique for deriving IDIF with a supervised clustering algorithm (SVCA). To validate this technique, 25 healthy controls and 26 patients with moderate to severe major depressive disorder (MDD) underwent T1-weighted brain magnetic resonance imaging (MRI) and a 90-minute [(11)C](R)-rolipram PET scan. For each subject, metabolite-corrected input function was measured from the radial artery. SVCA templates were obtained from 10 additional healthy subjects who underwent the same MRI and PET procedures. Cluster-IDIF was obtained as follows: 1) template mask images were created for carotid and surrounding tissue; 2) parametric image of weights for blood were created using SVCA; 3) mask images to the individual PET image were inversely normalized; 4) carotid and surrounding tissue time activity curves (TACs) were obtained from weighted and unweighted averages of each voxel activity in each mask, respectively; 5) partial volume effects and radiometabolites were corrected using individual arterial data at four points. Logan-distribution volume (V T/f P) values obtained by cluster-IDIF were similar to reference results obtained using arterial data, as well as those obtained using manual-IDIF; 39 of 51 subjects had a V T/f P error of 10%. With automatic voxel selection, cluster-IDIF curves were less noisy than manual-IDIF and free of operator-related variability. Cluster-IDIF showed widespread decrease of about 20% [(11)C](R)-rolipram binding in the MDD group. Taken together, the results suggest that cluster-IDIF is a good alternative to full arterial input function for estimating Logan-V T/f P in [(11)C](R)-rolipram PET clinical scans. This
Directory of Open Access Journals (Sweden)
Chul Hyoung Lyoo
Full Text Available Image-derived input function (IDIF) obtained by manually drawing carotid arteries (manual-IDIF) can be reliably used in [(11)C](R)-rolipram positron emission tomography (PET) scans. However, manual-IDIF is time consuming and subject to inter- and intra-operator variability. To overcome this limitation, we developed a fully automated technique for deriving IDIF with a supervised clustering algorithm (SVCA). To validate this technique, 25 healthy controls and 26 patients with moderate to severe major depressive disorder (MDD) underwent T1-weighted brain magnetic resonance imaging (MRI) and a 90-minute [(11)C](R)-rolipram PET scan. For each subject, metabolite-corrected input function was measured from the radial artery. SVCA templates were obtained from 10 additional healthy subjects who underwent the same MRI and PET procedures. Cluster-IDIF was obtained as follows: 1) template mask images were created for carotid and surrounding tissue; 2) parametric image of weights for blood were created using SVCA; 3) mask images to the individual PET image were inversely normalized; 4) carotid and surrounding tissue time activity curves (TACs) were obtained from weighted and unweighted averages of each voxel activity in each mask, respectively; 5) partial volume effects and radiometabolites were corrected using individual arterial data at four points. Logan-distribution volume (V T/f P) values obtained by cluster-IDIF were similar to reference results obtained using arterial data, as well as those obtained using manual-IDIF; 39 of 51 subjects had a V T/f P error of 10%. With automatic voxel selection, cluster-IDIF curves were less noisy than manual-IDIF and free of operator-related variability. Cluster-IDIF showed widespread decrease of about 20% [(11)C](R)-rolipram binding in the MDD group. Taken together, the results suggest that cluster-IDIF is a good alternative to full arterial input function for estimating Logan-V T/f P in [(11)C](R)-rolipram PET clinical scans. This
Analysis of Stemming Algorithm for Text Clustering
Directory of Open Access Journals (Sweden)
N.Sandhya
2011-09-01
Full Text Available Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In the bag-of-words representation of documents, the words that appear in documents often have many morphological variants, and in most cases morphological variants of a word have similar semantic interpretations and can be considered equivalent for the purpose of clustering applications. For this reason, a number of stemming algorithms, or stemmers, have been developed, which attempt to reduce a word to its stem or root form. Thus, the key terms of a document are represented by stems rather than by the original words. In this work we have studied the impact of a stemming algorithm along with four popular similarity measures (Euclidean, cosine, Pearson correlation and extended Jaccard) in conjunction with different types of vector representation (boolean, term frequency, and term frequency with inverse document frequency) on cluster quality. For clustering documents we have used the partitional clustering technique K-means. Performance is measured against a human-imposed classification of the Classic data set. We conducted a number of experiments and used the entropy measure to assure statistical significance of results. Cosine, Pearson correlation and extended Jaccard similarities emerge as the best measures to capture human categorization behavior, while the Euclidean measure performs poorly. After applying the stemming algorithm, the Euclidean measure shows little improvement.
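The four measures named in this abstract can be sketched on plain term-frequency vectors as follows. This is an illustrative sketch using the standard textbook definitions, not the authors' code; the function names and the sample vectors are assumptions, and the vectors are assumed to be non-zero and (for Pearson) non-constant.

```python
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean distance (a dissimilarity: larger means less alike)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (sqrt(dot(a, a)) * sqrt(dot(b, b)))

def pearson(a, b):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])

def extended_jaccard(a, b):
    """Extended (Tanimoto) Jaccard coefficient for real-valued vectors."""
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)

# Two hypothetical documents over a four-term vocabulary; doc2 is doc1 with doubled counts.
doc1 = [2, 0, 1, 0]
doc2 = [4, 0, 2, 0]
print(euclidean(doc1, doc2), cosine(doc1, doc2), pearson(doc1, doc2), extended_jaccard(doc1, doc2))
```

Cosine and Pearson ignore document length here, while the Euclidean distance and the extended Jaccard coefficient do not, which is one way the measures can disagree on the same pair of documents.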
High-Performance Broadcasting Algorithms on Cluster
Institute of Scientific and Technical Information of China (English)
舒继武; 魏英霞; 王鼎兴
2004-01-01
In many clusters connected by high-speed communication networks, the exact structure of the underlying communication network and the latency difference between different sending and receiving pairs may be ignored when they broadcast, such as in the approach adopted by the broadcasting method in MPICH, a widely used MPI implementation. However, the underlying network cluster topologies are becoming more and more complicated and the performance of traditional broadcasting algorithms, such as MPICH's MPI_Bcast, is far from good. This paper analyzed the impact of communication latencies and the underlying topologies on the performance of broadcasting algorithms for multilevel clusters. A multilevel model was developed for broadcasting in clusters with complicated topologies, which divides the cluster topology into many levels based on the underlying topology. The multilevel model was used to develop a new broadcast algorithm, MLM broadcast-2 (MLMB-2), that adapts to a wide range of clusters. Comparison of the performance of the counterpart MPI operation MPI_Bcast and MLMB-2 shows that MLMB-2 outperforms MPI_Bcast by decreasing the broadcast running time by 60%-90%.
Cluster Algorithm Special Purpose Processor
Talapov, A. L.; Shchur, L. N.; Andreichenko, V. B.; Dotsenko, Vl. S.
We describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.
Cluster algorithm special purpose processor
Energy Technology Data Exchange (ETDEWEB)
Talapov, A.L.; Shchur, L.N.; Andreichenko, V.B.; Dotsenko, V.S. (Landau Inst. for Theoretical Physics, GSP-1 117940 Moscow V-334 (USSR))
1992-08-10
In this paper, the authors describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.
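For readers unfamiliar with the Wolff algorithm that this processor realizes in hardware, a single cluster update can be sketched in software as below. This is the standard textbook form for a pure 2D Ising model with coupling J = 1 and periodic boundaries, not a description of the processor's design; the function name, lattice size and temperature are illustrative assumptions.

```python
import math
import random

def wolff_step(spins, L, beta, rng):
    """One Wolff cluster update on an L x L periodic Ising lattice (J = 1).

    Returns the size of the flipped cluster; spins is modified in place.
    """
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond-activation probability
    seed = (rng.randrange(L), rng.randrange(L))
    s0 = spins[seed[0]][seed[1]]
    cluster = {seed}
    stack = [seed]
    while stack:
        i, j = stack.pop()
        neighbours = (((i + 1) % L, j), ((i - 1) % L, j),
                      (i, (j + 1) % L), (i, (j - 1) % L))
        for site in neighbours:
            ni, nj = site
            # Add aligned neighbours to the cluster with probability p_add.
            if site not in cluster and spins[ni][nj] == s0 and rng.random() < p_add:
                cluster.add(site)
                stack.append(site)
    for i, j in cluster:
        spins[i][j] = -s0  # flip the whole cluster at once
    return len(cluster)

L = 16
rng = random.Random(1)
spins = [[1] * L for _ in range(L)]
sizes = [wolff_step(spins, L, beta=0.44, rng=rng) for _ in range(100)]
print(sum(sizes) / len(sizes))  # mean cluster size over 100 updates
```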
An Improved Heuristic Ant-Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
Yunfei Chen; Yushu Liu; Jihai Zhao
2004-01-01
An improved heuristic ant-clustering algorithm (HAC) is presented in this paper. A 'memory bank' device is proposed, which provides heuristic knowledge to guide the ants' movement in the two-dimensional grid space. Experiments are carried out on real and synthetic data sets. The results demonstrate that HAC outperforms the classical algorithm in misclassification error rate and runtime.
A Novel Hybrid Data Clustering Algorithm Based on Artificial Bee Colony Algorithm and K-Means
Institute of Scientific and Technical Information of China (English)
TRAN Dang Cong; WU Zhijian; WANG Zelin; DENG Changshou
2015-01-01
To improve the performance of the K-means clustering algorithm, this paper presents a new hybrid approach of Enhanced artificial bee colony algorithm and K-means (EABCK). In EABCK, the original artificial bee colony algorithm (called ABC) is enhanced by a new mutation operation and guided by the global best solution (called EABC). Then, the best solution is updated by K-means in each iteration for data clustering. In the experiments, a set of benchmark functions was used to evaluate the performance of EABC with other comparative ABC variants. To evaluate the performance of EABCK on data clustering, eleven benchmark datasets were utilized. The experimental results show that EABC and EABCK outperform other comparative ABC variants and data clustering algorithms, respectively.
Effective FCM noise clustering algorithms in medical images.
Kannan, S R; Devi, R; Ramathilagam, S; Takezawa, K
2013-02-01
The main motivation of this paper is to introduce a class of robust non-Euclidean distance measures for the original data space in order to derive new objective functions and thus cluster the non-Euclidean structures in data, enhancing the robustness of the original clustering algorithms against noise and outliers. The new objective functions of the proposed algorithms are realized by incorporating the noise clustering concept into the entropy-based fuzzy C-means algorithm with a suitable noise distance, which is employed to take information about noisy data into account in the clustering process. This paper obtains initial cluster prototypes using a prototype initialization method, so that the final result is reached in fewer iterations. To evaluate the performance of the proposed methods in reducing the noise level, experimental work has been carried out with a synthetic image corrupted by Gaussian noise. The superiority of the proposed methods has been examined through an experimental study on medical images. The experimental results show that the proposed algorithms perform significantly better than the standard existing algorithms. The accurate classification percentage of the proposed fuzzy C-means segmentation method is obtained using the silhouette validity index.
A Fast Algorithm for Support Vector Clustering
Institute of Scientific and Technical Information of China (English)
吕常魁; 姜澄宇; 王宁生
2004-01-01
Support Vector Clustering (SVC) is a kernel-based unsupervised clustering method. The main drawback of SVC is the high computational complexity of obtaining the adjacency matrix that describes the connectivity of each pair of points. Based on the proximity graph model [3], the Euclidean distance in Hilbert space is calculated using a Gaussian kernel, which is the right criterion to generate a minimum spanning tree (MST) using Kruskal's algorithm. The cost of connectivity estimation is then lowered by checking only the linkages between the edges that form the main stem of the MST, for which a non-compatibility degree is defined to support edge selection during linkage estimation. This new approach is analyzed experimentally. The results show that the revised algorithm performs better than the proximity graph model, with faster speed, better clustering quality and strong noise suppression, which makes SVC scalable to large data sets.
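The first stage described here, a minimum spanning tree built with Kruskal's algorithm on Gaussian-kernel distances, can be sketched as follows. For a Gaussian kernel K(x, y) = exp(-q * ||x - y||^2), the squared feature-space distance is K(x, x) - 2 K(x, y) + K(y, y) = 2 - 2 K(x, y). The kernel width `q`, the function names and the sample points are assumptions for illustration; the main-stem selection and non-compatibility degree of the paper are not reproduced.

```python
from itertools import combinations
from math import exp, sqrt

def kernel_distance(x, y, q=1.0):
    """Distance between x and y in the feature space of a Gaussian kernel."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return sqrt(max(0.0, 2.0 - 2.0 * exp(-q * sq)))

def kruskal_mst(points, q=1.0):
    """Return the MST edges (i, j, weight) over all point pairs."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (kernel_distance(points[i], points[j], q), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different components, so keep it
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
for i, j, w in kruskal_mst(points):
    print(i, j, round(w, 3))
```

Because the kernel distance saturates at sqrt(2) for far-apart points, long MST edges all have nearly the same weight, which is one reason a further criterion is needed to decide which linkages to check.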
Fuzzy Rules for Ant Based Clustering Algorithm
Directory of Open Access Journals (Sweden)
Amira Hamdi
2016-01-01
Full Text Available This paper provides a new intelligent technique for the semisupervised data clustering problem that combines the Ant System (AS) algorithm with the fuzzy c-means (FCM) clustering algorithm. Our proposed approach, called the F-ASClass algorithm, is a distributed algorithm inspired by the foraging behavior observed in ant colonies. The ability of ants to find the shortest path forms the basis of our proposed approach. In the first step, several colonies of cooperating entities, called artificial ants, are used to find shortest paths in a complete graph that we call graph-data. The number of colonies used in F-ASClass is equal to the number of clusters in the dataset. In the second step, the partition matrix of the dataset found by the artificial ants is given to the fuzzy c-means technique in order to assign the unclassified objects generated in the first step. The proposed approach is tested on artificial and real datasets, and its performance is compared with those of the K-means, K-medoid, and FCM algorithms. The experimental section shows that F-ASClass performs better according to the classification error rate, accuracy, and separation index.
Application of a New Fuzzy Clustering Algorithm in Intrusion Detection
Institute of Scientific and Technical Information of China (English)
None
2008-01-01
This paper presents a new Section Set Adaptive FCM algorithm. The algorithm addresses the shortcomings of local optimality, uncertain classification, and the need to fix the number of clusters in advance. It improves on the architecture of the FCM algorithm and strengthens the analysis for effective clustering. During clustering, it can adjust the number of clusters dynamically. Finally, it uses the section set method to reduce classification time. Experiments show that the algorithm improves the dependability of clustering and the correctness of classification.
Limited Random Walk Algorithm for Big Graph Data Clustering
Zhang, Honglei; Kiranyaz, Serkan; Gabbouj, Moncef
2016-01-01
Graph clustering is an important technique to understand the relationships between the vertices in a big graph. In this paper, we propose a novel random-walk-based graph clustering method. The proposed method restricts the reach of the walking agent using an inflation function and a normalization function. We analyze the behavior of the limited random walk procedure and propose a novel algorithm for both global and local graph clustering problems. Previous random-walk-based algorithms depend on the chosen fitness function to find the clusters around a seed vertex. The proposed algorithm tackles the problem in an entirely different manner. We use the limited random walk procedure to find attracting vertices in a graph and use them as features to cluster the vertices. According to the experimental results on the simulated graph data and the real-world big graph data, the proposed method is superior to the state-of-the-art methods in solving graph clustering problems. Since the proposed method uses the embarrass...
A Novel Cluster Head Selection Algorithm Based on Fuzzy Clustering and Particle Swarm Optimization.
Ni, Qingjian; Pan, Qianqian; Du, Huimin; Cao, Cen; Zhai, Yuqing
2017-01-01
An important objective of a wireless sensor network is to prolong the network life cycle, and topology control is of great significance for extending it. Based on previous work, for cluster head selection in hierarchical topology control, we propose a solution based on fuzzy clustering preprocessing and particle swarm optimization. More specifically, a fuzzy clustering algorithm is first used to form initial clusters of sensor nodes according to their geographical locations, where a sensor node belongs to a cluster with a determined probability, and the number of initial clusters is analyzed and discussed. Furthermore, the fitness function is designed considering both the energy consumption and the distance factors of the wireless sensor network. Finally, the cluster head nodes in the hierarchical topology are determined based on the improved particle swarm optimization. Experimental results show that, compared with traditional methods, the proposed method reduces the mortality rate of nodes and extends the network life cycle.
Parallel FFT Algorithm on Computer Clusters
Institute of Scientific and Technical Information of China (English)
None
2005-01-01
DFT is widely applied in signal processing and other fields. Most existing fast computation methods are either based on parallel computers connected by particular networks such as butterfly networks or hypercubes, or based on the assumptions of instantaneous transmission, conflict-free communication, complete connection of parallel processors, and an unlimited number of usable processors. However, the communication delay in the information transmission system cannot be ignored. This paper addresses the following aspects: instantaneous transmission, task dispatching, and the path of information through the communication links in computer cluster systems; and the layout of the dynamic FFT algorithm under different computer cluster structures.
Morphology of open clusters NGC 1857 and Czernik 20 using clustering algorithms
Bhattacharya, S.; Mahulkar, V.; Pandaokar, S.; Singh, P. K.
2017-01-01
The morphology and cluster membership of the Galactic open clusters Czernik 20 and NGC 1857 were analyzed using two different clustering algorithms. We present the maiden use of density-based spatial clustering of applications with noise (DBSCAN) to determine open cluster morphology from spatial distribution. The region of analysis has also been spatially classified using a statistical membership determination algorithm. We utilized near infrared (NIR) data for a suitably large region around the clusters from the United Kingdom Infrared Deep Sky Survey Galactic Plane Survey star catalogue database, and also from the Two Micron All Sky Survey star catalogue database. The densest regions of the cluster morphologies (1 for Czernik 20 and 2 for NGC 1857) thus identified were analyzed with a K-band extinction map and color-magnitude diagrams (CMDs). To address significant discrepancy in known distance and reddening parameters, we carried out field decontamination of these CMDs and subsequent isochrone fitting of the cleaned CMDs to obtain reliable distance and reddening parameters for the clusters (Czernik 20: D = 2900 pc, E(J-K) = 0.33; NGC 1857: D = 2400 pc, E(J-K) = 0.18-0.19). The isochrones were also used to convert the luminosity functions for the densest regions of Czernik 20 and NGC 1857 into mass functions, to derive their slopes. Additionally, a previously unknown over-density consistent with that of a star cluster is identified in the region of analysis.
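The DBSCAN procedure used in this study can be sketched in pure Python as below. This is the generic algorithm on 2D positions, not the authors' pipeline; the parameter names `eps` and `min_pts`, the label -1 for noise, and the toy coordinates are assumptions for illustration.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster index (0, 1, ...) or NOISE."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nb_j)
    return labels

stars = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (50, 50)]
print(dbscan(stars, eps=1.5, min_pts=3))
```

Because clusters grow by density reachability rather than around a centre, the recovered groups can take arbitrary shapes, which is what makes the method usable for morphology.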
Comparative study of several Clustering Algorithms
Directory of Open Access Journals (Sweden)
Prof. Neha Soni, Dr. Amit Ganatra
2012-12-01
Full Text Available Cluster analysis is the process of grouping objects, where an object can be physical, like a student, or abstract, such as the behaviour of a customer or the handwriting of a person. Cluster analysis is as old as human life and has its roots in many fields, such as statistics, machine learning, biology and artificial intelligence. It is a form of unsupervised learning and faces many challenges, such as high dimensionality of the dataset, arbitrary shapes of clusters, scalability, input parameters, domain knowledge and noisy data. A large number of clustering algorithms have been proposed to date to address these challenges. No single algorithm can adequately handle all sorts of requirements, which makes it a great challenge for the user to select among the available algorithms for a specific task. The purpose of this paper is to provide a detailed analytical comparison of some of the very well known clustering algorithms, which provides guidance for the selection of a clustering algorithm for a specific application.
An incremental clustering algorithm based on Mahalanobis distance
Aik, Lim Eng; Choon, Tan Wee
2014-12-01
The classical fuzzy c-means clustering algorithm is insufficient for clustering non-spherical or elliptically distributed datasets. The paper replaces the Euclidean distance of classical fuzzy c-means clustering with the Mahalanobis distance and, for its merits, applies the Mahalanobis distance to incremental learning. A Mahalanobis-distance-based fuzzy incremental clustering learning algorithm is proposed. Experimental results show that the algorithm not only is an effective remedy for this defect in the fuzzy c-means algorithm but also increases training accuracy.
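To illustrate why the Mahalanobis distance suits elliptical clusters, the sketch below computes it for 2D points with an explicit 2 x 2 covariance inverse. The function name and the sample covariance are assumptions for illustration; this is the standard definition d(x) = sqrt((x - m)^T S^-1 (x - m)), not the paper's incremental algorithm.

```python
from math import sqrt

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2D point from `mean` under covariance `cov`."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # assumed non-zero (covariance must be invertible)
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return sqrt(dx * (inv[0][0] * dx + inv[0][1] * dy)
                + dy * (inv[1][0] * dx + inv[1][1] * dy))

# A cluster stretched along the x axis: variance 4 in x, 1 in y.
cov = ((4.0, 0.0), (0.0, 1.0))
print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), cov))  # along the long axis
print(mahalanobis_2d((0.0, 2.0), (0.0, 0.0), cov))  # along the short axis
```

Both sample points are at Euclidean distance 2 from the mean, but the point along the stretched axis is closer in the Mahalanobis sense, so an elongated cluster is not penalized for its shape.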
CABOSFV algorithm for high dimensional sparse data clustering
Institute of Scientific and Technical Information of China (English)
Sen Wu; Xuedong Gao
2004-01-01
An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for the high dimensional clustering of binary sparse data. This algorithm compresses the data effectively by using a tool 'Sparse Feature Vector', thus reduces the data scale enormously, and can get the clustering result with only one data scan. Both theoretical analysis and empirical tests showed that CABOSFV is of low computational complexity. The algorithm finds clusters in high dimensional large datasets efficiently and handles noise effectively.
First Cluster Algorithm Special Purpose Processor
Talapov, A. L.; Andreichenko, V. B.; Dotsenko, Vl. S.; Shchur, L. N.
We describe the architecture of the special purpose processor built to realize the Wolff cluster algorithm in hardware, an algorithm that is not hampered by critical slowing down. The processor simulates two-dimensional Ising-like spin systems. With minor changes, the same very effective architecture, which can be defined as a Memory Machine, can be used to study phase transitions in a wide range of models in two or three dimensions.
Dynamic exponents for Potts model cluster algorithms
Coddington, Paul D.; Baillie, Clive F.
We have studied the Swendsen-Wang and Wolff cluster update algorithms for the Ising model in 2, 3 and 4 dimensions. The data indicate simple relations between the specific heat and the Wolff autocorrelations, and between the magnetization and the Swendsen-Wang autocorrelations. This implies that the dynamic critical exponents are related to the static exponents of the Ising model. We also investigate the possibility of similar relationships for the Q-state Potts model.
Enhanced Unequal Clustering Algorithm for Wireless Sensor Networks
Talbi, Said; Zaouche, Lotfi
2015-01-01
Clustering is considered a solution for greater energy conservation during communications in wireless sensor networks. Recently, a new clustering algorithm named the Unequal Clustering Algorithm (UCA) was proposed to relieve the cluster-heads located around the sink, which are burdened by traffic coming from others that are far from the base station. This paper presents an Enhanced Unequal Clustering Algorithm called EUCA. This solution reduces the control traffic during a clusteri...
ITS Cluster Finding Algorithm on GPU
Changaival, Boonyarit
2014-01-01
The ITS cluster finding algorithm is one of the data reduction algorithms at ALICE. It needs to run fast because of the large amount of data read out from the detector. A variety of platforms were studied for the system design. My work is to design, implement and benchmark this algorithm on a GPU platform. The GPU is one of many platforms that promote parallel computing. A high-end GPU can contain over 2000 processing cores, compared to commodity CPUs, which have only four cores. The program is written in C with the CUDA library. The throughput (number of events per second) is used as the metric to measure performance. With the latest implementation, the throughput was increased by a factor of 5.
Hearing the clusters in a graph: A distributed algorithm
Sahai, Tuhin; Banaszuk, Andrzej
2009-01-01
We propose a novel distributed algorithm to decompose graphs or cluster data. The algorithm recovers the solution obtained from spectral clustering without the need for expensive eigenvalue/eigenvector computations. We demonstrate that by solving the wave equation on the graph, every node can assign itself to a cluster by performing a local fast Fourier transform. We prove the equivalence of our algorithm to spectral clustering, derive convergence rates and demonstrate it on examples.
A High-Order CFS Algorithm for Clustering Big Data
Fanyu Bu; Zhikui Chen; Peng Li; Tong Tang; Ying Zhang
2016-01-01
With the development of the Internet of Everything, such as the Internet of Things, the Internet of People, and the Industrial Internet, big data is being generated. Clustering is a widely used technique for big data analytics and mining. However, most current algorithms are not effective at clustering heterogeneous data, which is prevalent in big data. In this paper, we propose a high-order CFS algorithm (HOCFS) to cluster heterogeneous data by combining the CFS clustering algorithm and the dropout deep learn...
Improvement and Parallelism of k-Means Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
TIAN Jinlan; ZHU Lin; ZHANG Suqin; LIU Lu
2005-01-01
The k-means clustering algorithm is one of the most commonly used algorithms for clustering analysis. The traditional k-means algorithm is, however, inefficient while working on large numbers of data sets and improving the algorithm efficiency remains a problem. This paper focuses on the efficiency issues of cluster algorithms. A refined initial cluster centers method is designed to reduce the number of iterative procedures in the algorithm. A parallel k-means algorithm is also studied for the problem of the operation limitation of a single processor machine when given huge data sets. The analytical results demonstrate that these improvements can greatly enhance the efficiency of the k-means algorithm, i.e., allow the grouping of a large number of data sets more accurately and more quickly. The analysis has theoretical and practical importance for work on the improvement and parallelism of cluster algorithms.
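The effect of initial centers on k-means can be sketched as follows. The abstract does not specify its refined initialization method, so this sketch swaps in a simple farthest-point heuristic, clearly an assumption and not the paper's method; the function names and sample points are likewise illustrative.

```python
from math import dist

def farthest_point_init(points, k):
    """Pick k spread-out initial centers (a stand-in heuristic, not the paper's method)."""
    centers = [points[0]]
    while len(centers) < k:
        # Choose the point whose nearest existing center is farthest away.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centers = farthest_point_init(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: dist(p, centers[c]))
            groups[idx].append(p)
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:  # converged: centroids no longer move
            break
        centers = new_centers
    return centers, groups

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, groups = kmeans(points, 2)
print(centers)
```

The assignment loop over points is independent per point, which is the part a parallel k-means would typically distribute across processors.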
clusterMaker: a multi-algorithm clustering plugin for Cytoscape
2011-01-01
Background In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions The Cytoscape plugin clusterMaker provides a number of clustering
clusterMaker: a multi-algorithm clustering plugin for Cytoscape
Directory of Open Access Journals (Sweden)
Morris John H
2011-11-01
Full Text Available Abstract Background In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions The Cytoscape plugin cluster
Robust K-Median and K-Means Clustering Algorithms for Incomplete Data
Directory of Open Access Journals (Sweden)
Jinhua Li
2016-01-01
Full Text Available Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
Parallelization of Edge Detection Algorithm using MPI on Beowulf Cluster
Haron, Nazleeni; Amir, Ruzaini; Aziz, Izzatdin A.; Jung, Low Tan; Shukri, Siti Rohkmah
In this paper, we present the design of a parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using the MPI message passing library and a master/slave scheme. Every processor performs the same sequential algorithm, but on a different part of the image. Experimental results obtained on a Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.
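The data decomposition described above (same sequential kernel, different part of the image) can be sketched without MPI. The following is an illustrative, single-process stand-in: each "worker" receives a strip of rows plus one halo row on each side, which is what a master would have to send in a message-passing version. The strip layout, the function names, and the |gx| + |gy| magnitude are assumptions, not details taken from the paper.

```python
def sobel_rows(img, row_start, row_end):
    """Sobel magnitude (|gx| + |gy|) for rows [row_start, row_end) of img.
    Pixels on the outer border of img are left at 0."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(row_start, row_end):
        row = [0] * w
        if 0 < y < h - 1:
            for x in range(1, w - 1):
                gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                      - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
                gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                      - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
                row[x] = abs(gx) + abs(gy)
        out.append(row)
    return out


def strip_sobel(img, workers):
    """Split the rows into `workers` strips and process each strip independently."""
    h = len(img)
    bounds = [h * k // workers for k in range(workers + 1)]
    result = []
    for k in range(workers):
        lo, hi = bounds[k], bounds[k + 1]
        top, bottom = max(lo - 1, 0), min(hi + 1, h)  # add halo rows
        strip = img[top:bottom]                       # what a worker would receive
        result.extend(sobel_rows(strip, lo - top, hi - top))
    return result


# A 6x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(6)]
sequential = sobel_rows(image, 0, len(image))
partitioned = strip_sobel(image, 3)
```

The halo rows are the only data shared between neighbouring strips, so the partitioned result matches the sequential one row for row.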
EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES
Directory of Open Access Journals (Sweden)
D.Kerana Hanirex
2011-03-01
Nowadays, association rules play an important role. The purchase of one product when another product is purchased represents an association rule. The Apriori algorithm is the basic algorithm for mining association rules. This paper presents an efficient Partition Algorithm for Mining Frequent Itemsets (PAFI) using clustering. The algorithm finds the frequent itemsets by partitioning the database transactions into clusters. Clusters are formed based on similarity measures between the transactions. It then finds the frequent itemsets directly from the transactions in the clusters using an improved Apriori algorithm, which further reduces the number of database scans and hence improves efficiency.
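For readers unfamiliar with the baseline, the level-wise Apriori search that the abstract builds on can be sketched as follows. This is the textbook procedure on a toy basket list, not PAFI itself; the partitioning into clusters and the paper's improvements are not reproduced, and the item names are made up.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
    return result


baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
found = frequent_itemsets(baskets, min_support=2)
```

Each pass over `transactions` corresponds to one database scan, which is the cost that partition-based variants aim to reduce.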
A hybrid monkey search algorithm for clustering analysis.
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm for clustering analysis, based on the search operator of the artificial bee colony algorithm. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
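The sensitivity of k-means to its initial solution, which motivates the hybrid method above, is easy to reproduce. The sketch below runs plain Lloyd iterations (not the monkey search or bee colony operators) on four points at the corners of a rectangle, with two hand-picked initializations chosen for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, iters=100):
    """Plain Lloyd iterations; returns (final centers, sum of squared errors)."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep it if the group is empty).
        updated = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


corners = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, sse_good = kmeans(corners, [(0, 0.5), (4, 0.5)])  # splits left / right
_, sse_poor = kmeans(corners, [(2, 0), (2, 1)])      # stuck splitting bottom / top
```

Both runs converge, but the second stops at a local optimum whose error is sixteen times larger, which is the behaviour that global search heuristics try to avoid.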
DYNAMIC REQUEST DISPATCHING ALGORITHM FOR WEB SERVER CLUSTER
Institute of Scientific and Technical Information of China (English)
Yang Zhenjiang; Zhang Deyun; Sun Qindong; Sun Qing
2006-01-01
Distributed architectures support increased load on popular web sites by dispatching client requests transparently among multiple servers in a cluster. After analyzing the Packet Single-Rewriting technology and the client address hashing algorithm of the ONE-IP technology, both of which can preserve application sessions, an improved request dispatching algorithm is proposed that is simple, effective, and supports dynamic load balancing. In this algorithm, the dispatcher determines which server node will process a request by applying a hash function to the client IP address and comparing the result with the server's assigned identifier subset; it adjusts the size of the subset according to the performance and current load of each server, so as to utilize all servers' resources effectively. Simulation shows that the improved algorithm performs better than the original one.
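The dispatching rule described above can be sketched in a few lines. This is only an illustration of the idea: the size of the identifier space, the use of SHA-256, contiguous identifier ranges, and the weight values are all assumptions rather than details from the paper.

```python
import hashlib

ID_SPACE = 256  # hypothetical size of the identifier space


def client_id(ip):
    """Map a client IP address to a stable identifier in [0, ID_SPACE)."""
    return hashlib.sha256(ip.encode("ascii")).digest()[0]


def assign_ranges(weights):
    """Split [0, ID_SPACE) into contiguous subsets whose sizes follow the
    per-server weights (for example capacity divided by current load)."""
    total = sum(weights.values())
    names = list(weights)
    ranges, start, acc = {}, 0, 0.0
    for i, name in enumerate(names):
        acc += weights[name]
        end = ID_SPACE if i == len(names) - 1 else round(ID_SPACE * acc / total)
        ranges[name] = range(start, end)
        start = end
    return ranges


def dispatch(ip, ranges):
    """Return the server whose identifier subset contains the client's hash."""
    cid = client_id(ip)
    for name, subset in ranges.items():
        if cid in subset:
            return name
    raise LookupError("identifier space not fully covered")


ranges = assign_ranges({"a": 1, "b": 3})      # server b is three times stronger
first = dispatch("192.0.2.7", ranges)
second = dispatch("192.0.2.7", ranges)        # same client, same server
rebalanced = assign_ranges({"a": 3, "b": 1})  # weights changed after a load shift
```

Hashing the client address keeps a session on one server as long as the subsets are unchanged; resizing the subsets shifts only the clients whose identifiers fall in the moved portion.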
PROPOSED A HETEROGENEOUS CLUSTERING ALGORITHM TO IMPROVE QOS IN WSN
Directory of Open Access Journals (Sweden)
Mehran Mokhtari
2016-07-01
This article presents a LEACH-extended hierarchical 3-level clustered heterogeneous and dynamic algorithm (LEH3LA). By planning the auction-based selection of a cluster head and an alternative cluster head node, the suggested protocol addresses processing delay, the processing involved in selecting members, cost and energy consumption, the number of messages sent and received inside clusters, and the selection of cluster heads in large sensor networks. The algorithm uses a hierarchical heterogeneous network (3 levels), collective intelligence, and intra-cluster interaction for communications. It also addresses the problems of sending data in multi-BS mobile networks, expanding inter-cluster networks, overlapping clusters, the genesis of orphan nodes, dynamically changing cluster boundaries, and the use of backbone networks and cloud sensors. With a sleep/wake scheduling algorithm or a TDMA schedule, the alternative cluster head node provides redundancy and fault tolerance. Local processing in the cluster head and alternative cluster head nodes, together with intra-cluster and inter-cluster communications such as multi-hop, increases processing speed and the speed of sending data within and between clusters. Network overhead decreases and load balancing among cluster heads increases. Data encapsulation by cluster head nodes reduces energy consumption while sending data. In addition, improving quality of service (QoS) in CBRP, LEACH, and 802.15.4 and reducing energy consumption in sensors, cluster heads, and alternative cluster head nodes increase the lifetime of sensor networks.
Genetic Algorithms for Auto-Clustering in KDD
Institute of Scientific and Technical Information of China (English)
None listed
2000-01-01
In solving the clustering problem in the context of knowledge discovery in databases (KDD), traditional methods, for example the K-means algorithm and its variants, usually require users to provide the number of clusters in advance based on prior information. Unfortunately, the number of clusters is generally unknown to users, who usually lack such prior information. The clustering calculation therefore becomes tedious trial-and-error work, and the result is often not globally optimal, especially when the number of clusters is large. In this paper, a new dynamic clustering method based on genetic algorithms (GA) is proposed and applied to auto-clustering of data entities in large databases. The algorithm can automatically cluster the data according to their similarities and find the exact number of clusters. Experimental results indicate that the method achieves global optimization through its dynamic clustering logic.
Energy Aware Clustering Algorithms for Wireless Sensor Networks
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire networks is mainly considered in the design. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and also compare these clustering algorithms based on metrics such as clustering distribution, cluster's load balancing, Cluster Head's (CH) selection strategy, CH's role rotation, node mobility, clusters overlapping, intra-cluster communications, reliability, security and location awareness.
A Novel Clustering Algorithm Inspired by Membrane Computing
Directory of Open Access Journals (Sweden)
Hong Peng
2015-01-01
P systems are a class of distributed parallel computing models. This paper presents a novel clustering algorithm, called the membrane clustering algorithm, which is inspired by the mechanism of a tissue-like P system with a loop structure of cells. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior to or competitive with the k-means algorithm and several evolutionary clustering algorithms recently reported in the literature.
FCM Clustering Algorithms for Segmentation of Brain MR Images
Directory of Open Access Journals (Sweden)
Yogita K. Dubey
2016-01-01
The study of brain disorders requires accurate tissue segmentation of magnetic resonance (MR) brain images, which is very important for detecting tumors, edema, and necrotic tissues. Segmentation of brain images, especially into the three main tissue types, Cerebrospinal Fluid (CSF), Gray Matter (GM), and White Matter (WM), plays an important role in computer-aided neurosurgery and diagnosis. Brain images mostly contain noise, intensity inhomogeneity, and weak boundaries. Therefore, accurate segmentation of brain images is still a challenging area of research. This paper presents a review of fuzzy c-means (FCM) clustering algorithms for the segmentation of brain MR images. The review covers the detailed analysis of FCM-based algorithms with intensity inhomogeneity correction and noise robustness. Different methods for modifying the standard fuzzy objective function, with updating of the membership and cluster centroid, are also discussed.
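As background for the membership and centroid updates mentioned above, here is a minimal sketch of standard FCM on one-dimensional intensities. It implements only the textbook alternating updates with fuzzifier m; none of the inhomogeneity-correction or noise-robust variants covered by the review are included, and the sample intensities and initial centers are invented.

```python
def fcm_1d(values, centers, m=2.0, iters=50, eps=1e-9):
    """Standard fuzzy c-means on scalar data.
    Returns (centers, memberships); memberships[k][i] is the degree to which
    values[k] belongs to cluster i."""
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        memberships = []
        for x in values:
            d = [abs(x - c) + eps for c in centers]  # eps avoids division by zero
            memberships.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
        # Centroid update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m.
        centers = [
            sum(memberships[k][i] ** m * x for k, x in enumerate(values))
            / sum(memberships[k][i] ** m for k in range(len(values)))
            for i in range(len(centers))
        ]
    return centers, memberships


intensities = [10, 12, 11, 200, 205, 198]  # two well-separated intensity groups
final_centers, u = fcm_1d(intensities, centers=[0.0, 255.0])
```

Each row of `u` sums to one, so a pixel near a tissue boundary can share its membership between classes instead of being forced into a single one.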
Directory of Open Access Journals (Sweden)
G. Abel Thangaraja
2014-11-01
The need for data mining arises from the explosive growth of data from terabytes to petabytes. Data mining preprocessing aims to produce quality mining results in descriptive and predictive analysis. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. A straightforward way to combine structural and attribute similarities is to use a weighted distance function. Clustering results are obtained based on attribute similarities, and the clusters balance the attribute and structural similarities. The existing structural and attribute clustering algorithm is analyzed and a new algorithm is proposed. The two algorithms are compared and the results are analyzed. It is found that the modified algorithm gives better-quality clusters.
An energy efficient clustering routing algorithm for wireless sensor networks
Institute of Scientific and Technical Information of China (English)
LI Li; DONG Shu-song; WEN Xiang-ming
2006-01-01
This article proposes an energy efficient clustering routing (EECR) algorithm for wireless sensor networks. The algorithm divides a sensor network into a few clusters and selects each cluster head based on a weight value, which leads to more uniform energy dissipation among all sensor nodes. Simulation results show that the algorithm can reduce overall energy consumption and extend the lifetime of the wireless sensor network.
Introduction to Clustering Algorithms and Applications
Yang, Sibei; Tao, Liangde; Gong, Bingchen
2014-01-01
Data clustering is the process of identifying natural groupings or clusters within multidimensional data based on some similarity measure. Clustering is a fundamental process in many different disciplines. Hence, researchers from different fields are actively working on the clustering problem. This paper provides an overview of the different representative clustering methods. In addition, applications of clustering in different fields are briefly introduced.
PHC: A Fast Partition and Hierarchy-Based Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
ZHOU HaoFeng(周皓峰); YUAN QingQing(袁晴晴); CHENG ZunPing(程尊平); SHI BaiLe(施伯乐)
2003-01-01
Cluster analysis is a process to classify data in a specified data set. In this field, much attention is paid to high-efficiency clustering algorithms. In this paper, the features of the current partition-based and hierarchy-based algorithms are reviewed, and a new hierarchy-based algorithm, PHC, is proposed by combining the advantages of both; it uses cohesion and closeness to amalgamate the clusters. Compared with similar algorithms, the performance of PHC is improved, and the quality of clustering is guaranteed. Both features are supported by the theoretical and experimental analyses in the paper.
Gravitation field algorithm and its application in gene cluster
Directory of Open Access Journals (Sweden)
Zheng Ming
2010-09-01
Background: Searching for optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results: This paper proposes a novel searching optimization algorithm called the Gravitation Field Algorithm (GFA), which is derived from the famous astronomy theory, the Solar Nebular Disk Model (SNDM) of planetary formation. GFA simulates the gravitation field and outperforms GA and SA in some multimodal function optimization problems. GFA can also be applied to unimodal functions. GFA clusters the dataset from the Gene Expression Omnibus well. Conclusions: The mathematical proof demonstrates that GFA converges to the global optimum with probability 1 under three conditions for mass functions of one independent variable. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA.
An Incremental Algorithm of Text Clustering Based on Semantic Sequences
Institute of Scientific and Technical Information of China (English)
FENG Zhonghui; SHEN Junyi; BAO Junpeng
2006-01-01
This paper proposes an incremental text clustering algorithm based on semantic sequences. Using the similarity relation of semantic sequences and calculating the cover of the set of similar semantic sequences, the algorithm selects the candidate cluster with the minimum entropy overlap value as a result cluster at each step. A comparison of experimental results shows that the precision of the algorithm is higher than that of other algorithms under the same conditions, and that this is especially evident on sets of long documents.
URL Mining Using Agglomerative Clustering Algorithm
Directory of Open Access Journals (Sweden)
Chinmay R. Deshmukh
2015-02-01
The tremendous growth of the web has led to the application of data mining techniques to web logs. Data mining on the World Wide Web is an important and active area of research. Web log mining is the analysis of web log files containing sequences of web pages. Web mining is broadly classified into web content mining, web usage mining, and web structure mining. Web usage mining is a technique to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. URL mining refers to a subclass of Web mining that helps us investigate the details of a Uniform Resource Locator. URL mining can be advantageous in the fields of security and protection. The paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is clickthrough data: each record consists of a user's query to a search engine along with the URL that the user selected from among the candidates offered by the search engine. By viewing this dataset as a bipartite graph, with the vertices on one side corresponding to queries and those on the other side to URLs, one can apply an agglomerative clustering algorithm to the graph's vertices to identify related queries and URLs.
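The bipartite view described above can be made concrete with a simplified sketch. It clusters only the query side, using Jaccard overlap of clicked URLs as the similarity and a fixed stopping threshold; the similarity measure, the threshold, and the sample records are assumptions, and the paper's own procedure (which also groups URLs) may differ.

```python
def jaccard(a, b):
    """Overlap of two URL sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(clicks, threshold=0.3):
    """Agglomeratively merge queries whose clicked-URL sets overlap.
    `clicks` is an iterable of (query, url) records."""
    clusters = {}
    for query, url in clicks:
        clusters.setdefault(frozenset([query]), set()).add(url)
    while len(clusters) > 1:
        keys = list(clusters)
        best, pair = 0.0, None
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                score = jaccard(clusters[keys[i]], clusters[keys[j]])
                if score > best:
                    best, pair = score, (keys[i], keys[j])
        if pair is None or best < threshold:
            break  # nothing similar enough is left to merge
        a, b = pair
        merged_urls = clusters.pop(a) | clusters.pop(b)
        clusters[a | b] = merged_urls
    return set(clusters)


# Hypothetical clickthrough records: (query, selected URL).
records = [
    ("cheap flights", "travel.example/fares"),
    ("cheap flights", "travel.example/deals"),
    ("airfare", "travel.example/fares"),
    ("airfare", "travel.example/deals"),
    ("python docs", "docs.example/python"),
]
groups = cluster_queries(records)
```

Queries are grouped purely by shared clicks, so "cheap flights" and "airfare" end up together even though they have no words in common.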
A fingerprint identification algorithm by clustering similarity
Institute of Scientific and Technical Information of China (English)
TIAN Jie; HE Yuliang; CHEN Hong; YANG Xin
2005-01-01
This paper introduces a fingerprint identification algorithm based on clustering similarity, with a view to overcoming the dilemmas encountered in fingerprint identification. To decrease multi-spectrum noise in a fingerprint, we first use a dyadic scale space (DSS) method for image enhancement. The second step describes the relative features among minutiae by building a minutia-simplex, which contains a pair of minutiae and their local associated ridge information; its transformation-variant and transformation-invariant relative features are applied to comprehensive similarity measurement and to parameter estimation, respectively. The clustering method is employed to estimate the transformation space. Finally, a multi-resolution technique is used to find an optimal transformation model that maximizes the mutual information between the input and the template features. The experimental results, including the performance evaluation by the 2nd International Verification Competition in 2002 (FVC2002) over the four fingerprint databases of FVC2002, indicate that our method is promising for use in an automatic fingerprint identification system (AFIS).
Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm
Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian
2017-03-01
DNA is one of the carriers of genetic information in living organisms. Encoding, sequencing, and clustering DNA sequences have become key, routine tasks in molecular biology, particularly in bioinformatics applications. There are two types of clustering: hierarchical clustering and partitioning clustering. In this paper, we combine the two types, K-Means (partitioning clustering) and DIANA (hierarchical clustering), in what is therefore called hybrid clustering. Hybrid clustering using the parallel K-Means algorithm and the DIANA algorithm is applied to cluster DNA sequences of Human Papillomavirus (HPV). The clustering process starts with collecting DNA sequences of HPV from NCBI (National Center for Biotechnology Information) and then extracting characteristics from the DNA sequences. The extraction result is stored in matrix form, the matrix is normalized using min-max normalization, and genetic distance is calculated using the Euclidean distance. The hybrid clustering is then applied using implementations of the parallel K-Means and DIANA algorithms. The aim of hybrid clustering is to obtain better clusters. To validate the resulting clusters and obtain the optimum number of clusters, we use the Davies-Bouldin Index (DBI). In this study, parallel K-Means clustering groups the data into 5 clusters with a minimal DBI value of 0.8741, and hybrid clustering groups the data into 13 sub-clusters with minimal DBI values of 0.8216, 0.6845, 0.3331, 0.1994, and 0.3952. The DBI values of hybrid clustering are lower than the DBI value of the parallel K-Means clustering performed alone at the first stage. This means that hybrid clustering gives a better result for clustering DNA sequences of HPV than parallel K-Means clustering alone.
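The preprocessing and validation steps named in the abstract (min-max normalization, Euclidean distance, Davies-Bouldin Index) can be sketched on toy data. The feature rows and labelings below are invented for illustration and have nothing to do with the HPV data; the DBI follows the usual definition, the average over clusters of the worst ratio of summed within-cluster scatter to centroid separation.

```python
import math


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def davies_bouldin(rows, labels):
    """Davies-Bouldin Index of a labeling; lower values mean better separation.
    Assumes at least two clusters with distinct centroids."""
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    centroids = {lab: [sum(col) / len(pts) for col in zip(*pts)]
                 for lab, pts in clusters.items()}
    # Scatter: mean Euclidean distance of a cluster's points to its centroid.
    scatter = {lab: sum(math.dist(p, centroids[lab]) for p in pts) / len(pts)
               for lab, pts in clusters.items()}
    labs = list(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in labs if j != i) for i in labs]
    return sum(worst) / len(labs)


features = min_max_normalize([[0, 5], [1, 5], [10, 5], [11, 5]])
dbi_compact = davies_bouldin(features, [0, 0, 1, 1])  # neighbours grouped together
dbi_mixed = davies_bouldin(features, [0, 1, 0, 1])    # distant points grouped together
```

A lower index for one labeling than for another is the criterion the abstract uses to prefer the hybrid result; here the compact labeling scores about 0.1 and the mixed one about 10.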
Local Community Detection Algorithm Based on Minimal Cluster
Directory of Open Access Journals (Sweden)
Yong Zhou
2016-01-01
In order to discover the structure of local communities more effectively, this paper puts forward a new local community detection algorithm based on a minimal cluster. Most local community detection algorithms begin from one node. Because the agglomeration ability of a single node is less than that of multiple nodes, the community extension in this algorithm no longer starts from the initial node alone but from a node cluster that contains this initial node and whose nodes are relatively densely connected with each other. The algorithm mainly includes two phases: first it detects the minimal cluster, and then it finds the local community extended from the minimal cluster. Experimental results show that the quality of the local community detected by our algorithm is much better than that of other algorithms, in both real and simulated networks.
A multi-sequential number-theoretic optimization algorithm using clustering methods
Institute of Scientific and Technical Information of China (English)
XU Qing-song; LIANG Yi-zeng; HOU Zhen-ting
2005-01-01
A multi-sequential number-theoretic optimization method based on clustering was developed and applied to the optimization of functions with many local extrema. Details of the procedure to generate the clusters and the sequential schedules are given. The algorithm was assessed by comparing its performance with that of the generalized simulated annealing algorithm on a difficult instructive example and a D-optimum experimental design problem. The two examples show the presented algorithm to be more effective and reliable.
Comparison of cluster expansion fitting algorithms for interactions at surfaces
Herder, Laura M.; Bray, Jason M.; Schneider, William F.
2015-10-01
Cluster expansions (CEs) are Ising-type interaction models that are increasingly used to model interaction and ordering phenomena at surfaces, such as the adsorbate-adsorbate interactions that control coverage-dependent adsorption or surface-vacancy interactions that control surface reconstructions. CEs are typically fit to a limited set of data derived from density functional theory (DFT) calculations. The CE fitting process involves iterative selection of DFT data points to include in a fit set and selection of interaction clusters to include in the CE. Here we compare the performance of three CE fitting algorithms against synthetic data: the MIT Ab-initio Phase Stability code (MAPS, the default in ATAT software), a genetic algorithm (GA), and a steepest descent (SD) algorithm. The synthetic data is encoded in model Hamiltonians of varying complexity motivated by the observed behavior of atomic adsorbates on a face-centered-cubic transition metal close-packed (111) surface. We compare the performance of the leave-one-out cross-validation score against the true fitting error available from knowledge of the hidden CEs. For these systems, SD achieves lowest overall fitting and prediction error independent of the underlying system complexity. SD also most accurately predicts cluster interaction energies without ignoring or introducing extra interactions into the CE. MAPS achieves good results in fewer iterations, while the GA performs least well for these particular problems.
Clustering Algorithms: Their Application to Gene Expression Data
Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel
2016-01-01
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a tremendous opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. The use of clustering techniques is therefore a first step toward addressing these challenges, and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data; understanding gene functions, cellular processes, and subtypes of cells; mining useful information from noisy data; and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure. PMID:27932867
Elementary functions algorithms and implementation
Muller, Jean-Michel
2016-01-01
This textbook presents the concepts and tools necessary to understand, build, and implement algorithms for computing elementary functions (e.g., logarithms, exponentials, and the trigonometric functions). Both hardware- and software-oriented algorithms are included, along with issues related to accurate floating-point implementation. This third edition has been updated and expanded to incorporate the most recent advances in the field, new elementary function algorithms, and function software. After a preliminary chapter that briefly introduces some fundamental concepts of computer arithmetic, such as floating-point arithmetic and redundant number systems, the text is divided into three main parts. Part I considers the computation of elementary functions using algorithms based on polynomial or rational approximations and using table-based methods; the final chapter in this section deals with basic principles of multiple-precision arithmetic. Part II is devoted to a presentation of “shift-and-add” algorithm...
Analyzing Job Aware Scheduling Algorithm in Hadoop for Heterogeneous Cluster
Directory of Open Access Journals (Sweden)
Mayuri A Mehta
2015-12-01
A scheduling algorithm is required to manage cluster resources efficiently in a Hadoop cluster, thereby increasing resource utilization and reducing response time. The job aware scheduling algorithm schedules non-local map tasks of jobs based on job execution time, earliest deadline first, or the workload of the job. In this paper, we present a performance evaluation of the job aware scheduling algorithm using the MapReduce WordCount benchmark. The experimental results are compared with those of the matchmaking scheduling algorithm. The results show that the job aware scheduling algorithm reduces average waiting time and memory wastage considerably compared with the matchmaking algorithm.
Cluster fusion algorithm: application to Lennard-Jones clusters
DEFF Research Database (Denmark)
Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter
2008-01-01
We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...
Cluster fusion algorithm: application to Lennard-Jones clusters
DEFF Research Database (Denmark)
Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter
2006-01-01
We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters
Wang, Zhihao; Yi, Jing
2016-01-01
To address the shortcoming that the fuzzy c-means algorithm (FCM) needs to know the number of clusters in advance, this paper proposes a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm is put forward. According to the characteristics of the dataset, the algorithm automatically determines the possible maximum number of clusters instead of using the empirical rule √n, and obtains the optimal initial cluster centroids, which mitigates the limitation of FCM that randomly selected cluster centroids lead the convergence result to a local minimum. Secondly, by introducing a penalty function, this paper proposes a new fuzzy clustering validity index based on fuzzy compactness and separation. The index ensures that, when the number of clusters approaches the number of objects in the dataset, its value does not monotonically decrease toward zero, a behavior that would otherwise cause the optimal number of clusters to lose its robustness and decision function. Then, based on these studies, a self-adaptive FCM algorithm is put forward to estimate the optimal number of clusters by an iterative trial-and-error process. Finally, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters but also reduced the number of FCM iterations while giving a stable clustering result. PMID:28042291
A Flocking Based algorithm for Document Clustering Analysis
Energy Technology Data Exchange (ETDEWEB)
Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL
2006-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel flocking-based approach for document clustering analysis. Our flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the flocking-based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the flocking clustering algorithm achieves better performance than the K-means and the Ant clustering algorithms for real document clustering.
APPECT: An Approximate Backbone-Based Clustering Algorithm for Tags
DEFF Research Database (Denmark)
Zong, Yu; Xu, Guandong; Jin, Pin
2011-01-01
... resulting from the severe difficulty of ambiguity, redundancy and less semantic nature of tags. Clustering method is a useful tool to address the aforementioned difficulties. Most of the researches on tag clustering are directly using traditional clustering algorithms such as K-means or Hierarchical ... algorithm for Tags (APPECT). The main steps of APPECT are: (1) we execute the K-means algorithm on a tag similarity matrix for M times and collect a set of tag clustering results Z={C1,C2,…,Cm}; (2) we form the approximate backbone of Z by executing a greedy search; (3) we fix the approximate backbone ...
Android Malware Classification Using K-Means Clustering Algorithm
Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah
2017-08-01
Malware is designed to gain access to or damage a computer system without the user's notice. Attackers also exploit malware to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, Virus Total and Malgenome, were selected to demonstrate the application of the K-Means clustering algorithm. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each dataset: Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the built clusters. The proposed K-Means clustering algorithm shows promising results, with high accuracy when tested using the Random Forest algorithm.
Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis
Directory of Open Access Journals (Sweden)
S. Muthurajkumar
2014-05-01
Full Text Available In this paper, we propose a hybrid clustering-based classification algorithm, based on a mean approach, to effectively classify and mine the ordered sequences (paths) from weblog data in order to perform social network analysis. In the system proposed in this work for social pattern analysis, the sequences of human activities are typically analyzed by switching behaviors, which are likely to produce overlapping clusters. In this system, a robust Modified Boosting algorithm is proposed for hybrid clustering-based classification to cluster the data. This work is useful for providing a connection between the aggregated features from the network data and the traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm, and hence that it provides better classification accuracy when tested with the Weblog dataset. In addition, this algorithm improves predictive performance, especially for multiclass datasets, which can increase accuracy.
A new hybrid imperialist competitive algorithm on data clustering
Indian Academy of Sciences (India)
Taher Niknam; Elahe Taherian Fard; Shervin Ehrampoosh; Alireza Rousta
2011-06-01
Clustering is a process for partitioning datasets. This technique is very useful for finding optimum solutions. K-means is one of the simplest and most famous methods and is based on the square error criterion. This algorithm depends on initial states and converges to local optima. Some recent research shows that the K-means algorithm has been successfully applied to combinatorial optimization problems for clustering. In this paper, we propose a novel algorithm that combines two clustering algorithms: K-means and the Modified Imperialist Competitive Algorithm. It is named hybrid K-MICA. In addition, we use a method called modified expectation maximization (EM) to determine the number of clusters. The experimental results show that the new method yields better results than ACO, PSO, Simulated Annealing (SA), Genetic Algorithm (GA), Tabu Search (TS), Honey Bee Mating Optimization (HBMO) and K-means.
Extension of K-Modes Algorithm for Generating Clusters Automatically
Directory of Open Access Journals (Sweden)
Anupama Chadha
2016-03-01
Full Text Available K-Modes is a well-known algorithm for clustering data sets with categorical attributes. This algorithm is known for its simplicity and speed. K-Modes is an extension of the K-Means algorithm for categorical data. Since K-Modes is used for categorical data, the ‘Simple Matching Dissimilarity’ measure is used instead of Euclidean distance and the ‘Modes’ of clusters are used instead of ‘Means’. However, one major limitation of this algorithm is its dependency on prior input of the number of clusters K, and sometimes it becomes practically impossible to correctly estimate the optimum number of clusters in advance. In this paper we propose an algorithm that overcomes this limitation while maintaining the simplicity of the K-Modes algorithm.
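The two ingredients named in this abstract, simple matching dissimilarity and per-cluster modes, can be illustrated with a minimal Python sketch of the standard K-Modes loop. This is not the automatic-K extension the paper proposes, which the abstract does not describe. The function names, the `max_iter` parameter and the choice of passing initial modes explicitly are assumptions made for illustration.

```python
from collections import Counter

def simple_matching(a, b):
    """Count the attribute positions at which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Most frequent value of each attribute over a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def assign(records, modes):
    """Label each record with the index of its nearest mode."""
    return [min(range(len(modes)), key=lambda j: simple_matching(r, modes[j]))
            for r in records]

def k_modes(records, initial_modes, max_iter=20):
    """Plain K-Modes with K fixed by len(initial_modes) (assumed given)."""
    modes = [tuple(m) for m in initial_modes]
    for _ in range(max_iter):
        labels = assign(records, modes)
        new_modes = []
        for j, old in enumerate(modes):
            members = [r for r, l in zip(records, labels) if l == j]
            # Keep the old mode if a cluster happens to be empty.
            new_modes.append(cluster_mode(members) if members else old)
        if new_modes == modes:
            break
        modes = new_modes
    return assign(records, modes), modes

records = [("red", "small"), ("red", "medium"), ("blue", "large"), ("blue", "large")]
labels, modes = k_modes(records, [("red", "small"), ("blue", "large")])
```

The sketch makes the limitation discussed in the abstract visible: the number of clusters is fixed up front by the number of initial modes supplied.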
Resource Allocation in Public Cluster with Extended Optimization Algorithm
Akbar, Z.; Handoko, L. T.
2007-01-01
We introduce an optimization algorithm for resource allocation in the LIPI Public Cluster to optimize its usage according to incoming requests from users. The tool is an extended and modified genetic algorithm developed to match the specific nature of a public cluster. We present a detailed analysis of the optimization and compare the results with the exact calculation. We show that it would be very useful and could realize an automatic decision-making system for public clusters.
An ACO Algorithm for Effective Cluster Head Selection
Sampath, Amritha; Thampi, Sabu M; 10.4304/jait.2.1.50-56
2011-01-01
This paper presents an effective algorithm for selecting cluster heads in mobile ad hoc networks using ant colony optimization. A cluster in an ad hoc network consists of a cluster head and cluster members which are at one hop away from the cluster head. The cluster head allocates the resources to its cluster members. Clustering in MANET is done to reduce the communication overhead and thereby increase the network performance. A MANET can have many clusters in it. This paper presents an algorithm which is a combination of the four main clustering schemes- the ID based clustering, connectivity based, probability based and the weighted approach. An Ant colony optimization based approach is used to minimize the number of clusters in MANET. This can also be considered as a minimum dominating set problem in graph theory. The algorithm considers various parameters like the number of nodes, the transmission range etc. Experimental results show that the proposed algorithm is an effective methodology for finding out t...
Squeezer: An Efficient Algorithm for Clustering Categorical Data
Institute of Scientific and Technical Information of China (English)
何增有; 徐晓飞; 邓胜春
2002-01-01
This paper presents a new efficient algorithm for clustering categorical data, Squeezer, which can produce high-quality clustering results while also offering good scalability. The Squeezer algorithm reads each tuple t in sequence, either assigning t to an existing cluster (initially none) or creating t as a new cluster, as determined by the similarities between t and the clusters. Due to its characteristics, the proposed algorithm is extremely suitable for clustering data streams, where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence so far, using a small amount of memory and time. Outliers can also be handled efficiently and directly in Squeezer. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.
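The one-pass scheme described above (read each tuple in order, then either join the most similar existing cluster or open a new one) can be sketched in Python as follows. The abstract does not state the similarity function or the decision rule, so both are assumptions here: similarity is taken as the sum, over attributes, of the fraction of cluster members sharing the tuple's value, and a hypothetical `threshold` parameter decides when a new cluster is created.

```python
from collections import Counter

def similarity(summary, size, record):
    """Assumed similarity: per attribute, the share of members with the same value."""
    return sum(counts[v] / size for counts, v in zip(summary, record))

def one_pass_cluster(records, threshold):
    """Single pass over the records; returns one cluster label per record."""
    clusters = []  # each cluster: {"size": int, "summary": [Counter per attribute]}
    labels = []
    for rec in records:
        best, best_sim = None, -1.0
        for idx, c in enumerate(clusters):
            s = similarity(c["summary"], c["size"], rec)
            if s > best_sim:
                best, best_sim = idx, s
        # Open a new cluster if none exists yet or none is similar enough.
        if best is None or best_sim < threshold:
            clusters.append({"size": 0, "summary": [Counter() for _ in rec]})
            best = len(clusters) - 1
        c = clusters[best]
        c["size"] += 1
        for counts, v in zip(c["summary"], rec):
            counts[v] += 1
        labels.append(best)
    return labels

stream = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
stream_labels = one_pass_cluster(stream, threshold=1.0)
```

Because each record is seen once and only per-attribute value counts are stored for each cluster, memory grows with the number of clusters and distinct values rather than with the number of records, which is consistent with the data-stream setting mentioned in the abstract.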
Using Hyper Clustering Algorithms in Mobile Network Planning
Directory of Open Access Journals (Sweden)
Lamiaa F. Ibrahim
2011-01-01
Full Text Available Problem statement: As a large amount of data is stored in spatial databases, people may want to find groups of data that share similar features. Cluster analysis has thus become an important area of research in data mining. Clustering analysis has been applied in many fields, for example to construct the clusters served by base stations in a mobile network. Deciding on the optimum placement of the base stations to achieve the best service while reducing cost is a complex task requiring vast computational resources. Approach: This study addresses the antenna placement problem, or cell planning problem, which involves locating and configuring infrastructure for mobile networks, by modifying the original Density-Based Spatial Clustering of Applications with Noise algorithm. The original Cluster Partitioning Around Medoids algorithm was modified, and a new algorithm proposed, by the authors in a recent work. In this study, the original Density-Based Spatial Clustering of Applications with Noise algorithm has been modified and combined with the earlier algorithm to produce the hybrid algorithm Clustering Density Base and Clustering with Weighted Node-Partitioning Around Medoids, which addresses the problems in mobile network planning. Results: The application of this algorithm to a real case study is presented. Results demonstrate that the proposed algorithm has minimum run time, minimum cost and a high grade of service. Conclusion: The proposed hyper algorithm has the advantage of quickly dividing the area into clusters, since the density-based algorithm has a limited number of iterations, and the advantages of accuracy (no sampling method is used) and a high grade of service, due to moving the locations of the base stations (medoids) toward the heavily loaded (weighted) nodes.
Co-clustering models, algorithms and applications
Govaert, Gérard
2013-01-01
Cluster or co-cluster analyses are important tools in a variety of scientific areas. The introduction of this book presents a state of the art of already well-established, as well as more recent methods of co-clustering. The authors mainly deal with the two-mode partitioning under different approaches, but pay particular attention to a probabilistic approach. Chapter 1 concerns clustering in general and the model-based clustering in particular. The authors briefly review the classical clustering methods and focus on the mixture model. They present and discuss the use of different mixture
Sonar Image Detection Algorithm Based on Two-Phase Manifold Partner Clustering
Institute of Scientific and Technical Information of China (English)
Xingmei Wang; Zhipeng Liu; Jianchuang Sun; Shu Liu
2015-01-01
According to the characteristics of sonar image data with manifold features, a sonar image detection method based on a two-phase manifold partner clustering algorithm is proposed. Firstly, K-means block clustering based on Euclidean distance is proposed to reduce the data set. Mean value, standard deviation, and gray minimum value are considered as three features based on the relationship between the clustering model and the data structure. Then the K-means clustering algorithm based on manifold distance is used to cluster the reduced data set again to improve the detection efficiency. In the K-means clustering algorithm based on manifold distance, line segment length on the manifold is analyzed, and a new power function line segment length is proposed to decrease the computational complexity. In order to quickly calculate the manifold distance, a new all-source shortest path algorithm is proposed as an efficient preprocessing step. Based on this, the spatial feature of the image block is added to the three features to obtain the final precise partner clustering algorithm. Comparison with other typical clustering algorithms demonstrates that the proposed algorithm gives good detection results, and experiments on different real sonar images show that it has better adaptability.
Comparing the biological coherence of network clusters identified by different detection algorithms
Institute of Scientific and Technical Information of China (English)
(no author listed)
2007-01-01
Protein-protein interaction networks serve to carry out basic molecular activity in the cell. Detecting the modular structures from the protein-protein interaction network is important for understanding the organization, function and dynamics of a biological system. In order to identify functional neighborhoods based on network topology, many network cluster identification algorithms have been developed. However, each algorithm might dissect a network from a different aspect and may provide different insight on the network partition. In order to objectively evaluate the performance of four commonly used cluster detection algorithms: molecular complex detection (MCODE), NetworkBlast, shortest-distance clustering (SDC) and Girvan-Newman (G-N) algorithm, we compared the biological coherence of the network clusters found by these algorithms through a uniform evaluation framework. Each algorithm was utilized to find network clusters in two different protein-protein interaction networks with various parameters. Comparison of the resulting network clusters indicates that clusters found by MCODE and SDC are of higher biological coherence than those by NetworkBlast and G-N algorithm.
Constructing Product Ontologies with an Improved Conceptual Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
曹大军; 徐良贤
2002-01-01
In a distributed eMarketplace, recommended product ontologies are required for trading between buyers and sellers. Conceptual clustering can be employed to build dynamic recommended product ontologies. Traditional methods of conceptual clustering (e.g. COBWEB or Cluster/2) do not take heterogeneous attributes of a concept into account. Moreover, the result of these methods is clusters rather than recommended concepts. A center recommendation clustering algorithm is provided. According to the values of heterogeneous attributes, recommended product names can be selected at the clusters produced by this algorithm. This algorithm can also create the hierarchical relations between product names. The definitions of product names given by all participants are collected in a distributed eMarketplace. Recommended product ontologies are built. These ontologies include relations and definitions of product names, which come from different participants in the distributed eMarketplace. Finally a case is given to illustrate this method. The result shows that this method is feasible.
An Efficient Data Aggregation Algorithm for Cluster-based Sensor Network
Directory of Open Access Journals (Sweden)
Mohammad Mostafizur Rahman Mozumdar
2009-09-01
Full Text Available Data aggregation in wireless sensor networks eliminates redundancy to improve bandwidth utilization and energy efficiency of sensor nodes. One node, called the cluster leader, collects data from surrounding nodes and then sends the summarized information to upstream nodes. In this paper, we propose an algorithm to select a cluster leader that will perform data aggregation in a partially connected sensor network. The algorithm reduces the traffic flow inside the network by adaptively selecting the shortest route for packet routing to the cluster leader. We also describe a simulation framework for functional analysis of WSN applications taking our proposed algorithm as an example.
Text clustering based on fusion of ant colony and genetic algorithms
Institute of Scientific and Technical Information of China (English)
Yun ZHANG; Boqin FENG; Shouqiang MA; Lianmeng LIU
2009-01-01
Focusing on the problem that the ant colony algorithm gets into stagnation easily and cannot fully search the solution space, a text clustering approach based on the fusion of the ant colony and genetic algorithms is proposed. The four parameters that influence the performance of the ant colony algorithm are encoded as chromosomes; the fitness function and the selection, crossover and mutation operators are then designed to find the combination of optimal parameters through a number of iterations, and the result is applied to text clustering. The simulation results show that, compared with the classical k-means clustering and the basic ant colony clustering algorithm, the proposed algorithm has better performance and the value of F-Measure is enhanced by 5.69%, 48.60% and 69.60%, respectively, in 3 test datasets. Therefore, it is more suitable for processing a larger dataset.
Cosine-Based Clustering Algorithm Approach
Directory of Open Access Journals (Sweden)
Mohammed A. H. Lubbad
2012-02-01
Full Text Available Because many applications need to manage spatial data, clustering large spatial databases is an important problem: it tries to find the densely populated regions in the feature space for use in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shapes. It must be insensitive to outliers (noise) and to the order of the input data. In this paper Cosine Cluster, which is based on the cosine transformation and satisfies all the above requirements, is proposed. Using the multi-resolution property of cosine transforms, arbitrarily shaped clusters can be effectively identified at different degrees of accuracy. Cosine Cluster is also shown to be highly efficient in terms of time complexity. Experimental results on very large data sets are presented, which show the efficiency and effectiveness of the proposed approach compared to other recent clustering methods.
Wireless Meter Reading Based Energy-Balanced Steady Clustering Routing Algorithm for Sensor Networks
Directory of Open Access Journals (Sweden)
TANG, Z.
2011-05-01
Full Text Available According to the characteristics of wireless meter reading systems, an energy-balanced and energy-efficient steady clustering routing algorithm (EBSC, Energy-Balanced Steady Clustering) is proposed. In the clustering mechanism, the current cluster head nodes determine the cluster head nodes for the next round according to the residual energy of the cluster members. In the next round, each non-cluster head node decides which cluster it will belong to according to an energy-distance function. The cluster head nodes send data to the base station using a single-hop or multi-hop communication model, chosen according to the criterion of minimum energy consumption. In the EBSC algorithm, the number of cluster head nodes generated in each round is very steady, and EBSC combines the advantages of both distributed and centralized clustering algorithms. Experimental results show that the proposed routing algorithm not only uses the limited energy of network nodes efficiently, but also balances the energy consumption of all nodes well and significantly prolongs network lifetime.
jClustering, an Open Framework for the Development of 4D Clustering Algorithms
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J.
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913
jClustering, an open framework for the development of 4D clustering algorithms.
Directory of Open Access Journals (Sweden)
José María Mateos-Pérez
Full Text Available We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary.
Meaningful Clustered Forest: an Automatic and Robust Clustering Algorithm
Tepper, Mariano; Almansa, Andrés
2011-01-01
We propose a new clustering method that can be regarded as a numerical method to compute the proximity gestalt. The method analyzes edge length statistics in the MST of the dataset and provides an a contrario cluster detection criterion. The approach is fully parametric on the chosen distance and can detect arbitrarily shaped clusters. The method is also automatic, in the sense that only a single parameter is left to the user. This parameter has an intuitive interpretation as it controls the expected number of false detections. We show that the iterative application of our method can (1) provide robustness to noise and (2) solve a masking phenomenon in which a highly populated and salient cluster dominates the scene and inhibits the detection of less-populated, but still salient, clusters.
K-Nearest Neighbor Intervals Based AP Clustering Algorithm for Large Incomplete Data
Directory of Open Access Journals (Sweden)
Cheng Lu
2015-01-01
Full Text Available The Affinity Propagation (AP) algorithm is an effective algorithm for clustering analysis, but it cannot be directly applied to incomplete data. In view of the prevalence of missing data and the uncertainty of missing attributes, we put forward a modified AP clustering algorithm based on K-nearest neighbor intervals (KNNI) for incomplete data. Based on an Improved Partial Data Strategy, the proposed algorithm estimates the KNNI representation of missing attributes by using the attribute distribution information of the available data. The similarity function is modified to handle the interval data, so that the improved AP algorithm becomes applicable to incomplete data. Experiments on several UCI datasets show that the proposed algorithm achieves impressive clustering results.
The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm
Directory of Open Access Journals (Sweden)
Zakir Hussain Ahmed
2014-01-01
Full Text Available The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solution to some more symmetric TSPLIB instances.
The ordered clustered travelling salesman problem: a hybrid genetic algorithm.
Ahmed, Zakir Hussain
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solution to some more symmetric TSPLIB instances.
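The problem definition above (start vertex first, each cluster visited contiguously, clusters in the prespecified order) can be made concrete with a small Python sketch that checks whether a tour is feasible and evaluates its cost. It illustrates the constraints only and is not the hybrid genetic algorithm of the paper; the function names and the toy cost matrix are assumptions for illustration.

```python
def is_feasible(tour, start, clusters):
    """Check the ordered clustered TSP constraints for a tour.

    tour: list of vertices, expected to begin at `start`.
    clusters: list of vertex lists, given in the prespecified visiting order.
    """
    if not tour or tour[0] != start:
        return False
    expected = [v for c in clusters for v in c]
    # Every non-start vertex must appear exactly once.
    if sorted(tour[1:]) != sorted(expected):
        return False
    pos = 1
    for c in clusters:
        # The next len(c) positions must hold exactly this cluster's vertices.
        if set(tour[pos:pos + len(c)]) != set(c):
            return False
        pos += len(c)
    return True

def tour_cost(tour, cost):
    """Cost of the closed tour, including the edge back to the first vertex."""
    return sum(cost[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

# Toy symmetric instance (assumed): cost[i][j] = |i - j| on vertices 0..4.
cost = [[abs(i - j) for j in range(5)] for i in range(5)]
clusters = [[1, 2], [3, 4]]
good = [0, 2, 1, 3, 4]   # cluster {1, 2} first, then {3, 4}
bad = [0, 1, 3, 2, 4]    # cluster {1, 2} is not contiguous
```

A search heuristic such as the 2-opt step mentioned in the abstract would only accept moves for which a check like `is_feasible` still holds, for example by restricting segment reversals to positions inside a single cluster.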
The Refinement Algorithm Consideration in Text Clustering Scheme Based on Multilevel Graph
Institute of Scientific and Technical Information of China (English)
CHEN Jian-bin; DONG Xiang-jun; SONG Han-tao
2004-01-01
To construct a highly efficient text clustering algorithm, the multilevel graph model and the refinement algorithm used in the uncoarsening phase are discussed. The model is applied to text clustering. The performance of the clustering algorithm is improved by applying the refinement algorithm. The experimental results demonstrate that the multilevel graph text clustering algorithm is feasible.
A Scalable Clustering Algorithm in Dense Mobile Sensor Networks
Directory of Open Access Journals (Sweden)
Jianbo Li
2011-03-01
Full Text Available Clustering offers a kind of hierarchical organization that provides scalability and a basic performance guarantee by partitioning the network into disjoint groups of nodes. In this paper a scalable and energy-efficient clustering algorithm is proposed for dense mobile sensor network scenarios. In the initial cluster formation phase, our proposed scheme features a simple execution process with polynomial time complexity, and eliminates the “frozen time” requirement by introducing some GPS-capable mobile nodes to act as cluster heads. In the following cluster maintenance stage, the maintenance of clusters is asynchronous and event-driven so as to thoroughly eliminate the “ripple effect” brought about by node mobility. As a result, local changes in a cluster need not be seen and updated by the entire network, which greatly reduces communication overhead and makes the scheme well suited to high-mobility environments. Extensive simulations have been conducted, and the results reveal that our proposed algorithm achieves its goal of incurring much less clustering overhead while maintaining a much more stable cluster structure, compared to the HCC (High Connectivity Clustering) algorithm.
Color Image Segmentation Method Based on Improved Spectral Clustering Algorithm
Dong Qin
2014-01-01
In view of the high sparsity of image data and the problem of determining the number of clusters, we put forward a color image segmentation algorithm that combines semi-supervised machine learning technology with spectral graph theory. Building on related theories and methods of spectral clustering algorithms, we introduce the concept of information entropy to design a method that automatically optimizes the scale parameter value. So it avoids the unstab...
The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering
Directory of Open Access Journals (Sweden)
Khalid Jaber
2009-01-01
Full Text Available Problem statement: Protein sequence clustering is a method used to discover relations between proteins. This method groups the proteins based on their common features. It is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced a maximal cliques algorithm for protein clustering. Approach: In this study we adapted the maximal cliques algorithm of Mohseni-Zadeh to find cliques in protein sequences, and we then parallelized the algorithm to improve computation times and allow large protein databases to be processed. We used the N-Gram Hirschberg approach proposed by Abdul Rashid to calculate the distance between protein sequences. The task farming parallel program model was used to parallelize the enhanced cliques algorithm. Results: Our parallel maximal cliques algorithm was implemented on the stealth cluster using the C programming language and a hybrid approach that includes both the Message Passing Interface (MPI) library and POSIX threads (PThread) to accelerate protein sequence clustering. Conclusion: Our results showed a good speedup over sequential algorithms for cliques in protein sequences.
A New Method for Medical Image Clustering Using Genetic Algorithm
Directory of Open Access Journals (Sweden)
Akbar Shahrzad Khashandarag
2013-01-01
Full Text Available Segmentation is applied to medical images when their brightness becomes so weak that recognizing tissue borders is difficult. Exact segmentation of medical images is therefore an essential step in recognizing and treating an illness, and the purpose of clustering in medical images is the recognition of damaged areas in tissues. Different clustering techniques have been introduced in different fields such as engineering, medicine, data mining and so on. However, there is no standard clustering technique that gives ideal results for all imaging applications. In this paper, a new method combining a genetic algorithm and the k-means algorithm is presented for clustering medical images. In this combined technique, a variable string length genetic algorithm (VGA) is used to determine the optimal cluster centers. The proposed algorithm has been compared with the k-means clustering algorithm. The advantage of the proposed method over k-means is its accuracy in selecting the optimal cluster centers.
Centronit: Initial Centroid Designation Algorithm for K-Means Clustering
Directory of Open Access Journals (Sweden)
Ali Ridho Barakbah
2014-06-01
Full Text Available The clustering performance of K-means depends heavily on the correctness of the initial centroids. Usually the initial centroids for K-means clustering are determined randomly, so the chosen initial centers may cause the algorithm to reach the nearest local minimum rather than the global optimum. In this paper, we propose an algorithm, called Centronit, for designating initial centroids to optimize K-means clustering. The proposed algorithm is based on the calculation of the average distance of the nearest data inside the region of the minimum distance. The initial centroids can be designated by the lowest average distance of each data point. The minimum distance is set by calculating the average distance between the data. This method is also robust to outliers in the data. The experimental results show the effectiveness of the proposed method in improving the clustering results of K-means clustering. Keywords: K-means clustering, initial centroids, K-means optimization.
New clustering algorithm for interconnection of MANET and internet
Institute of Scientific and Technical Information of China (English)
万象; 姚尹雄; 王豪行
2004-01-01
This paper presents the core-agent based clustering (CBC) algorithm, a novel heuristic clustering scheme for the interconnection of MANET and the Internet that uses power, movement probability, and hop length as constraints. CBC has two phases: cluster initialization and cluster maintenance. In phase one, clusterheads are selected according to the first two constraints, whereas the father node of each clustering node is chosen according to all three. Phase two handles node insertion and removal. The algorithm offers easy access and requires little alteration of conventional mobile IP. Simulation results demonstrate that CBC has several advantages, including a shorter average hop length, good robustness, and lower overhead, and that the clustered network architecture behaves stably when the topology changes.
Ternary alloy material prediction using genetic algorithm and cluster expansion
Energy Technology Data Exchange (ETDEWEB)
Chen, Chong [Iowa State Univ., Ames, IA (United States)
2015-12-01
This thesis summarizes our study on crystal structure prediction for the Fe-V-Si system using a genetic algorithm and cluster expansion. Our goal is to explore and look for new stable compounds. We started from the ten currently known experimental phases and calculated the formation energies of those compounds using the density functional theory (DFT) package VASP. The convex hull was generated from the DFT calculations of the experimentally known phases. We then did a random search on some metal-rich (Fe and V) compositions and found that the lowest-energy structures had a body-centered cubic (bcc) underlying lattice, on which we did our systematic computational searches using the genetic algorithm and cluster expansion. Among hundreds of searched compositions, thirteen were selected and their DFT formation energies were obtained with VASP. The stability of those thirteen compounds was checked against the experimental convex hull. We found that the composition 24-8-16, i.e., Fe_{3}VSi_{2}, is a new stable phase that may inspire future experiments.
Thermodynamic Casimir effect in films: the exchange cluster algorithm.
Hasenbusch, Martin
2015-02-01
We study the thermodynamic Casimir force for films with various types of boundary conditions and the bulk universality class of the three-dimensional Ising model. To this end, we perform Monte Carlo simulations of the improved Blume-Capel model on the simple cubic lattice. In particular, we employ the exchange or geometric cluster algorithm [Heringa and Blöte, Phys. Rev. E 57, 4976 (1998)]. In a previous work, we demonstrated that this algorithm allows us to compute the thermodynamic Casimir force for the plate-sphere geometry efficiently. It turns out that a substantial reduction of the statistical error can also be achieved for the film geometry. Concerning physics, we focus on (O,O) boundary conditions, where O denotes the ordinary surface transition. These are implemented by free boundary conditions on both sides of the film. Films with such boundary conditions undergo a phase transition in the universality class of the two-dimensional Ising model. We determine the inverse transition temperature for a large range of thicknesses L_0 of the film and study the scaling of this temperature with L_0. In the neighborhood of the transition, the thermodynamic Casimir force is affected by finite size effects, where finite size refers to a finite transversal extension L of the film. We demonstrate that these finite size effects can be computed by using the universal finite size scaling function of the free energy of the two-dimensional Ising model.
An Extended Clustering Algorithm for Statistical Language Models
Ueberla, J P
1994-01-01
Statistical language models frequently suffer from a lack of training data. This problem can be alleviated by clustering, because it reduces the number of free parameters that need to be trained. However, clustered models have the following drawback: if there is ``enough'' data to train an unclustered model, then the clustered variant may perform worse. On currently used language modeling corpora, e.g. the Wall Street Journal corpus, how do the performances of a clustered and an unclustered model compare? While trying to address this question, we develop the following two ideas. First, to get a clustering algorithm with potentially high performance, an existing algorithm is extended to deal with higher order N-grams. Second, to make it possible to cluster large amounts of training data more efficiently, a heuristic to speed up the algorithm is presented. The resulting clustering algorithm can be used to cluster trigrams on the Wall Street Journal corpus and the language models it produces can compete with exi...
Optimization of self-interstitial clusters in 3C-SiC with genetic algorithm
Ko, Hyunseok; Kaczmarowski, Amy; Szlufarska, Izabela; Morgan, Dane
2017-08-01
Under irradiation, SiC develops damage commonly referred to as black spot defects, which are speculated to be self-interstitial atom clusters. To understand the evolution of these defect clusters and their impacts (e.g., through radiation induced swelling) on the performance of SiC in nuclear applications, it is important to identify the cluster composition, structure, and shape. In this work the genetic algorithm code StructOpt was utilized to identify groundstate cluster structures in 3C-SiC. The genetic algorithm was used to explore clusters of up to ∼30 interstitials of C-only, Si-only, and Si-C mixtures embedded in the SiC lattice. We performed the structure search using Hamiltonians from both density functional theory and empirical potentials. The thermodynamic stability of clusters was investigated in terms of their composition (with a focus on Si-only, C-only, and stoichiometric) and shape (spherical vs. planar), as a function of the cluster size (n). Our results suggest that large Si-only clusters are likely unstable, and clusters are predominantly C-only for n ≤ 10 and stoichiometric for n > 10. The results imply that there is an evolution of the shape of the most stable clusters, where small clusters are stable in more spherical geometries while larger clusters are stable in more planar configurations. We also provide an estimated energy vs. size relationship, E(n), for use in future analysis.
Karayiannis, Nicolaos B; Randolph-Gips, Mary M
2005-03-01
This paper presents the development of soft clustering and learning vector quantization (LVQ) algorithms that rely on a weighted norm to measure the distance between the feature vectors and their prototypes. The development of LVQ and clustering algorithms is based on the minimization of a reformulation function under the constraint that the generalized mean of the norm weights be constant. According to the proposed formulation, the norm weights can be computed from the data in an iterative fashion together with the prototypes. An error analysis provides some guidelines for selecting the parameter involved in the definition of the generalized mean in terms of the feature variances. The algorithms produced from this formulation are easy to implement and are almost as fast as clustering algorithms relying on the Euclidean norm. An experimental evaluation on four data sets indicates that the proposed algorithms consistently outperform clustering algorithms relying on the Euclidean norm and are strong competitors to non-Euclidean algorithms that are computationally more demanding.
Scheduling algorithm of dual-armed cluster tools with residency time and reentrant constraints
Institute of Scientific and Technical Information of China (English)
周炳海; 高忠顺; 陈佳
2014-01-01
To solve the scheduling problem of dual-armed cluster tools for wafer fabrication with residency time and reentrant constraints, a heuristic scheduling algorithm was developed. Firstly, on the basis of formulating the scheduling problem domain of dual-armed cluster tools, a non-integer programming model was set up with the objective of minimizing the makespan. Secondly, combining the characteristics of residency time and reentrant constraints, a scheduling algorithm was presented that searches for the optimal operation path of the dual-armed transport module among the many possible robotic scheduling paths for dual-armed cluster tools. Finally, experiments were designed to evaluate the proposed algorithm. The results show that the proposed algorithm is feasible and efficient for obtaining an optimal scheduling solution of dual-armed cluster tools with residency time and reentrant constraints.
Critical dynamics of cluster algorithms in the dilute Ising model
Hennecke, M.; Heyken, U.
1993-08-01
Autocorrelation times for thermodynamic quantities at T_c are calculated from Monte Carlo simulations of the site-diluted simple cubic Ising model, using the Swendsen-Wang and Wolff cluster algorithms. Our results show that for these algorithms the autocorrelation times decrease when reducing the concentration of magnetic sites from 100% down to 40%. This is of crucial importance when estimating static properties of the model, since the variances of these estimators increase with autocorrelation time. The dynamical critical exponents are calculated for both algorithms, observing pronounced finite-size effects in the energy autocorrelation data for the algorithm of Wolff. We conclude that, when applied to the dilute Ising model, cluster algorithms become even more effective than local algorithms, for which increasing autocorrelation times are expected.
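As an illustration of the kind of cluster update the abstract refers to, the following is a minimal sketch of a single Wolff cluster flip on a site-diluted simple cubic lattice. It is not the authors' code. The function names, the dictionary representation of the lattice, and the coupling J = 1 are assumptions made for this example; vacant sites carry spin 0 and can never join a cluster.

```python
import math
import random

NEIGHBOR_STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def make_diluted_lattice(L, concentration, rng):
    """Return {site: spin}; spin is +1/-1 on magnetic sites and 0 on vacant sites."""
    spins = {}
    for x in range(L):
        for y in range(L):
            for z in range(L):
                occupied = rng.random() < concentration
                spins[(x, y, z)] = rng.choice((-1, 1)) if occupied else 0
    return spins

def wolff_update(spins, L, beta, rng):
    """Grow one Wolff cluster from a random magnetic site and flip it.

    Uses periodic boundaries and assumes L >= 3 and coupling J = 1.
    Returns the number of flipped spins.
    """
    occupied = [site for site, s in spins.items() if s != 0]
    seed = rng.choice(occupied)
    seed_spin = spins[seed]
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond activation probability
    cluster = {seed}
    stack = [seed]
    while stack:
        x, y, z = stack.pop()
        for dx, dy, dz in NEIGHBOR_STEPS:
            nb = ((x + dx) % L, (y + dy) % L, (z + dz) % L)
            # Vacant sites have spin 0, so they never match seed_spin.
            if nb not in cluster and spins[nb] == seed_spin and rng.random() < p_add:
                cluster.add(nb)
                stack.append(nb)
    for site in cluster:
        spins[site] = -seed_spin
    return len(cluster)
```

Repeated calls to `wolff_update` on a lattice built by `make_diluted_lattice(L, 0.4, rng)` would give a time series from which autocorrelation times could be estimated; that analysis is outside this sketch.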
Vinitsky, Sergue; Chuluunbaatar, Ochbadrakh; Rostovtsev, Vitaly; Hai, Luong Le; Derbov, Vladimir; Krassovitskiy, Pavel
2013-01-01
A model for quantum tunnelling of a cluster comprising A identical particles, coupled by oscillator-type potential, through short-range repulsive potential barriers is introduced for the first time in the new symmetrized-coordinate representation and studied within the s-wave approximation. The symbolic-numerical algorithms for calculating the effective potentials of the close-coupling equations in terms of the cluster wave functions and the energy of the barrier quasistationary states are formulated and implemented using the Maple computer algebra system. The effect of quantum transparency, manifesting itself in nonmonotonic resonance-type dependence of the transmission coefficient upon the energy of the particles, the number of the particles A=2,3,4, and their symmetry type, is analyzed. It is shown that the resonance behavior of the total transmission coefficient is due to the existence of barrier quasistationary states imbedded in the continuum.
Segmentation of Medical Image using Clustering and Watershed Algorithms
M. C.J. Christ; R.M.S Parvathi
2011-01-01
Problem statement: Segmentation plays an important role in medical imaging. Segmentation of an image is the division or separation of the image into dissimilar regions of similar attribute. In this study we proposed a methodology that integrates clustering algorithm and marker controlled watershed segmentation algorithm for medical image segmentation. The use of the conservative watershed algorithm for medical image analysis is pervasive because of its advantages, such as always being able to...
Efficient Cluster Algorithm for CP(N-1) Models
Beard, B B; Riederer, S; Wiese, U J
2006-01-01
Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z = 0.
Efficient cluster algorithm for CP(N-1) models
Beard, B. B.; Pepe, M.; Riederer, S.; Wiese, U.-J.
2006-11-01
Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z=0.
Semi-Supervised Clustering Fingerprint Positioning Algorithm Based on Distance Constraints
Institute of Scientific and Technical Information of China (English)
Ying Xia; Zhongzhao Zhang; Lin Ma; Yao Wang
2015-01-01
With the rapid development of WLAN (Wireless Local Area Network) technology, an important target of indoor positioning systems is to improve the positioning accuracy while reducing the online computation. This paper proposes a novel fingerprint positioning algorithm known as semi-supervised affinity propagation clustering (APC) based on distance function constraints. We show that, by employing affinity propagation techniques, a fraction of labeled data can be used to adjust the similarity matrix of the signal space so that reference points are clustered with high accuracy. The semi-supervised APC uses a combination of machine learning, clustering analysis, and a fingerprinting algorithm. By collecting data and testing our algorithm in a realistic indoor WLAN environment, the experimental results indicate that the proposed algorithm can improve positioning accuracy while reducing the online localization computation, as compared with the widely used K nearest neighbor and maximum likelihood estimation algorithms.
Maximum-entropy clustering algorithm and its global convergence analysis
Institute of Scientific and Technical Information of China (English)
ZHANG; Zhihua
2001-01-01
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato
2006-01-01
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
A dynamic fuzzy clustering method based on genetic algorithm
Institute of Scientific and Technical Information of China (English)
ZHENG Yan; ZHOU Chunguang; LIANG Yanchun; GUO Dongwei
2003-01-01
A dynamic fuzzy clustering method is presented based on the genetic algorithm. By calculating the fuzzy dissimilarity between samples the essential associations among samples are modeled factually. The fuzzy dissimilarity between two samples is mapped into their Euclidean distance, that is, the high dimensional samples are mapped into the two-dimensional plane. The mapping is optimized globally by the genetic algorithm, which adjusts the coordinates of each sample, and thus the Euclidean distance, to approximate to the fuzzy dissimilarity between samples gradually. A key advantage of the proposed method is that the clustering is independent of the space distribution of input samples, which improves the flexibility and visualization. This method possesses characteristics of a faster convergence rate and more exact clustering than some typical clustering algorithms. Simulated experiments show the feasibility and availability of the proposed method.
SURVEY ON CLUSTERING ALGORITHM AND SIMILARITY MEASURE FOR CATEGORICAL DATA
Directory of Open Access Journals (Sweden)
S. Anitha Elavarasi
2014-01-01
Learning is the process of generating useful information from a huge volume of data. Learning can be either supervised (e.g., classification) or unsupervised (e.g., clustering). Clustering is the process of grouping a set of physical objects into classes of similar objects. Objects in the real world consist of both numerical and categorical data. Categorical data are not analyzed as numerical data because of the absence of inherent ordering. This paper describes ten different clustering algorithms, their methodology, and the factors influencing their performance. Each algorithm is evaluated using real-world datasets, and its pros and cons are specified. The various similarity/dissimilarity measures applied to categorical data and their performance are also discussed. The time complexity defines the amount of time taken by an algorithm to perform the elementary operation. The time complexity of the various algorithms is discussed, and their performance on real-world data such as mushroom, zoo, soya bean, cancer, vote, car, and iris is measured. In this survey, cluster accuracy and error rate are evaluated for four different clustering algorithms (K-modes, fuzzy K-modes, ROCK, and Squeezer), for two different similarity measures (DISC and Overlap), and for DILCA applied to hierarchical and partitional algorithms.
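To make the idea of clustering categorical data concrete, here is a minimal sketch of K-modes using the overlap measure (the number of mismatched attributes). It is a simplified illustration, not the survey's implementation; the function names and the choice to pass initial modes explicitly are assumptions for this example.

```python
from collections import Counter

def overlap_distance(a, b):
    """Overlap dissimilarity: number of attributes on which two records differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def mode_of(records):
    """Per-attribute most frequent category of a non-empty group of records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

def kmodes(records, modes, iterations=10):
    """Alternate between assigning records to the nearest mode and updating modes."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(modes)), key=lambda k: overlap_distance(r, modes[k]))
                  for r in records]
        new_modes = []
        for k, old in enumerate(modes):
            members = [r for r, lab in zip(records, labels) if lab == k]
            # An empty cluster keeps its previous mode.
            new_modes.append(mode_of(members) if members else old)
        modes = new_modes
    return modes, labels
```

Because categories have no ordering, the cluster representative is a mode rather than a mean, which is the main difference from K-means.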
A Geometric Clustering Algorithm with Applications to Structural Data
Xu, Shutan; Zou, Shuxue
2015-01-01
An important feature of structural data, especially those from structural determination and protein-ligand docking programs, is that their distribution could be mostly uniform. Traditional clustering algorithms developed specifically for nonuniformly distributed data may not be adequate for their classification. Here we present a geometric partitional algorithm that could be applied to both uniformly and nonuniformly distributed data. The algorithm is a top-down approach that recursively selects the outliers as the seeds to form new clusters until all the structures within a cluster satisfy a classification criterion. The algorithm has been evaluated on a diverse set of real structural data and six sets of test data. The results show that it is superior to the previous algorithms for the clustering of structural data and is similar to or better than them for the classification of the test data. The algorithm should be especially useful for the identification of the best but minor clusters and for speeding up an iterative process widely used in NMR structure determination. PMID:25517067
Research on retailer data clustering algorithm based on Spark
Huang, Qiuman; Zhou, Feng
2017-03-01
Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.
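The parallelization described above can be pictured as a map step (each data partition assigns its points to the nearest center and emits per-cluster partial sums) followed by a reduce step (partial sums are merged into new centers). The sketch below emulates that pattern in plain Python. It does not use the Spark API, and the function names and data layout are assumptions made for illustration.

```python
def closest(point, centers):
    """Index of the center nearest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))

def map_partition(points, centers):
    """Map step: per-cluster (coordinate sums, count) for one partition."""
    dim = len(centers[0])
    partial = {}
    for p in points:
        k = closest(p, centers)
        sums, count = partial.get(k, ([0.0] * dim, 0))
        partial[k] = ([a + b for a, b in zip(sums, p)], count + 1)
    return partial

def reduce_partials(partials, centers):
    """Reduce step: merge partial sums and compute the new centers."""
    dim = len(centers[0])
    total = {}
    for partial in partials:
        for k, (sums, count) in partial.items():
            tsums, tcount = total.get(k, ([0.0] * dim, 0))
            total[k] = ([a + b for a, b in zip(tsums, sums)], tcount + count)
    # A cluster that received no points keeps its previous center.
    return [tuple(s / total[k][1] for s in total[k][0]) if k in total else centers[k]
            for k in range(len(centers))]

def kmeans(partitions, centers, iterations=10):
    for _ in range(iterations):
        partials = [map_partition(part, centers) for part in partitions]
        centers = reduce_partials(partials, centers)
    return centers
```

In a distributed setting only the small per-cluster sums travel between workers, not the points themselves, which is what makes this decomposition attractive.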
Big Data Clustering Using Genetic Algorithm On Hadoop Mapreduce
Directory of Open Access Journals (Sweden)
Nivranshu Hans
2015-04-01
Cluster analysis is used to classify similar objects under the same group. It is one of the most important data mining methods. However, it fails to perform well for big data due to its huge time complexity. For such scenarios, parallelization is a better approach. MapReduce is a popular programming model that enables parallel processing in a distributed environment. However, most clustering algorithms, for instance genetic algorithms (GAs), are not naturally parallelizable because of their sequential nature. This paper introduces a technique to parallelize GA-based clustering by extending Hadoop MapReduce. An analysis of the proposed approach is presented to evaluate performance gains with respect to a sequential algorithm. The analysis is based on a real-life large data set.
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.
He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej
2011-12-01
Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
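For readers unfamiliar with symmetric NMF, the sketch below factorizes a symmetric nonnegative matrix X as A Aᵀ with a damped multiplicative update, A ← A ∘ (1 − β + β · (XA) / (A Aᵀ A)), using β = 1/2. This damped rule is a commonly cited variant and is swapped in here for illustration; it is not claimed to be the exact update, nor the α-SNMF or β-SNMF schemes, of the paper above. The helper names and the small `eps` guard are assumptions.

```python
def matmul(P, Q):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def snmf_error(X, A):
    """Squared Frobenius norm of X - A A^T."""
    R = matmul(A, transpose(A))
    return sum((X[i][j] - R[i][j]) ** 2
               for i in range(len(X)) for j in range(len(X)))

def snmf(X, A, iterations=200, beta=0.5, eps=1e-12):
    """Damped multiplicative updates; entries of A stay nonnegative."""
    for _ in range(iterations):
        XA = matmul(X, A)
        AAtA = matmul(A, matmul(transpose(A), A))
        A = [[A[i][j] * (1.0 - beta + beta * XA[i][j] / (AAtA[i][j] + eps))
              for j in range(len(A[0]))] for i in range(len(A))]
    return A

def cluster_labels(A):
    """Assign each row to the column with the largest weight."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in A]
```

Reading the rows of A as unnormalized membership weights gives the soft, probabilistic flavor of clustering mentioned in the abstract; `cluster_labels` simply hardens them.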
A reliable cluster detection technique using photometric redshifts: introducing the 2TecX algorithm
van Breukelen, Caroline
2009-01-01
We present a new cluster detection algorithm designed for finding high-redshift clusters using optical/infrared imaging data. The algorithm has two main characteristics. First, it utilises each galaxy's full redshift probability function, instead of an estimate of the photometric redshift based on the peak of the probability function and an associated Gaussian error. Second, it identifies cluster candidates through cross-checking the results of two substantially different selection techniques (the name 2TecX representing the cross-check of the two techniques). These are adaptations of the Voronoi Tessellations and Friends-Of-Friends methods. Monte-Carlo simulations of mock catalogues show that cross-checking the cluster candidates found by the two techniques significantly reduces the detection of spurious sources. Furthermore, we examine the selection effects and relative strengths and weaknesses of either method. The simulations also allow us to fine-tune the algorithm's parameters, and define completeness an...
An improved algorithm for clustering gene expression data.
Bandyopadhyay, Sanghamitra; Mukhopadhyay, Anirban; Maulik, Ujjwal
2007-11-01
Recent advancements in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, typical properties of which are its inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm, which employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm, is proposed. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means is also utilized for clustering. The significant superiority of the proposed two-stage clustering algorithm as compared to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), widely used methods for clustering gene expression data, is established on a variety of artificial and publicly available real life data sets. The biological relevance of the clustering solutions is also analyzed.
Improved insensitive to input parameters trajectory clustering algorithm
Institute of Scientific and Technical Information of China (English)
Jiashun Chen; Dechang Pi
2013-01-01
The existing trajectory clustering algorithm (TRACLUS) is sensitive to the input parameters ε and MinLns: a small change in a parameter value can produce entirely different clustering results. To address this vulnerability, a shielding parameters sensitivity trajectory cluster (SPSTC) algorithm is proposed that is insensitive to the input parameters. Firstly, definitions of the core distance and reachable distance of a line segment are presented, and the algorithm generates a cluster ordering according to the core distance and reachable distance. Secondly, the reachable plots of line segment sets are constructed according to the cluster ordering and reachable distance. Thirdly, a parameterized sequence is extracted from the reachable plot, and the final trajectory clusters are obtained from the parameterized sequence. The parameterized sequence represents the inner cluster structure of the trajectory data. Experiments on real data sets and test data sets show that the SPSTC algorithm effectively reduces the sensitivity to the input parameters while obtaining trajectory clusters of better quality.
Multilayer Traffic Network Optimized by Multiobjective Genetic Clustering Algorithm
Wen, Feng; Gen, Mitsuo; Yu, Xinjie
This paper introduces a multilayer traffic network model and traffic network clustering method for solving the route selection problem (RSP) in car navigation system (CNS). The purpose of the proposed method is to reduce the computation time of route selection substantially with acceptable loss of accuracy by preprocessing the large size traffic network into new network form. The proposed approach further preprocesses the traffic network than the traditional hierarchical network method by clustering method. The traffic network clustering considers two criteria. We specify a genetic clustering algorithm for traffic network clustering and use NSGA-II for calculating the multiple objective Pareto optimal set. The proposed method can overcome the size limitations when solving route selection in CNS. Solutions provided by the proposed algorithm are compared with the optimal solutions to analyze and quantify the loss of accuracy.
Uncertainties in the cluster-cluster correlation function
Energy Technology Data Exchange (ETDEWEB)
Ling, E.N.; Barrow, J.D.; Frenk, C.S.
1986-12-01
The bootstrap resampling technique is applied to estimate sampling errors and significance levels of the two-point correlation functions determined for a subset of the CfA redshift survey of galaxies and a redshift sample of 104 Abell clusters. The angular correlation function is also calculated for a sample of 1664 Abell clusters. The standard errors for the Abell data are found to be considerably larger than quoted 'Poisson errors'. The enhancement of cluster clustering over galaxy clustering is statistically significant in the presence of resampling errors.
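The bootstrap idea used above can be sketched generically: resample the catalogue with replacement many times, recompute the statistic on each resample, and take the spread of those values as the sampling error. The function below is a minimal illustration with an arbitrary `statistic` callable; the name `bootstrap_standard_error` and its parameters are assumptions, and no correlation-function estimator is implemented here.

```python
import random
import statistics

def bootstrap_standard_error(data, statistic, n_resamples, rng):
    """Standard deviation of `statistic` over resamples drawn with replacement."""
    n = len(data)
    replicates = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)
```

For a correlation-function analysis, `statistic` would be the estimator evaluated at a fixed separation, and `data` the list of object positions; for example, `bootstrap_standard_error(values, statistics.mean, 1000, random.Random(0))` estimates the error of a simple mean.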
Morphology of Open Clusters NGC 1857 and Czernik 20 using Clustering Algorithms
Bhattacharya, Souradeep; Pandaokar, Samay; Singh, Parikshit Kishor
2016-01-01
The morphology and cluster membership of the Galactic open clusters - Czernik 20 and NGC 1857 were analyzed using two different clustering algorithms. We present the maiden use of density-based spatial clustering of applications with noise (DBSCAN) to determine open cluster morphology from spatial distribution. The region of analysis has also been spatially classified using a statistical membership determination algorithm. We utilized near infrared (NIR) data for a suitably large region around the clusters from the United Kingdom Infrared Deep Sky Survey Galactic Plane Survey star catalogue database, and also from the Two Micron All Sky Survey star catalogue database. The densest regions of the cluster morphologies (1 for Czernik 20 and 2 for NGC 1857) thus identified were analyzed with a K-band extinction map and color-magnitude diagrams (CMDs). To address significant discrepancy in known distance and reddening parameters, we carried out field decontamination of these CMDs and subsequent isochrone fitting of...
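For context on the density-based method named above, here is a compact, brute-force sketch of DBSCAN on point tuples. It follows the textbook algorithm with parameters `eps` (neighborhood radius) and `min_pts` (minimum neighbors, counting the point itself). It is written for illustration only and says nothing about the parameter choices or data handling of the study.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force O(n) neighborhood query; includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nj)
    return labels
```

Because clusters grow by chaining dense neighborhoods, the method can trace irregular shapes, which is why it suits morphology studies better than center-based methods.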
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Emmons, Scott; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Blondel, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 o...
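Two of the stand-alone metrics mentioned above have simple closed forms for an undirected, unweighted graph: modularity is the sum over communities of e_c/m − (d_c/2m)², where e_c is the number of internal edges, d_c the total degree, and m the number of edges; coverage is the fraction of edges that fall inside communities. The sketch below computes both from an edge list. The function names and the dictionary mapping nodes to community ids are assumptions for this example.

```python
def modularity(edges, community):
    """Newman modularity for an undirected, unweighted edge list."""
    m = len(edges)
    internal = {}  # community id -> number of intra-community edges
    degree = {}    # community id -> sum of node degrees
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

def coverage(edges, community):
    """Fraction of edges whose endpoints share a community."""
    return sum(1 for u, v in edges if community[u] == community[v]) / len(edges)
```

Note that placing every node in one community gives coverage 1 but modularity 0, a small example of how such metrics can disagree.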
Sun, Xu; Yang, Lina; Gao, Lianru; Zhang, Bing; Li, Shanshan; Li, Jun
2015-01-01
Center-oriented hyperspectral image clustering methods have been widely applied to hyperspectral remote sensing image processing; however, the drawbacks are obvious, including the over-simplicity of computing models and underutilized spatial information. In recent years, some studies have been conducted trying to improve this situation. We introduce the artificial bee colony (ABC) and Markov random field (MRF) algorithms to propose an ABC-MRF-cluster model to solve the problems mentioned above. In this model, a typical ABC algorithm framework is adopted in which cluster centers and iteration conditional model algorithm's results are considered as feasible solutions and objective functions separately, and MRF is modified to be capable of dealing with the clustering problem. Finally, four datasets and two indices are used to show that the application of ABC-cluster and ABC-MRF-cluster methods could help to obtain better image accuracy than conventional methods. Specifically, the ABC-cluster method is superior when used for a higher power of spectral discrimination, whereas the ABC-MRF-cluster method can provide better results when used for an adjusted random index. In experiments on simulated images with different signal-to-noise ratios, ABC-cluster and ABC-MRF-cluster showed good stability.
Sampling Within k-Means Algorithm to Cluster Large Datasets
Energy Technology Data Exchange (ETDEWEB)
Bejarano, Jeremy [Brigham Young University; Bose, Koushiki [Brown University; Brannan, Tyler [North Carolina State University; Thomas, Anita [Illinois Institute of Technology; Adragni, Kofi [University of Maryland; Neerchal, Nagaraj [University of Maryland; Ostrouchov, George [ORNL
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets and on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Also, such a study should analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
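The general idea of running k-means on a sample and then labeling the full dataset can be sketched as follows. This is a minimal illustration using plain Lloyd iterations and a fixed, user-chosen sample size; the abstract's rule for choosing the sample size from a width and confidence level is not reproduced, and all function names here are hypothetical.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: dist2(p, centers[i]))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations; returns the final list of centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as its group mean; keep empty groups in place.
        new = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
               for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers

def sampled_kmeans(points, k, sample_size, seed=0):
    """Fit centers on a random sample, then label every point once."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = kmeans(sample, k, seed=seed)
    return centers, [nearest(p, centers) for p in points]

# Hypothetical data: two tight, well-separated groups on a line.
pts = [(i * 0.01, 0.0) for i in range(100)] + [(100 + i * 0.01, 0.0) for i in range(100)]
centers, labels = sampled_kmeans(pts, k=2, sample_size=40)
```

Only the 40 sampled points take part in the iterative step; the remaining points are touched once, in the final assignment pass, which is where the runtime saving comes from.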
GDCluster: A General Decentralized Clustering Algorithm
Mashayekhi, Hoda; Habibi, Jafar; Khalafbeigi, Tania; Voulgaris, Spyros; van Steen, Martinus Richardus
In many popular applications like peer-to-peer systems, large amounts of data are distributed among multiple sources. Analysis of this data and identifying clusters is challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general fully decentralized
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithms, but has a major…
Institute of Scientific and Technical Information of China (English)
徐斌; 张玉峰
2011-01-01
Polarity mining of review texts is usually performed with supervised learning algorithms, but these require a large manually labeled training set and, when processing text collections, face the curse of dimensionality, sparse vectors, high time and space complexity, and low recall and precision, so they cannot be used for massive text polarity classification tasks. The classic K-means clustering algorithm is one of the most widely used algorithms in cluster analysis; it has many good properties as well as shortcomings. In view of this situation, this article introduces semantics into the classic K-means clustering algorithm, constructs a semantic dictionary of polarity words specifically for judging the polarity of Chinese review texts, and proposes a K-means clustering algorithm based on a semantic criterion function. The study is an exploration of applying semantics-based clustering to the processing of subjective Chinese texts. Experimental results show that the total average recall reached 80.70% and the total average precision reached 67.75%, indicating that the algorithm is feasible and effective.
Robustness of the ATLAS pixel clustering neural network algorithm
AUTHOR|(INSPIRE)INSPIRE-00407780; The ATLAS collaboration
2016-01-01
Proton-proton collisions at the energy frontier put strong constraints on track reconstruction algorithms. In the ATLAS track reconstruction algorithm, an artificial neural network is utilised to identify and split clusters of neighbouring read-out elements in the ATLAS pixel detector created by multiple charged particles. The robustness of the neural network algorithm is presented, probing its sensitivity to uncertainties in the detector conditions. The robustness is studied by evaluating the stability of the algorithm's performance under a range of variations in the inputs to the neural networks. Within reasonable variation magnitudes, the neural networks prove to be robust to most variation types.
World Wide Web Metasearch Clustering Algorithm
Directory of Open Access Journals (Sweden)
Adina LIPAI
2008-01-01
Full Text Available As the storage capacity and processing speed of search engines grow to keep up with the constant expansion of the World Wide Web, the user faces an ever longer list of results for a given query. A simple query composed of common words sometimes has hundreds or even thousands of results, making it practically impossible for the user to verify all of them in order to identify a particular site. Even when the list of results is presented ordered by rank, most of the time this is not sufficient to help the user identify the most relevant sites for the query. The concept of search result clustering was introduced as a solution to this situation. The process of clustering search results consists of building thematically homogeneous groups from the initial list of results provided by classic search tools, using characteristics present within the initial results, without any kind of predefined categories.
Efficient Clustering of Web Search Results Using Enhanced Lingo Algorithm
Directory of Open Access Journals (Sweden)
M. Manikantan
2015-02-01
Full Text Available Web query optimization is the focus of recent research and development efforts. To fetch the required information, users rely on search engines and sometimes on website interfaces. One approach is search engine optimization, which website developers use to popularize their websites through search engine results. Clustering is a main task of the exploratory data mining process and a common technique for grouping web search results into different categories based on specific web contents. A clustering search engine called Lingo uses only snippets to cluster the documents. Though this method takes less time to cluster the documents, it cannot produce clusters of good quality. This study focuses on clustering all documents by applying semantic similarity between words and then a modified Lingo algorithm, in less time and with good cluster quality.
AN IMPROVED FUZZY CLUSTERING ALGORITHM FOR MICROARRAY IMAGE SPOTS SEGMENTATION
Directory of Open Access Journals (Sweden)
V.G. Biju
2015-11-01
Full Text Available An automatic cDNA microarray image processing method using an improved fuzzy clustering algorithm is presented in this paper. The spot segmentation algorithm proposed uses the gridding technique developed by the authors earlier for finding the co-ordinates of each spot in an image. Automatic cropping of spots from the microarray image is done using these co-ordinates. The present paper proposes an improved fuzzy clustering algorithm, possibility fuzzy local information c-means (PFLICM), to segment the spot foreground (FG) from the background (BG). PFLICM improves the fuzzy local information c-means (FLICM) algorithm by incorporating the typicality of a pixel along with gray level information and local spatial information. The performance of the algorithm is validated using a set of simulated cDNA microarray images with different levels of added AWGN. The strength of the algorithm is tested by computing parameters such as the segmentation matching factor (SMF), probability of error (pe), discrepancy distance (D) and normal mean square error (NMSE). The SMF value obtained for the PFLICM algorithm shows an improvement of 0.9 % and 0.7 % for high-noise and low-noise microarray images, respectively, compared to the FLICM algorithm. The PFLICM algorithm is also applied to real microarray images, and gene expression values are computed.
Application of genetic algorithms to hydrogenated silicon clusters
Indian Academy of Sciences (India)
N Chakraborti; R Prasad
2003-01-01
We discuss the application of biologically inspired genetic algorithms to determine the ground state structures of a number of Si–H clusters. The total energy of a given configuration of a cluster has been obtained by using a non-orthogonal tight-binding model, and the energy minimization has been carried out by using genetic algorithms and their recent variant, differential evolution. Our results for ground state structures and cohesive energies for Si–H clusters are in good agreement with earlier work conducted using the simulated annealing technique. We find that the results obtained by genetic algorithms are comparable to, and often better than, the results obtained by the simulated annealing technique.
Spin chain simulations with a meron cluster algorithm
Energy Technology Data Exchange (ETDEWEB)
Boyer, T. [Humboldt-Universitaet, Berlin (Germany). Inst. fuer Physik]|[Ecole Normale Superieure de Cachan (France); Bietenholz, W. [Humboldt-Universitaet, Berlin (Germany). Inst. fuer Physik]|[Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany). John von Neumann-Inst. fuer Computing NIC; Wuilloud, J. [Humboldt-Universitaet, Berlin (Germany). Inst. fuer Physik]|[Geneve Univ. (Switzerland). Dept. de Physique Theorique
2007-01-15
We apply a meron cluster algorithm to the XY spin chain, which describes a quantum rotor. This is a multi-cluster simulation supplemented by an improved estimator, which deals with objects of half-integer topological charge. This method is powerful enough to provide precise results for the model with a θ-term - it is therefore one of the rare examples, where a system with a complex action can be solved numerically. In particular we measure the correlation length, as well as the topological and magnetic susceptibility. We discuss the algorithmic efficiency in view of the critical slowing down. Due to the excellent performance that we observe, it is strongly motivated to work on new applications of meron cluster algorithms in higher dimensions. (orig.)
Adaptive Weighted Clustering Algorithm for Mobile Ad-hoc Networks
Directory of Open Access Journals (Sweden)
Adwan Yasin
2016-04-01
Full Text Available In this paper we present a new algorithm for clustering MANETs by considering several parameters. This is a new adaptive load balancing technique for clustering Mobile Ad-hoc Networks (MANETs). A MANET is a special kind of wireless network in which no central management exists and the nodes cooperatively manage the network and maintain connectivity. The algorithm takes into account the local capabilities of each node, the remaining battery power, the degree of connectivity and, finally, the power consumption based on the average distance between the nodes and the candidate cluster head. The proposed algorithm efficiently decreases the overhead in the network, which enhances overall MANET performance. Reducing the maintenance time of broken routes makes the network more stable and reliable. Saving the power of the nodes also guarantees a consistent and reliable network.
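One way to combine the four factors listed in this abstract is a weighted score per candidate node. The sketch below is only an illustration of that idea: the abstract does not state the actual combination rule, so the weights, the normalization, the field names and the example nodes are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    capability: float    # hypothetical 0..1 measure of local resources
    battery: float       # remaining battery power, 0..1
    degree: int          # number of one-hop neighbors
    avg_distance: float  # mean distance to neighbors (proxy for power cost)

# Hypothetical weights; the source does not give the real values.
WEIGHTS = {"capability": 0.2, "battery": 0.4, "degree": 0.2, "distance": 0.2}

def score(node, max_degree, max_distance):
    """Higher is better: reward resources, penalize long average distance."""
    return (WEIGHTS["capability"] * node.capability
            + WEIGHTS["battery"] * node.battery
            + WEIGHTS["degree"] * node.degree / max_degree
            - WEIGHTS["distance"] * node.avg_distance / max_distance)

def elect_cluster_head(nodes):
    """Pick the node with the best weighted score among the candidates."""
    max_degree = max(n.degree for n in nodes) or 1
    max_distance = max(n.avg_distance for n in nodes) or 1.0
    return max(nodes, key=lambda n: score(n, max_degree, max_distance))

nodes = [
    Node("a", capability=0.5, battery=0.9, degree=4, avg_distance=30.0),
    Node("b", capability=0.9, battery=0.2, degree=5, avg_distance=20.0),
    Node("c", capability=0.4, battery=0.6, degree=2, avg_distance=60.0),
]
head = elect_cluster_head(nodes)
```

With these assumed weights, node "a" is elected: node "b" is better connected and more capable, but its low battery outweighs those advantages.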
Quantum algorithms for testing Boolean functions
Erika Andersson; Floess, Dominik F.; Mark Hillery
2010-01-01
We discuss quantum algorithms, based on the Bernstein-Vazirani algorithm, for finding which variables a Boolean function depends on. There are 2^n possible linear Boolean functions of n variables; given a linear Boolean function, the Bernstein-Vazirani quantum algorithm can deterministically identify which one of these Boolean functions we are given using just one single function query. The same quantum algorithm can also be used to learn which input variables other types of Boolean functions...
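The single-query behaviour of the Bernstein-Vazirani algorithm for a linear function f(x) = s·x mod 2 can be checked with a small classical state-vector simulation. This is a sketch under the usual phase-oracle formulation; the function names are illustrative, and a classical simulation necessarily evaluates f on all 2^n basis states, so it demonstrates the mathematics rather than any quantum speed-up.

```python
import math

def walsh_hadamard(amps):
    """Apply a Hadamard gate to every qubit of a state vector."""
    amps = list(amps)
    n_states = len(amps)
    h = 1
    while h < n_states:
        for start in range(0, n_states, 2 * h):
            for i in range(start, start + h):
                a, b = amps[i], amps[i + h]
                amps[i], amps[i + h] = a + b, a - b
        h *= 2
    norm = math.sqrt(n_states)
    return [a / norm for a in amps]

def bernstein_vazirani(f, n):
    """Simulate H^n, one phase-oracle call, H^n, then read out the peak."""
    state = [0.0] * (2 ** n)
    state[0] = 1.0                      # start in |00...0>
    state = walsh_hadamard(state)       # uniform superposition
    state = [amp * (-1) ** f(x) for x, amp in enumerate(state)]  # oracle
    state = walsh_hadamard(state)       # interference concentrates on s
    return max(range(len(state)), key=lambda y: abs(state[y]))

def make_linear_function(s):
    """f(x) = s.x mod 2, with bit strings encoded as integers."""
    return lambda x: bin(s & x).count("1") % 2

secret = 0b10110
recovered = bernstein_vazirani(make_linear_function(secret), n=5)
```

After the second Hadamard layer all amplitude sits on the basis state labeled by s, which is why a single oracle application suffices for linear functions.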
A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis
Directory of Open Access Journals (Sweden)
Shaoning Li
2017-01-01
Full Text Available In the fields of geographic information systems (GIS) and remote sensing (RS), the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited by computational complexity, noise resistance, and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters”) if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the termination condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.
Energy Efficient Homogenous Clustering and Cluster Head Selection Algorithm for WSN
Directory of Open Access Journals (Sweden)
Ganeshayya I. Shidaganti
2013-02-01
Full Text Available Wireless sensor networks (WSNs) are energy- and resource-constrained networks made up of small electronic devices called sensor nodes. Each sensor node is capable of sensing, computing and transmitting data from one node to another until it reaches the base station. Each node monitors physical or environmental conditions, depending on the application, and communicates with nearby nodes via radio broadcast. Radio transmission and reception consume a lot of energy in a wireless sensor network (WSN); thus, one of the important issues in wireless sensor networks is the inherently limited battery power of the sensor nodes. Battery power is therefore a crucial parameter in algorithm design for maximizing the lifespan of sensor nodes. Much research has been done in recent years in the area of low-power routing protocols, but many design options remain open for improvement, and further research targeted at specific applications needs to be done. In this paper, we propose a new approach of an energy-efficient homogeneous clustering and cluster head selection algorithm for wireless sensor networks in which the lifespan of the network is increased by ensuring a homogeneous distribution of nodes in the clusters. In this clustering algorithm, energy efficiency is distributed and network performance is improved by selecting cluster heads on the basis of the residual energy of existing cluster heads, holdback value, and nearest hop distance of the node. In the proposed clustering algorithm, the cluster members are uniformly distributed and the life of the network is further extended
Clustering of Customers Based on Shopping Behavior and Employing Genetic Algorithms
Directory of Open Access Journals (Sweden)
E. P. Bafghi
2017-02-01
Full Text Available Clustering of customers is a vital case in marketing and customer relationship management. In traditional marketing, a market seller is categorized based on general characteristics like clients’ statistical information and their lifestyle features. However, this method seems unable to cope with today’s challenges. In this paper, we present a method for the classification of customers based on variables such as shopping cases and financial information related to the customers’ interactions. A measure of similarity was defined for clustering, and a clustering quality function was further defined. Genetic algorithms were used to ensure the accuracy of clustering.
New two-dimensional fuzzy C-means clustering algorithm for image segmentation
Institute of Scientific and Technical Information of China (English)
无
2008-01-01
To solve the problem of poor anti-noise performance of the traditional fuzzy C-means (FCM) algorithm in image segmentation, a novel two-dimensional FCM clustering algorithm for image segmentation was proposed. In this method, the image segmentation was converted into an optimization problem. The fitness function containing neighbor information was set up based on the gray information and the neighbor relations between the pixels described by the improved two-dimensional histogram. By making use of the global searching ability of the predator-prey particle swarm optimization, the optimal cluster center could be obtained by iterative optimization, and the image segmentation could be accomplished. The simulation results show that the segmentation accuracy ratio of the proposed method is above 99%. The proposed algorithm has strong anti-noise capability, high clustering accuracy and good segmentation effect, indicating that it is an effective algorithm for image segmentation.
NCUBE - A clustering algorithm based on a discretized data space
Eigen, D. J.; Northouse, R. A.
1974-01-01
Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.
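The idea of imposing a gridwork on the data space can be illustrated with a generic grid-based clustering sketch: points are binned into cells, and occupied cells that touch each other are merged into one cluster. This is not the NCUBE implementation, whose details the abstract does not give; the cell size, the neighborhood rule (including diagonals) and the function name are assumptions.

```python
from itertools import product

def grid_cluster(points, cell_size):
    """Label points by connected groups of occupied grid cells."""
    # Map every point to the integer coordinates of its grid cell.
    cell_of = [tuple(int(c // cell_size) for c in p) for p in points]
    occupied = set(cell_of)
    dim = len(points[0])
    # All neighbor offsets in {-1, 0, 1}^dim except the zero offset.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    cell_label = {}
    next_label = 0
    for cell in occupied:
        if cell in cell_label:
            continue
        # Flood fill over adjacent occupied cells.
        cell_label[cell] = next_label
        stack = [cell]
        while stack:
            cur = stack.pop()
            for off in offsets:
                nb = tuple(a + b for a, b in zip(cur, off))
                if nb in occupied and nb not in cell_label:
                    cell_label[nb] = next_label
                    stack.append(nb)
        next_label += 1
    return [cell_label[c] for c in cell_of]

# Hypothetical 2-D data: one group near the origin, one near (6, 6).
points = [(0.1, 0.2), (0.9, 0.8), (1.2, 1.1), (5.5, 5.5), (6.1, 5.9)]
labels = grid_cluster(points, cell_size=1.0)
```

Because each cluster is a union of axis-aligned cells, its boundary is piecewise linear, which is consistent with the abstract's remark that the gridwork yields a form of piecewise linear discrimination. The numeric label values depend on iteration order; only the grouping is meaningful.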
A Rough Set based Gene Expression Clustering Algorithm
Directory of Open Access Journals (Sweden)
J. J. Emilyn
2011-01-01
Full Text Available Problem statement: Microarray technology helps in monitoring the expression levels of thousands of genes across collections of related samples. Approach: The main goal in the analysis of large and heterogeneous gene expression datasets was to identify groups of genes that get expressed in a set of experimental conditions. Results: Several clustering techniques have been proposed for identifying gene signatures and understanding their role, and many of them have been applied to gene expression data, but with partial success. The main aim of this work was to develop a clustering algorithm that would successfully identify gene patterns. The proposed novel clustering technique (RCGED) provides an efficient way of finding the hidden and unique gene expression patterns. It overcomes the restriction of one object being placed in only one cluster. Conclusion/Recommendations: The proposed algorithm is termed intelligent because it automatically determines the optimum number of clusters. The proposed algorithm was tested on a colon cancer dataset and the results were compared with the Rough Fuzzy K Means algorithm.
Core Business Selection Based on Ant Colony Clustering Algorithm
Directory of Open Access Journals (Sweden)
Yu Lan
2014-01-01
Full Text Available Core business is the most important business to the enterprise in diversified business. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current situation of the development of the company, the core business of the company can be acquired by the ant colony clustering algorithm. The results indicate that the proposed method is an effective way to determine the core business for a company.
Research on Scheduling Algorithms in Web Cluster Servers
Institute of Scientific and Technical Information of China (English)
LEI YingChun (雷迎春); GONG YiLi (龚奕利); ZHANG Song (张松); LI GuoJie (李国杰)
2003-01-01
This paper analyzes quantitatively the impact of the load balance scheduling algorithms and the locality scheduling algorithms on the performance of Web cluster servers, and brings forward the Adaptive_LARD algorithm. Compared with the representative LARD algorithm, the advantages of the Adaptive_LARD are that: (1) it adjusts load distribution among the back-ends through the idea of load balancing to avoid learning steps in the LARD algorithm and reinforce its adaptability; (2) by distinguishing between TCP connections accessing disks and those accessing cache memory, it can estimate the impact of different connections on the back-ends' load more precisely. Performance evaluations suggest that the proposed method outperforms the LARD algorithm by up to 14.7%.
Identifying multiple influential spreaders by a heuristic clustering algorithm
Energy Technology Data Exchange (ETDEWEB)
Bao, Zhong-Kui [School of Mathematical Science, Anhui University, Hefei 230601 (China); Liu, Jian-Guo [Data Science and Cloud Service Research Center, Shanghai University of Finance and Economics, Shanghai, 200133 (China); Zhang, Hai-Feng, E-mail: haifengzhang1978@gmail.com [School of Mathematical Science, Anhui University, Hefei 230601 (China); Department of Communication Engineering, North University of China, Taiyuan, Shan' xi 030051 (China)
2017-03-18
The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suited to the case where a single spreader is chosen as the spreading source. Often, a spreading process is initiated by simultaneously choosing multiple nodes as the spreading sources. In this situation, choosing the top-ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in the network. Ideally, the multiple spreaders should be both influential and dispersively distributed in the network, but it is difficult to meet the two conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters, and the center nodes of the clusters are then chosen as the multiple spreaders. The HC algorithm not only ensures that the multiple spreaders are dispersively distributed in the network but also avoids selecting nodes that are very “negligible”. Compared with traditional methods, our experimental results on synthetic and real networks indicate that the HC method performs better on influence maximization. - Highlights: • A heuristic clustering algorithm is proposed to identify the multiple influential spreaders in complex networks. • The algorithm can not only guarantee that the selected spreaders are sufficiently scattered but also avoid selecting “insignificant” ones. • The performance of our algorithm is generally better than that of other methods, on both real and synthetic networks.
Chaos control of ferroresonance system based on RBF-maximum entropy clustering algorithm
Energy Technology Data Exchange (ETDEWEB)
Liu Fan [Key Lab of High Voltage and Electrical New Technology of Ministry of Education, Chongqing University, Chongqing 400044 (China)]. E-mail: liufan2003@yahoo.com.cn; Sun Caixin [Key Lab of High Voltage and Electrical New Technology of Ministry of Education, Chongqing University, Chongqing 400044 (China); Sima Wenxia [Key Lab of High Voltage and Electrical New Technology of Ministry of Education, Chongqing University, Chongqing 400044 (China); Liao Ruijin [Key Lab of High Voltage and Electrical New Technology of Ministry of Education, Chongqing University, Chongqing 400044 (China); Guo Fei [Key Lab of High Voltage and Electrical New Technology of Ministry of Education, Chongqing University, Chongqing 400044 (China)
2006-09-11
With regard to the ferroresonance overvoltage of neutral grounded power systems, a maximum-entropy learning algorithm based on radial basis function neural networks is used to control the chaotic system. The algorithm optimizes the objective function to derive the learning rule of the central vectors, and uses the clustering function of the network hidden layers. It improves the regression and learning ability of neural networks. A numerical experiment on the ferroresonance system verifies the effectiveness and feasibility of using the algorithm to control chaos in neutral grounded systems.
A Genetic Clustering Algorithm for Mean-Residual Vector Quantization
Institute of Scientific and Technical Information of China (English)
CHUShuchuan; JohnF.Roddick; CHENTsongyi
2004-01-01
Vector quantization (VQ) is a useful tool for data compression and can be applied to compress the data vectors in the database. The quality of the recovered data vector depends on a good codebook. Mean-residual vector quantization (M/R VQ) has been shown to be efficient in encoding time and needs only a little storage. In this paper, genetic algorithms in combination with the generalized Lloyd algorithm (GLA) are applied to the codebook design of M/R VQ. The mean codebook and residual codebook are trained separately using the GLA, then genetic algorithms (GA) are used to evaluate and evolve the combined mean codebook and residual codebook. The parameters used in the proposed algorithm are designed based on experiments and they are robust to the proposed GA based clustering algorithm for M/R VQ. Experimental results demonstrate that the proposed genetic clustering algorithm applied to M/R VQ may improve the peak signal to noise ratio of the recovered data vector compared with the GLA.
Evaluation of clustering algorithms for gene expression data using gene ontology annotations
Institute of Scientific and Technical Information of China (English)
MA Ning; ZHANG Zheng-guo
2012-01-01
Background Clustering is a useful exploratory technique for interpreting gene expression data to reveal groups of genes sharing common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for expression data clustering evaluation. Methods An external criterion utilizing annotation based similarities between genes is proposed in this work. Gene ontology information is employed as the annotation source. Comparisons among six widely used clustering algorithms over various types of gene expression data sets were carried out based on the criterion proposed. Results The rank of these algorithms given by the criterion coincides with our common knowledge. Single-linkage has significantly poorer performance, even worse than the random algorithm. Ward's method achieves the best performance in most cases. Conclusions The criterion proposed has a strong ability to distinguish among different clustering algorithms with different distance measurements. It is also demonstrated that analyzing main contributors of the criterion may offer some guidelines in finding local compact clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis.
A Task-parallel Clustering Algorithm for Structured AMR
Energy Technology Data Exchange (ETDEWEB)
Gunney, B N; Wissink, A M
2004-11-02
A new parallel algorithm, based on the Berger-Rigoutsos algorithm for clustering grid points into logically rectangular regions, is presented. The clustering operation is frequently performed in the dynamic gridding steps of structured adaptive mesh refinement (SAMR) calculations. A previous study revealed that although the cost of clustering is generally insignificant for smaller problems run on relatively few processors, the algorithm scaled inefficiently in parallel and its cost grows with problem size. Hence, it can become significant for large scale problems run on very large parallel machines, such as the new BlueGene system (which has O(10^4) processors). We propose a new task-parallel algorithm designed to reduce communication wait times. Performance was assessed using dynamic SAMR re-gridding operations on up to 16K processors of currently available computers at Lawrence Livermore National Laboratory. The new algorithm was shown to be up to an order of magnitude faster than the baseline algorithm and had better scaling trends.
Summarizing Relational Data Using Semi-Supervised Genetic Algorithm-Based Clustering Techniques
Directory of Open Access Journals (Sweden)
Rayner Alfred
2010-01-01
Full Text Available Problem statement: In solving a classification problem in relational data mining, traditional methods, for example, the C4.5 and its variants, usually require data transformations from datasets stored in multiple tables into a single table. Unfortunately, we may lose some information when we join tables with a high degree of one-to-many association. Therefore, data transformation becomes a tedious trial-and-error work and the classification result is often not very promising especially when the number of tables and the degree of one-to-many association are large. Approach: We proposed a genetic semi-supervised clustering technique as a means of aggregating data stored in multiple tables to facilitate the task of solving a classification problem in relational database. This algorithm is suitable for classification of datasets with a high degree of one-to-many associations. It can be used in two ways. One is user-controlled clustering, where the user may control the result of clustering by varying the compactness of the spherical cluster. The other is automatic clustering, where a non-overlap clustering strategy is applied. In this study, we use the latter method to dynamically cluster multiple instances, as a means of aggregating them and illustrate the effectiveness of this method using the semi-supervised genetic algorithm-based clustering technique. Results: It was shown in the experimental results that using the reciprocal of Davies-Bouldin Index for cluster dispersion and the reciprocal of Gini Index for cluster purity, as the fitness function in the Genetic Algorithm (GA), finds solutions with much greater accuracy. The results obtained in this study showed that automatic clustering (seeding, by optimizing the cluster dispersion or cluster purity alone using GA, provides one with good results compared to the traditional k-means clustering. However, the best result can be achieved by optimizing the combination values of both the cluster
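The dispersion half of the fitness function mentioned in this abstract, the reciprocal of the Davies-Bouldin index, can be sketched as below. The code uses the standard definition of the index with Euclidean distances; how the paper combines it with the Gini-based purity term is not stated in the abstract and is not attempted here, and the function names and example clusterings are illustrative.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def davies_bouldin(clusters):
    """Davies-Bouldin index: lower means tighter, better separated clusters."""
    cents = [centroid(c) for c in clusters]
    # Scatter: mean distance from each cluster's points to its centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

def dispersion_fitness(clusters):
    """Reciprocal of the index, so a genetic algorithm can maximize it.

    Assumes at least one cluster has non-zero scatter (index > 0).
    """
    return 1.0 / davies_bouldin(clusters)

# Hypothetical clusterings of the same four points.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
fit_tight = dispersion_fitness(tight)
fit_loose = dispersion_fitness(loose)
```

The grouping that keeps nearby points together scores 5.0, while the grouping that pairs distant points scores 0.2, so a search that maximizes this fitness is pushed toward compact, well-separated clusters.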
Dynamic Head Cluster Election Algorithm for Clustered Ad-Hoc Networks
Directory of Open Access Journals (Sweden)
Arwa Zabian
2008-01-01
Full Text Available In distributed systems, clustering consists of dividing the geographical area covered by a set of nodes into small zones. In mobile networks, the clustering mechanism varies because nodes can move at any time in any direction, which causes the network to partition or nodes to join. Several existing centralized or global algorithms have been proposed for clustering such that no node becomes isolated and no cluster becomes overloaded. A particular node, called the cluster head or leader, is elected and has the role of organizing the distribution of nodes in clusters. We propose a distributed clustering and leader election mechanism for ad hoc mobile networks in which the leader is a mobile node. Our results show that, in the case of leader mobility, the time needed to elect a new leader is smaller than the time needed for a significant topological change in the network to happen.
Clustered Self Organising Migrating Algorithm for the Quadratic Assignment Problem
Davendra, Donald; Zelinka, Ivan; Senkerik, Roman
2009-08-01
An approach of population dynamics and clustering for permutative problems is presented in this paper. Diversity indicators are created from solution ordering and its mapping is shown as an advantage for population control in metaheuristics. Self Organising Migrating Algorithm (SOMA) is modified using this approach and vetted with the Quadratic Assignment Problem (QAP). Extensive experimentation is conducted on benchmark problems in this area.
Blockspin Scheme and Cluster Algorithm for Quantum Spin Systems
Ying, H P; Ying, He-Ping; Wiese, Uwe-Jens
1992-01-01
We present a numerical study using a cluster algorithm for the 1-d $S=1/2$ quantum Heisenberg models. The dynamical critical exponent for anti-ferromagnetic chains is $z=0.0(1)$ such that critical slowing down is eliminated.
Clustering algorithms for Stokes space modulation format recognition
DEFF Research Database (Denmark)
Boada, Ricard; Borkowski, Robert; Tafur Monroy, Idelfonso
2015-01-01
Stokes space modulation format recognition (Stokes MFR) is a blind method enabling digital coherent receivers to infer modulation format information directly from a received polarization-division-multiplexed signal. A crucial part of the Stokes MFR is a clustering algorithm, which largely...
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.
Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
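The stand-alone metrics named in this abstract can be illustrated on a toy graph. The following is a minimal sketch, assuming an undirected, unweighted graph given as an edge list and the common textbook definitions of modularity and conductance; the function names are hypothetical and the study's exact definitions (for example, how conductance is aggregated over clusters) may differ.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both ends in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree = defaultdict(int)  # summed degree per community
    for node, d in degree.items():
        total_degree[community[node]] += d
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)

def conductance(edges, nodes):
    """Cut edges divided by the smaller of the two volumes (degree sums).

    Assumes both sides of the cut have at least one edge endpoint.
    """
    nodes = set(nodes)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for x in (u, v):
            if x in nodes:
                vol_in += 1
            else:
                vol_out += 1
        if (u in nodes) != (v in nodes):
            cut += 1
    return cut / min(vol_in, vol_out)

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, community))    # 5/14 for this partition
print(conductance(edges, {0, 1, 2}))   # 1/7: one cut edge, volume 7
```

Higher modularity and lower conductance both indicate a cleaner split here, which is why the two can be compared as stand-alone quality scores.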
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Energy Technology Data Exchange (ETDEWEB)
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer,; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U.,
2005-03-01
We present the "C4 Cluster Catalog", a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers $\approx 2600$ square degrees of sky and ranges in redshift from $z = 0.02$ to $z = 0.17$. The mean cluster membership is 36 galaxies (with redshifts) brighter than $r = 17.7$, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity ($L_r$), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the $\Lambda$CDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is $\approx 90\%$ complete and 95% pure above $M_{200} = 1 \times 10^{14}\,h^{-1} M_{\odot}$ and within $0.03 \le z \le 0.12$. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within $0.03 \le z \le 0.12$. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the $L_r$ of a cluster is a more robust estimator of the halo mass ($M_{200}$) than the galaxy line-of-sight velocity dispersion or the richness of the cluster
A Survey on Clustering Algorithms for Heterogeneous Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Vivek Katiyar
2011-01-01
Full Text Available In the last few years, potential uses of wireless sensor networks (WSNs) have emerged in various fields such as disaster management, battlefield surveillance and border security surveillance. In such applications, a large number of sensor nodes are deployed, which are often unattended and work autonomously. Clustering is a key technique used to extend the lifetime of a sensor network by reducing energy consumption. It can also increase network scalability. Sensor nodes have generally been considered homogeneous as research in the field of WSNs has evolved, but some nodes may be given a different energy level to prolong the lifetime of a WSN and improve its reliability. In this paper, we study the impact of node heterogeneity on the performance of WSNs. This paper surveys different clustering algorithms for heterogeneous WSNs by classifying the algorithms according to various clustering attributes.
A HYBRID HEURISTIC ALGORITHM FOR THE CLUSTERED TRAVELING SALESMAN PROBLEM
Directory of Open Access Journals (Sweden)
Mário Mestria
2016-04-01
Full Text Available This paper proposes a hybrid heuristic algorithm, based on the metaheuristics Greedy Randomized Adaptive Search Procedure, Iterated Local Search and Variable Neighborhood Descent, to solve the Clustered Traveling Salesman Problem (CTSP). The hybrid heuristic algorithm uses several variable neighborhood structures, combining intensification (using local search operators) and diversification (a constructive heuristic and a perturbation routine). In the CTSP, the vertices are partitioned into clusters and all vertices of each cluster have to be visited contiguously. The CTSP is NP-hard since it includes the well-known Traveling Salesman Problem (TSP) as a special case. Our hybrid heuristic is compared with three heuristics from the literature and an exact method. Computational experiments are reported for different classes of instances. Experimental results show that the proposed hybrid heuristic obtains competitive results within reasonable computational time.
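The contiguity constraint that defines the CTSP can be made concrete with a small feasibility check. This is only an illustrative sketch, not part of the heuristic described in the abstract; the function name and the representation of a tour as a cyclic list of vertices with a vertex-to-cluster mapping are assumptions.

```python
def visits_clusters_contiguously(tour, cluster_of):
    """Return True if every cluster occupies one contiguous block of the cyclic tour."""
    n = len(tour)
    labels = [cluster_of[v] for v in tour]
    # Rotate the cyclic tour so that it starts at a cluster boundary;
    # labels[i - 1] wraps around to the last vertex when i == 0.
    start = next((i for i in range(n) if labels[i - 1] != labels[i]), 0)
    labels = labels[start:] + labels[:start]
    seen, previous = set(), None
    for c in labels:
        if c != previous:
            if c in seen:      # the tour re-enters a cluster it already left
                return False
            seen.add(c)
            previous = c
    return True

cluster_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
print(visits_clusters_contiguously(["a2", "b1", "b2", "a1"], cluster_of))  # True
print(visits_clusters_contiguously(["a1", "b1", "a2", "b2"], cluster_of))  # False
```

A local search move for the CTSP would need to preserve this property, which is one reason dedicated neighborhood structures are used instead of plain TSP moves.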
An Efficient Cluster Algorithm for CP(N-1) Models
Beard, B B; Riederer, S; Wiese, U J
2005-01-01
We construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a new regularization for CP(N-1) models in the framework of D-theory, which is an alternative non-perturbative approach to quantum field theory formulated in terms of discrete quantum variables instead of classical fields. Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard formulation of lattice field theory. In fact, there is even a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. We present various simulations for different correlation lengths, couplings and lattice sizes. We have simulated correlation lengths up to 250 lattice spacings on lattices as large as 640x640 and we detect no evidence for critical slowing down.
Evaluation of clustering algorithms for protein-protein interaction networks
Directory of Open Access Journals (Sweden)
van Helden Jacques
2006-11-01
Full Text Available Background: Protein interactions are crucial components of all cellular processes. Recently, high-throughput methods have been developed to obtain a global description of the interactome (the whole network of protein interactions for a given organism). In 2002, the yeast interactome was estimated to contain up to 80,000 potential interactions. This estimate is based on the integration of data sets obtained by various methods (mass spectrometry, two-hybrid methods, genetic studies). High-throughput methods are known, however, to yield a non-negligible rate of false positives, and to miss a fraction of existing interactions. The interactome can be represented as a graph where nodes correspond with proteins and edges with pairwise interactions. In recent years clustering methods have been developed and applied in order to extract relevant modules from such graphs. These algorithms require the specification of parameters that may drastically affect the results. In this paper we present a comparative assessment of four algorithms: Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Super Paramagnetic Clustering (SPC), and Molecular Complex Detection (MCODE). Results: A test graph was built on the basis of 220 complexes annotated in the MIPS database. To evaluate the robustness to false positives and false negatives, we derived 41 altered graphs by randomly removing edges from or adding edges to the test graph in various proportions. Each clustering algorithm was applied to these graphs with various parameter settings, and the clusters were compared with the annotated complexes. We analyzed the sensitivity of the algorithms to the parameters and determined their optimal parameter values. We also evaluated their robustness to alterations of the test graph. We then applied the four algorithms to six graphs obtained from high-throughput experiments and compared the resulting clusters with the annotated complexes. Conclusion: This
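The construction of altered graphs by random edge removal and addition can be sketched as follows. This is a hedged illustration only: the function name, the seed parameter, and the choice to express both proportions as fractions of the original edge count are assumptions, since the record does not say how the authors parameterised the alterations.

```python
import random
from itertools import combinations

def alter_graph(nodes, edges, remove_frac=0.0, add_frac=0.0, seed=None):
    """Return a copy of an undirected edge set with random edges removed and added.

    Both fractions are taken relative to the original number of edges
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    edges = {frozenset(e) for e in edges}
    n_remove = round(remove_frac * len(edges))
    n_add = round(add_frac * len(edges))
    # Simulated false negatives: keep a random subset of the true edges.
    kept = set(rng.sample(sorted(edges, key=sorted), len(edges) - n_remove))
    # Simulated false positives: draw from node pairs that are not true edges.
    absent = [frozenset(p) for p in combinations(sorted(nodes), 2)
              if frozenset(p) not in edges]
    added = set(rng.sample(absent, min(n_add, len(absent))))
    return kept | added

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
altered = alter_graph(range(6), edges, remove_frac=0.4, add_frac=0.3, seed=1)
print(len(altered))  # 6: four original edges kept plus two spurious ones
```

Running a clustering algorithm on a series of such graphs and comparing the clusters with the reference complexes gives a simple robustness curve.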
A heuristic approach to possibilistic clustering algorithms and applications
Viattchenin, Dmitri A
2013-01-01
The present book outlines a new approach to possibilistic clustering in which the sought clustering structure of the set of objects is based directly on the formal definition of a fuzzy cluster and the possibilistic memberships are determined directly from the values of the pairwise similarity of objects. The proposed approach can be used for solving different classification problems. Here, some techniques that might be useful for this purpose are outlined, including a methodology for constructing a set of labeled objects for a semi-supervised clustering algorithm, a methodology for reducing the dimensionality of the analyzed attribute space, and methods for asymmetric data processing. Moreover, a technique for constructing a subset of the most appropriate alternatives for a set of weak fuzzy preference relations, which are defined on a universe of alternatives, is described in detail, and a method for rapidly prototyping Mamdani's fuzzy inference systems is introduced. This book addresses engineers, scientist...
High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm
Directory of Open Access Journals (Sweden)
Dieter Hendricks
2016-02-01
Full Text Available We implement a master-slave parallel genetic algorithm with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs) to implement a parallel genetic algorithm and visualise the results using disjoint minimal spanning trees. We demonstrate that our GPU parallel genetic algorithm, implemented on a commercially available general purpose GPU, is able to recover stock clusters in sub-second speed, based on a subset of stocks in the South African market. This approach represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable because of compiler differences. Combined with fast online intraday correlation matrix estimation from high frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.
A comparison of clustering algorithms in article recommendation system
Tantanasiriwong, Supaporn
2012-01-01
A recommendation system is a tool that can recommend to researchers resources suited to their research interests by using content-based filtering. In this paper, clustering, an unsupervised learning technique, is introduced for grouping objects based on their selected features and similarities. Publication information from the Science Citation Index (SCI) is used as the dataset for clustering, with feature extraction in the sense of dimensionality reduction of these articles, comparing Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), and K-Means to determine the best algorithm. In the experiment, the selected database consists of 2625 documents extracted from the SCI corpus from 2001 to 2009. Clustering into 50, 100, 200, and 250 ranks is considered, and the F-measure is used to evaluate the three algorithms. The results show that the LDA technique gives an accuracy of up to 95.5%, the highest among the clustering techniques compared.
A Clustering Genetic Algorithm for Cylinder Drag Optimization
Milano, Michele; Koumoutsakos, Petros
2002-01-01
A real coded genetic algorithm is implemented for the optimization of actuator parameters for cylinder drag minimization. We consider two types of idealized actuators that are allowed either to move steadily and tangentially to the cylinder surface (“belts”) or to steadily blow/suck with a zero net mass constraint. The genetic algorithm we implement has the property of identifying minima basins, rather than single optimum points. The knowledge of the shape of the minimum basin enables further insights into the system properties and provides a sensitivity analysis in a fully automated way. The drag minimization problem is formulated as an optimal regulation problem. By means of the clustering property of the present genetic algorithm, a set of solutions producing drag reduction of up to 50% is identified. A comparison between the two types of actuators, based on the clustering property of the algorithm, indicates that blowing/suction actuation parameters are associated with larger tolerances when compared to optimal parameters for the belt actuators. The possibility of using a few strategically placed actuators to obtain a significant drag reduction is explored using the clustering diagnostics of this method. The optimal belt-actuator parameters obtained by optimizing the two-dimensional case is employed in three-dimensional simulations, by extending the actuators across the span of the cylinder surface. The three-dimensional controlled flow exhibits a strong two-dimensional character near the cylinder surface, resulting in significant drag reduction.
Robustness of the ATLAS pixel clustering neural network algorithm
Sidebo, Per Edvin; The ATLAS collaboration
2016-01-01
Proton-proton collisions at the energy frontier put strong constraints on track reconstruction algorithms. The algorithms depend heavily on accurate estimation of the position of particles as they traverse the inner detector elements. An artificial neural network algorithm is utilised to identify and split clusters of neighbouring read-out elements in the ATLAS pixel detector created by multiple charged particles. The method recovers otherwise lost tracks in dense environments where particles are separated by distances comparable to the size of the detector read-out elements. Such environments are highly relevant for LHC run 2, e.g. in searches for heavy resonances. Within the scope of run 2 track reconstruction performance and upgrades, the robustness of the neural network algorithm will be presented. The robustness has been studied by evaluating the stability of the algorithm's performance under a range of variations in the pixel detector conditions.
Comparative Study of Clustering Algorithms in Text Mining Context
Directory of Open Access Journals (Sweden)
Abdennour Mohamed Jalil
2016-06-01
Full Text Available The spectacular growth of data is due to the emergence of networks and smartphones. About 42% of the world population uses the internet [1], which has created a problem in processing the exchanged data, whose volume is rising exponentially and which must be treated automatically. This paper presents a classical process of knowledge discovery in databases for treating textual data. This process is divided into three parts: preprocessing, processing and post-processing. In the processing step, we present a comparative study of several clustering algorithms: KMeans, Global KMeans, Fast Global KMeans, Two Level KMeans and FWKmeans. The comparison between these algorithms is made on real textual data from the web using RSS feeds. Experimental results identified two problems: the first is the quality of the results, which remains an issue for algorithms that converge rapidly; the second is the execution time, which needs to decrease for some algorithms.
Energy functions for regularization algorithms
Delingette, H.; Hebert, M.; Ikeuchi, K.
1991-01-01
Regularization techniques are widely used for inverse problem solving in computer vision, such as surface reconstruction, edge detection, or optical flow estimation. Energy functions used for regularization algorithms measure how smooth a curve or surface is, and to render acceptable solutions these energies must satisfy certain properties such as invariance under Euclidean transformations or invariance under parameterization. The notion of smoothness energy is extended here to the notion of a differential stabilizer, and it is shown that to avoid the systematic underestimation of curvature for planar curve fitting, it is necessary that circles be the curves of maximum smoothness. A set of stabilizers is proposed that meets this condition as well as invariance under rotation and parameterization.
A Hybrid LBFGS-DE Algorithm for Global Optimization of the Lennard-Jones Cluster Problem
Directory of Open Access Journals (Sweden)
Ernesto Padernal Adorio
2004-12-01
Full Text Available The Lennard-Jones cluster conformation problem is to determine a configuration of $n$ atoms in three-dimensional space where the sum of the nonlinear pairwise potential function is at a minimum. In this formula, $r_{i,j}$ is the distance between atoms $i$ and $j$. This optimization problem is a severe test for global optimization algorithms due to its computational complexity: the number of local minima grows exponentially large as the number of atoms in the cluster is increased. As a specific test case, a better cluster configuration than the previously published putative minimum for the 38-atom case was found in the mid-1990s.
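The pairwise potential itself is not reproduced in this record. As an illustration of the objective being minimised, the sketch below assumes the widely used form $E = 4\epsilon \sum_{i<j} [(\sigma/r_{i,j})^{12} - (\sigma/r_{i,j})^{6}]$ in reduced units ($\epsilon = \sigma = 1$); the paper may use a differently scaled variant, and the function name is hypothetical.

```python
import math
from itertools import combinations

def lennard_jones_energy(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy of a cluster of atoms.

    positions: sequence of (x, y, z) tuples. The 4*eps*((s/r)^12 - (s/r)^6)
    form is an assumption; it is not taken from the record above.
    """
    energy = 0.0
    for p, q in combinations(positions, 2):
        r = math.dist(p, q)           # pairwise distance r_ij
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy

# Two atoms at the pair-equilibrium distance 2^(1/6) have energy -epsilon.
r_min = 2 ** (1 / 6)
print(lennard_jones_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]))
```

A global optimiser for this problem would minimise this function over all $3n$ coordinates, which is where the exponentially growing number of local minima becomes the difficulty.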
Zainuddin, Zarita; Lai, Kee Huong; Ong, Pauline
2013-04-01
Artificial neural networks (ANNs) are powerful mathematical models that are used to solve complex real world problems. Wavelet neural networks (WNNs), which were developed based on wavelet theory, are a variant of ANNs. During the training phase of WNNs, several parameters need to be initialized, including the type of wavelet activation functions, translation vectors, and dilation parameter. The conventional k-means and fuzzy c-means clustering algorithms have been used to select the translation vectors. However, the solution vectors might get trapped at local minima. In this regard, the evolutionary harmony search algorithm, which is capable of searching for near-optimum solution vectors both locally and globally, is introduced to circumvent this problem. In this paper, the conventional k-means and fuzzy c-means clustering algorithms were hybridized with the metaheuristic harmony search algorithm. In addition to accurately estimating the global minima, these hybridized algorithms also offer more than one solution to a particular problem, since many possible solution vectors can be generated and stored in the harmony memory. To validate the robustness of the proposed WNNs, the real world problem of epileptic seizure detection was presented. The overall classification accuracy from the simulation showed that the hybridized metaheuristic algorithms outperformed the standard k-means and fuzzy c-means clustering algorithms.
Exploring New Clustering Algorithms for the CMS Tracker FED
Gamboa Alvarado, Jose Leandro
2013-01-01
In the current Front End (FE) firmware, clusters of hits within the APV frames are found using a simple threshold comparison (which is made between the data and a 3 or 5 sigma strip noise cut) on reordered pedestal and Common Mode (CM) noise subtracted data. In addition, the CM noise subtraction requires the baseline of each APV frame to be approximately uniform. Therefore, the current algorithm will fail if the APV baseline exhibits large-scale non-uniform behavior. Under very high luminosity conditions the assumption of a uniform APV baseline breaks down and the FED is unable to maintain a high efficiency of cluster finding.
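The threshold comparison described above can be sketched in a few lines. This is an illustrative sketch only, not the FED firmware: the function name is hypothetical, the common-mode level is estimated here by the median of the pedestal-subtracted frame (an assumption), and a single n-sigma cut is applied to every strip.

```python
from statistics import median

def find_clusters(raw, pedestal, noise, n_sigma=3.0):
    """Return (first_strip, last_strip) pairs of contiguous strips above threshold."""
    signal = [r - p for r, p in zip(raw, pedestal)]   # pedestal subtraction
    common_mode = median(signal)                      # assumes a uniform baseline
    signal = [s - common_mode for s in signal]        # common-mode subtraction
    clusters, start = [], None
    for i, (s, sigma) in enumerate(zip(signal, noise)):
        if s > n_sigma * sigma:
            if start is None:
                start = i                             # a new cluster opens here
        elif start is not None:
            clusters.append((start, i - 1))           # cluster ended on previous strip
            start = None
    if start is not None:                             # cluster runs to the frame edge
        clusters.append((start, len(signal) - 1))
    return clusters

raw = [102, 101, 130, 128, 100, 99, 150, 101]
pedestal = [100] * 8
noise = [2.0] * 8
print(find_clusters(raw, pedestal, noise))  # [(2, 3), (6, 6)]
```

Because a single offset is subtracted from the whole frame, a sloped or otherwise non-uniform baseline shifts some strips above or below the cut, which is the failure mode the abstract describes.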
Mapping cultivable land from satellite imagery with clustering algorithms
Arango, R. B.; Campos, A. M.; Combarro, E. F.; Canas, E. R.; Díaz, I.
2016-07-01
Open data satellite imagery provides valuable data for the planning and decision-making processes related with environmental domains. Specifically, agriculture uses remote sensing in a wide range of services, ranging from monitoring the health of the crops to forecasting the spread of crop diseases. In particular, this paper focuses on a methodology for the automatic delimitation of cultivable land by means of machine learning algorithms and satellite data. The method uses a partition clustering algorithm called Partitioning Around Medoids and considers the quality of the clusters obtained for each satellite band in order to evaluate which one better identifies cultivable land. The proposed method was tested with vineyards using as input the spectral and thermal bands of the Landsat 8 satellite. The experimental results show the great potential of this method for cultivable land monitoring from remote-sensed multispectral imagery.
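Partitioning Around Medoids can be illustrated with a small swap-based sketch. This is a simplified version under stated assumptions (Euclidean distance, naive initialisation with the first k points, best single swap per iteration); it is not the implementation used in the study, and the function names are hypothetical.

```python
import math

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, max_iter=100):
    """Return (medoid indices, cluster label per point)."""
    medoids = list(range(k))                  # naive start: the first k points
    cost = total_cost(points, medoids)
    for _ in range(max_iter):
        best_cost, best_medoids = cost, None
        for i in range(k):                    # try replacing each medoid...
            for candidate in range(len(points)):   # ...with each non-medoid
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < best_cost:
                    best_cost, best_medoids = trial_cost, trial
        if best_medoids is None:              # no swap improves the cost: stop
            break
        cost, medoids = best_cost, best_medoids
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = pam(points, k=2)
print(sorted(medoids), labels)  # [0, 3] [0, 0, 0, 1, 1, 1]
```

Unlike k-means, the cluster centres are always actual data points, which makes the method usable with any dissimilarity measure, for example one computed per satellite band.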
Advanced defect detection algorithm using clustering in ultrasonic NDE
Gongzhang, Rui; Gachagan, Anthony
2016-02-01
A range of materials used in industry exhibit scattering properties which limit ultrasonic NDE. Many algorithms have been proposed to enhance defect detection ability, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed and the remaining noise can be easily confused with real feature signals, hence becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing using a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single transducer or phased array configurations. The proposed algorithm uses the concept of unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and the ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Each class will then be labelled as 'legitimate reflector' or 'artefacts' based on this observation, and the expected probability of detection (PoD) and probability of false alarm (PFA) determined. To facilitate data collection and validate the proposed algorithm, a 5 MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP, and the subsequent application of the proposed clustering algorithm has provided an additional reduction in PFA while maintaining PoD for both samples compared with SSP results alone.
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-02-19
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency.
Directory of Open Access Journals (Sweden)
Hanane FROUD
2013-11-01
Full Text Available The goal of document clustering algorithms is to create clusters that are coherent internally but clearly different from each other. The useful expressions in documents are often accompanied by a large amount of noise caused by unnecessary words, so it is indispensable to eliminate the noise and keep just the useful information. Keyphrase extraction systems for Arabic are a new phenomenon. A number of text mining applications can use them to improve their results. Keyphrases are defined as phrases that capture the main topics discussed in a document; they offer a brief and precise summary of the document content. Therefore, they can be a good solution for removing the existing noise from documents. In this paper, we propose a new method to solve the problem cited above, especially for documents in Arabic, which is one of the most complex languages, by using a new keyphrase extraction algorithm based on the Suffix Tree data structure (KpST). To evaluate our approach, we conduct an experimental study on Arabic document clustering using the most popular hierarchical approach, the agglomerative hierarchical algorithm, with seven linkage techniques and a variety of distance functions and similarity measures. The obtained results show that our approach for extracting keyphrases improves the clustering results.
Core Business Selection Based on Ant Colony Clustering Algorithm
Yu Lan; Yan Bo; Yao Baozhen
2014-01-01
The core business is the most important business of an enterprise with diversified businesses. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current situation of the development of the company, the core business of the company can be acquired by ant c...
One cutting plane algorithm using auxiliary functions
Zabotin, I. Ya; Kazaeva, K. E.
2016-11-01
We propose an algorithm for solving a convex programming problem from the class of cutting methods. The algorithm is characterized by the construction of approximations using auxiliary functions instead of the objective function. Each auxiliary function is based on the exterior penalty function. In the proposed algorithm, the admissible set and the epigraph of each auxiliary function are embedded into polyhedral sets. As a result, the iteration points are found by solving linear programming problems. We discuss the implementation of the algorithm and prove its convergence.
Optimized algorithm for balancing clusters in wireless sensor networks
Institute of Scientific and Technical Information of China (English)
Mucheol KIM; Sun-hong KIM; Hyungjin BYUN; Sang-yong HAN
2009-01-01
Wireless sensor networks consist of hundreds or thousands of sensor nodes that are subject to numerous restrictions, including computation capability and battery capacity. Topology control is an important issue for achieving a balanced placement of sensor nodes. The clustering scheme is a widely known and efficient means of topology control for transmitting information to the base station in two hops. The automatic routing scheme of the self-organizing technique is another critical element of wireless sensor networks. In this paper we propose an optimal algorithm with cluster balance taken into consideration, and compare it with three well-known and widely used approaches, i.e., LEACH, MEER, and VAP-E, in performance evaluation. Experimental results show that the proposed approach increases the overall network lifetime, indicating that the amount of energy required for communication to the base station will be reduced for locating an optimal cluster.
A cluster analysis on road traffic accidents using genetic algorithms
Saharan, Sabariah; Baragona, Roberto
2017-04-01
The analysis of road traffic accidents is increasingly important because of the cost of accidents and public road safety. The availability of large data sets makes the study of factors that affect the frequency and severity of accidents viable. However, the data are often highly unbalanced and overlapping. We deal with the data set of road traffic accidents recorded in Christchurch, New Zealand, from 2000 to 2009, with a total of 26440 accidents. The data are binary, and there are 50 road traffic accident factors with four levels of severity. We used a genetic algorithm for the analysis because the data set is large and unbalanced, and standard clustering such as the k-means algorithm may not be suitable for the task. The genetic algorithm based on clustering for unknown K (GCUK) has been used to identify the factors associated with accidents of different levels of severity. The results provide an interesting insight into the relationship between the factors and accident severity level and suggest that the two main factors that contribute to fatal accidents are "Speed greater than 60 km h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and independent component analysis is performed to validate the results.
Community Clustering Algorithm in Complex Networks Based on Microcommunity Fusion
Directory of Open Access Journals (Sweden)
Jin Qi
2015-01-01
With further research in recent years on the physical meaning and digital features of community structure in complex networks, improving the effectiveness and efficiency of community mining algorithms in complex networks has become an important subject in this area. This paper puts forward the concept of the microcommunity and obtains the final community mining results by fusing different microcommunities. The paper starts with the basic definition of the network community and applies Expansion to the microcommunity clustering, which provides the prerequisites for microcommunity fusion. Analysis of test results on network data sets shows that the proposed algorithm is more efficient and has higher solution quality than other similar algorithms.
Quantum algorithms for testing Boolean functions
Directory of Open Access Journals (Sweden)
Erika Andersson
2010-06-01
We discuss quantum algorithms, based on the Bernstein-Vazirani algorithm, for finding which variables a Boolean function depends on. There are 2^n possible linear Boolean functions of n variables; given a linear Boolean function, the Bernstein-Vazirani quantum algorithm can deterministically identify which one of these Boolean functions we are given using just a single function query. The same quantum algorithm can also be used to learn which input variables other types of Boolean functions depend on, with a success probability that depends on the form of the Boolean function that is tested, but does not depend on the total number of input variables. We also outline a procedure to further amplify the success probability, based on another quantum algorithm, the Grover search.
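The single-query claim for linear functions in the abstract above can be illustrated with a small classical state-vector simulation. This is only a sketch, not the authors' code: the names `hadamard_transform`, `bernstein_vazirani`, and `make_linear_function` are hypothetical, and the simulation itself evaluates f on all 2^n basis states, which stands in for the one coherent oracle call a quantum device would make.

```python
def hadamard_transform(amplitudes):
    """Fast Walsh-Hadamard transform, normalised by 1/sqrt(len(amplitudes))."""
    state = list(amplitudes)
    size = len(state)
    step = 1
    while step < size:
        for start in range(0, size, 2 * step):
            for i in range(start, start + step):
                a, b = state[i], state[i + step]
                state[i], state[i + step] = a + b, a - b
        step *= 2
    norm = size ** 0.5
    return [x / norm for x in state]


def bernstein_vazirani(f, n):
    """Recover the hidden bit mask a of f(x) = a.x mod 2 from one oracle pass."""
    size = 1 << n
    # H^n applied to |0...0> gives the uniform superposition.
    state = hadamard_transform([1.0] + [0.0] * (size - 1))
    # Phase oracle: a single query acting on the whole superposition.
    state = [amp * (-1) ** f(x) for x, amp in enumerate(state)]
    # The second H^n concentrates all amplitude on the basis state |a>.
    state = hadamard_transform(state)
    probabilities = [amp * amp for amp in state]
    return max(range(size), key=probabilities.__getitem__)


def make_linear_function(mask):
    """Hypothetical helper: f(x) = parity of the bits of x selected by mask."""
    return lambda x: bin(x & mask).count("1") % 2


hidden = 0b1011
found = bernstein_vazirani(make_linear_function(hidden), n=4)
print(format(found, "04b"))
```

For a linear function the final state has all of its probability on the hidden mask, so the measurement is deterministic; for non-linear functions the same circuit yields a distribution over masks, which is the probabilistic setting the abstract refers to.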
You, Tao; Cheng, Hui-Min; Ning, Yi-Zi; Shia, Ben-Chang; Zhang, Zhong-Yuan
2016-12-01
Like clustering analysis, community detection aims at assigning nodes in a network to different communities. Fdp is a recently proposed density-based clustering algorithm which does not need the number of clusters as prior input and whose result is insensitive to its parameter. However, Fdp cannot be directly applied to community detection due to its inability to recognize the community centers in the network. To solve the problem, a new community detection method (named IsoFdp) is proposed in this paper. First, we use the IsoMap technique to map the network data into a low-dimensional manifold which can reveal diverse pairwise similarity. Then Fdp is applied to detect the communities in the network. An improved partition density function is proposed to select the proper number of communities automatically. We test our method on both synthetic and real-world networks, and the results demonstrate the effectiveness of our algorithm over the state-of-the-art methods.
Identifying multiple influential spreaders by a heuristic clustering algorithm
Bao, Zhong-Kui; Liu, Jian-Guo; Zhang, Hai-Feng
2017-03-01
The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suited to the case where a single spreader is chosen as the spreading source. Often, the spreading process is initiated by simultaneously choosing multiple nodes as the spreading sources. In this situation, choosing the top-ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in the network. The ideal situation for the multiple-spreader case is therefore that the spreaders are both influential and dispersed across the network, but it is difficult to meet the two conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters; the center nodes of the clusters are then chosen as the multiple spreaders. The HC algorithm not only ensures that the multiple spreaders are dispersed across the network but also avoids selecting nodes that are very "negligible". Experimental results on synthetic and real networks indicate that the HC method performs better on influence maximization than traditional methods.
The Geometric Cluster Algorithm: Rejection-Free Monte Carlo Simulation of Complex Fluids
Luijten, Erik
2005-03-01
The study of complex fluids is an area of intense research activity, in which exciting and counter-intuitive behavior continues to be uncovered. Ironically, one of the very factors responsible for such interesting properties, namely the presence of multiple relevant time and length scales, often greatly complicates accurate theoretical calculations and computer simulations that could explain the observations. We have recently developed a new Monte Carlo simulation method [J. Liu and E. Luijten, Phys. Rev. Lett. 92, 035504 (2004); see also Physics Today, March 2004, pp. 25-27] that overcomes this problem for several classes of complex fluids. Our approach can accelerate simulations by orders of magnitude by introducing nonlocal, collective moves of the constituents. Strikingly, these cluster Monte Carlo moves are proposed in such a manner that the algorithm is rejection-free. The identification of the clusters is based upon geometric symmetries and can be considered as the off-lattice generalization of the widely used Swendsen-Wang and Wolff algorithms for lattice spin models. While phrased originally for complex fluids that are governed by the Boltzmann distribution, the geometric cluster algorithm can be used to efficiently sample configurations from an arbitrary underlying distribution function and may thus be applied in a variety of other areas. In addition, I will briefly discuss various extensions of the original algorithm, including methods to influence the size of the clusters that are generated and ways to introduce density fluctuations.
Local rewiring algorithms to increase clustering and grow a small world
Alstott, Jeff; Pizza, Pamela B; Radcliffe, Mary
2016-01-01
Many real-world networks have high clustering among vertices: vertices that share neighbors are often also directly connected to each other. A network's clustering can be a useful indicator of its connectedness and community structure. Algorithms for generating networks with high clustering have been developed, but typically rely on adding or removing edges and nodes, sometimes from a completely empty network. Here, we introduce algorithms that create a highly clustered network by starting with an existing network and rearranging edges, without adding or removing them; these algorithms can preserve other network properties even as the clustering increases. These algorithms rely on local rewiring rules, in which a single edge changes one of its vertices in a way that is guaranteed to increase clustering. This greedy algorithm can be applied iteratively to transform a random network into a form with much higher clustering. Additionally, these algorithms grow the network's clustering faster than they increase it...
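The idea of a local rewiring move, in which one edge swaps one endpoint and the edge count stays fixed, can be sketched as follows. This is an illustrative assumption rather than the authors' rule: the sketch uses the total triangle count as a proxy for clustering and accepts a move only when that count strictly increases. The names `triangle_count`, `rewire_once`, `example_graph`, and `edge_count` are hypothetical.

```python
from itertools import combinations


def triangle_count(adj):
    """Count triangles in an undirected graph given as {node: set(neighbours)}."""
    return sum(
        1
        for u, v, w in combinations(sorted(adj), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def rewire_once(adj):
    """Replace edge (u, v) by (u, w) if that strictly increases the triangle count.

    Returns True when a move was applied and False when no improving move exists.
    """
    for u in sorted(adj):
        for v in sorted(adj[u]):
            lost = len(adj[u] & adj[v])  # triangles through the old edge (u, v)
            for w in sorted(adj):
                if w == u or w == v or w in adj[u]:
                    continue
                # Triangles that the new edge (u, w) would close once (u, v) is gone.
                gained = len((adj[u] - {v}) & adj[w])
                if gained > lost:
                    adj[u].remove(v)
                    adj[v].remove(u)
                    adj[u].add(w)
                    adj[w].add(u)
                    return True
    return False


def example_graph():
    """Small illustrative graph: one triangle with a two-edge tail."""
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
    adj = {node: set() for node in range(5)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def edge_count(adj):
    return sum(len(neighbours) for neighbours in adj.values()) // 2


graph = example_graph()
before = triangle_count(graph)
while rewire_once(graph):  # greedy: repeat until no move adds a triangle
    pass
after = triangle_count(graph)
print(before, after, edge_count(graph))
```

Because every accepted move adds at least one triangle and the number of triangles is bounded, the loop terminates. Note that a move may leave a node isolated; a rule that also preserves other network properties would need extra constraints.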
Sweeney, Timothy E; Chen, Albert C; Gevaert, Olivier
2015-11-19
In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of 'dark art', with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R.
Lyapunov Function Synthesis - Algorithm and Software
DEFF Research Database (Denmark)
Leth, Tobias; Wisniewski, Rafal; Sloth, Christoffer
2016-01-01
In this paper we introduce an algorithm for the synthesis of polynomial Lyapunov functions for polynomial vector fields. The Lyapunov function is a continuous piecewise polynomial defined on simplices, which compose a collection of simplices. The algorithm is elaborated and crucial features...
Lyapunov Function Synthesis - Algorithm and Software
DEFF Research Database (Denmark)
Leth, Tobias; Sloth, Christoffer; Wisniewski, Rafal
2016-01-01
In this paper we introduce an algorithm for the synthesis of polynomial Lyapunov functions for polynomial vector fields. The Lyapunov function is a continuous piecewise polynomial defined on simplices, which compose a collection of simplices. The algorithm is elaborated and crucial features...
A Flow-Partitioned Unequal Clustering Routing Algorithm for Wireless Sensor Networks
Jian Peng; Xiaohai Chen; Tang Liu
2014-01-01
Energy efficiency and energy balance are two important issues for wireless sensor networks. In previous clustering routing algorithms, multihop transmission, sleep scheduling, and unequal clustering are always used to improve energy efficiency and energy balance. In these algorithms, only the cluster heads share the burden of data forwarding in each round. In this paper, we propose a flow-partitioned unequal clustering routing (FPUC) algorithm to achieve better energy efficiency and energy ba...
Development of Automatic Cluster Algorithm for Microcalcification in Digital Mammography
Energy Technology Data Exchange (ETDEWEB)
Choi, Seok Yoon [Dept. of Medical Engineering, Korea University, Seoul (Korea, Republic of); Kim, Chang Soo [Dept. of Radiological Science, College of Health Sciences, Catholic University of Pusan, Pusan (Korea, Republic of)
2009-03-15
Digital mammography is an efficient imaging technique for the detection and diagnosis of breast pathological disorders. Six mammographic criteria, namely the number of clusters, the number, size, extent, and morphologic shape of microcalcifications, and the presence of a mass, were reviewed, and their correlation with pathologic diagnosis was evaluated. It is very important to find breast cancer early, when treatment can reduce deaths from breast cancer and breast incision. In breast cancer screening, mammography is typically used to view the internal organization. Clustered microcalcifications on mammography represent an important feature of a breast mass, especially that of intraductal carcinoma. Because microcalcification is highly correlated with breast cancer, a cluster of microcalcifications can be very helpful for the clinician in predicting breast cancer. For this study, three steps of quantitative evaluation are proposed: DoG filter, adaptive thresholding, and expectation maximization. With the proposed algorithm, the number of calcifications and the length of each cluster in the distribution of microcalcifications could be measured, and these can also be used as indicators of the primary diagnosis to diagnose breast cancer automatically.
Clustering of User Behaviour based on Web Log data using Improved K-Means Clustering Algorithm
Directory of Open Access Journals (Sweden)
S.Padmaja
2016-02-01
The proposed work applies an improved K-means clustering algorithm to identify internet user behaviour. Web data analysis includes the transformation and interpretation of web log data to discover information, patterns, and knowledge. The efficiency of the algorithm is analyzed by considering certain parameters: date, time, S_id, CS_method, C_IP, User_agent, and time taken. The research uses more than 2 years of real data collected from the web servers of two different groups of institutions. This dataset provides a better analysis of log data to identify internet user behaviour.
Clustering aspects in nuclear structure functions
Hirai, M; Saito, K; Watanabe, T
2010-01-01
For understanding an anomalous nuclear effect experimentally observed for the beryllium-9 nucleus at the Thomas Jefferson National Accelerator Facility (JLab), clustering aspects are studied in structure functions of deep inelastic lepton-nucleus scattering by using momentum distributions calculated in antisymmetrized (or fermionic) molecular dynamics (AMD) and also in a simple shell model for comparison. According to the AMD, the Be-9 nucleus consists of two alpha-like clusters with a surrounding neutron. The clustering produces high-momentum components in nuclear wave functions, which affects nuclear modifications of the structure functions. We investigated whether clustering features could appear in the structure function F_2 of Be-9 along with studies for other light nuclei. We found that nuclear modifications of F_2 are similar in both AMD and shell models within our simple convolution description although there are slight differences in Be-9. It indicates that the anomalous Be-9 result should be explain...
Directory of Open Access Journals (Sweden)
Liling Sun
2015-01-01
An improved multiobjective ABC algorithm based on K-means clustering, called CMOABC, is proposed. To accelerate the convergence of the canonical MOABC, the way information is communicated in the employed bees' phase is modified. To maintain population diversity, a multiswarm technique based on K-means clustering is employed to decompose the population into many clusters. Because each subcomponent evolves separately, the population is reclustered after every specified number of iterations to facilitate information exchange among different clusters. Application of the new CMOABC to several multiobjective benchmark functions shows a marked improvement in performance over the fast nondominated sorting genetic algorithm (NSGA-II), the multiobjective particle swarm optimizer (MOPSO), and the multiobjective ABC (MOABC). Finally, the CMOABC is applied to the real-world optimal power flow (OPF) problem, which considers cost, loss, and emission impacts as the objective functions. The 30-bus IEEE test system is presented to illustrate the application of the proposed algorithm. The simulation results demonstrate that, compared to NSGA-II, MOPSO, and MOABC, the proposed CMOABC is superior for solving the OPF problem in terms of optimization accuracy.
Clustering Algorithms for Heterogeneous Wireless Sensor Networks - A Brief Survey
Directory of Open Access Journals (Sweden)
A.MeenaKowshalya
2011-09-01
Wireless sensor networks (WSN) are emerging in various fields such as disaster management, battlefield surveillance, and border security surveillance. A large number of sensors in these applications are unattended and work autonomously. Clustering is a key technique to improve the network lifetime, reduce the energy consumption, and increase the scalability of the sensor network. In this paper, we study the impact of node heterogeneity on the performance of WSN. This paper surveys the different clustering algorithms for heterogeneous WSN.
Classification of posture maintenance data with fuzzy clustering algorithms
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to these data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
Cluster-Based Distributed Algorithms for Very Large Linear Equations
Institute of Scientific and Technical Information of China (English)
(no author listed)
2006-01-01
In many applications, such as computational fluid dynamics, weather prediction, image processing, and Markov chain state computation, the order n of the matrix is often very large, and no serial algorithm can solve the problem. A distributed cluster-based solution for very large linear equations is discussed, including the definitions of notation, the partition of the matrix, the communication mechanism, and a master-slave algorithm. The computing cost is O(n^3/N), the memory cost is O(n^2/N), the I/O cost is O(n^2/N), and the communication cost is O(Nn), where N is the number of computing nodes or processes. Tests show that the solution can effectively solve matrices of double type smaller than 10^6×10^6.
Dynamic and static properties of the invaded cluster algorithm
Moriarty, K.; Machta, J.; Chayes, L. Y.
1999-02-01
Simulations of the two-dimensional Ising and three-state Potts models at their critical points are performed using the invaded cluster (IC) algorithm. It is argued that observables measured on a sublattice of size l should exhibit a crossover to Swendsen-Wang (SW) behavior for l sufficiently less than the lattice size L, and a scaling form is proposed to describe the crossover phenomenon. It is found that the energy autocorrelation time τ_ε(l,L) for an l×l sublattice attains a maximum in the crossover region, and a dynamic exponent z_IC for the IC algorithm is defined according to τ_ε,max ~ L^(z_IC). Simulation results for the three-state model yield z_IC = 0.346 ± 0.002, which is smaller than values of the dynamic exponent found for the SW and Wolff algorithms and also less than the Li-Sokal bound. The results are less conclusive for the Ising model, but it appears that z_IC [...] Wolff algorithms.
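A dynamic exponent defined through a power law of the form τ ~ L^z is commonly estimated as the slope of log τ against log L. The sketch below shows that fit with an ordinary least-squares line. The data are synthetic and noise-free, generated from an assumed power law for illustration only; they are not the simulation results of the paper, and `fit_power_law_exponent` is a hypothetical helper name.

```python
import math


def fit_power_law_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(size) for size in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data from tau = 1.5 * L**0.35 (illustrative values, not measurements).
lattice_sizes = [16, 32, 64, 128, 256]
autocorrelation_times = [1.5 * size ** 0.35 for size in lattice_sizes]
z_estimate = fit_power_law_exponent(lattice_sizes, autocorrelation_times)
print(round(z_estimate, 3))
```

With real autocorrelation times the points scatter around the line, so the quoted uncertainty on an exponent would come from the regression error or from resampling, neither of which this sketch attempts.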
Gusev, Alexander; Chuluunbaatar, Ochbadrakh; Rostovtsev, Vitaly; Hai, Luong Le; Derbov, Vladimir; Gozdz, Andrzej; Klimov, Evgenii
2013-01-01
The quantum model of a cluster consisting of A identical particles, coupled by internal pair interactions and affected by the external field of a target, is considered. A symbolic-numerical algorithm for generating (A-1)-dimensional oscillator eigenfunctions, symmetric or antisymmetric with respect to permutations of A identical particles in the new symmetrized coordinates, is formulated and implemented using the MAPLE computer algebra system. Examples of generating the symmetrized coordinate representation for (A-1)-dimensional oscillator functions in one-dimensional Euclidean space are analyzed. The approach is aimed at solving the problem of the tunnelling of clusters consisting of several identical particles through repulsive potential barriers of a target.
Cerebellar Functional Parcellation Using Sparse Dictionary Learning Clustering
Directory of Open Access Journals (Sweden)
Changqing Wang
2016-05-01
The human cerebellum has recently been discovered to contribute to cognition and emotion beyond the planning and execution of movement, suggesting its functional heterogeneity. We aimed to identify the functional parcellation of the cerebellum using information from resting-state functional magnetic resonance imaging (rs-fMRI). For this, we introduced a new data-driven decomposition-based functional parcellation algorithm, called Sparse Dictionary Learning Clustering (SDLC). SDLC integrates dictionary learning, sparse representation of rs-fMRI, and k-means clustering into one optimization problem. The dictionary is comprised of an over-complete set of time course signals, with which a sparse representation of rs-fMRI signals can be constructed. Cerebellar functional regions were then identified using k-means clustering based on the sparse representation of rs-fMRI signals. We solved SDLC using a multi-block hybrid proximal alternating method that guarantees strong convergence. We evaluated the reliability of SDLC and benchmarked its classification accuracy against other clustering techniques using simulated data. We then demonstrated that SDLC can identify biologically reasonable functional regions of the cerebellum as estimated by their cerebello-cortical functional connectivity. We further provided new insights into the cerebello-cortical functional organisation in children.
Cerebellar Functional Parcellation Using Sparse Dictionary Learning Clustering.
Wang, Changqing; Kipping, Judy; Bao, Chenglong; Ji, Hui; Qiu, Anqi
2016-01-01
The human cerebellum has recently been discovered to contribute to cognition and emotion beyond the planning and execution of movement, suggesting its functional heterogeneity. We aimed to identify the functional parcellation of the cerebellum using information from resting-state functional magnetic resonance imaging (rs-fMRI). For this, we introduced a new data-driven decomposition-based functional parcellation algorithm, called Sparse Dictionary Learning Clustering (SDLC). SDLC integrates dictionary learning, sparse representation of rs-fMRI, and k-means clustering into one optimization problem. The dictionary is comprised of an over-complete set of time course signals, with which a sparse representation of rs-fMRI signals can be constructed. Cerebellar functional regions were then identified using k-means clustering based on the sparse representation of rs-fMRI signals. We solved SDLC using a multi-block hybrid proximal alternating method that guarantees strong convergence. We evaluated the reliability of SDLC and benchmarked its classification accuracy against other clustering techniques using simulated data. We then demonstrated that SDLC can identify biologically reasonable functional regions of the cerebellum as estimated by their cerebello-cortical functional connectivity. We further provided new insights into the cerebello-cortical functional organization in children.
A Novel Dynamic Clustering Algorithm Based on Immune Network and Tabu Search
Institute of Scientific and Technical Information of China (English)
ZHONGJiang; WUZhongfu; WUKaigui; YANGQiang
2005-01-01
It is usually difficult to specify a rational number of partitions in a data set before clustering. This problem cannot be solved by traditional clustering algorithms such as k-means or its variations. This paper proposes a novel dynamic clustering algorithm based on the artificial immune network and tabu search (DCBIT). It optimizes the number and the location of the clusters at the same time. The algorithm has two phases: it begins by running the immune network algorithm to find a clustering feasible solution (CFS), and then employs tabu search to obtain the optimum cluster number and cluster centers on the CFS. The probability of acquiring the CFS through the immune network algorithm is also discussed. Experimental results show that the new algorithm has satisfactory convergence probability and convergence speed.
Image Transformation using Modified Kmeans clustering algorithm for Parallel saliency map
Directory of Open Access Journals (Sweden)
Aman Sharma
2013-08-01
In designing an image transformation system, the input and output images may appear entirely different and have different interpretations, depending on the transform chosen. Image transformation is carried out with the help of modules such as the input image, image cluster index, objects in cluster, and color index transformation of the image. The K-means clustering algorithm is used to cluster the image for better segmentation. In the proposed method, a parallel saliency algorithm with K-means clustering is used to avoid local minima and to find the saliency map. The parallel saliency algorithm is used because it is shown to perform better than the existing saliency algorithm.
A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.
Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang
2016-12-01
This paper studies a clustering method for Chinese medicine (CM) medical cases. The traditional K-means clustering algorithm has shortcomings when processing prescriptions from CM medical cases, such as the dependence of results on the selection of initial values and trapping in local optima. Therefore, a new clustering method based on the collaboration of the firefly algorithm and the simulated annealing algorithm is proposed. This algorithm dynamically determines the iteration of the firefly algorithm and the sampling of the simulated annealing algorithm according to fitness changes, and increases the diversity of the swarm by expanding the scope of the sudden jump, thereby effectively avoiding premature convergence. Results from confirmatory experiments on CM medical cases suggest that, compared with traditional K-means clustering algorithms, this method greatly improves individual diversity and the clustering results obtained. The computed results have a certain reference value for cluster analysis of CM prescriptions.
Directory of Open Access Journals (Sweden)
Mingwei Leng
2013-01-01
The accuracy of most of the existing semisupervised clustering algorithms based on small size of labeled dataset is low when dealing with multidensity and imbalanced datasets, and labeling data is quite expensive and time consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering algorithm in multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it utilizes multithreshold to expand labeled datasets on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to demonstrate the proposed algorithm, and the experimental results show that the proposed semisupervised clustering algorithm has a higher accuracy and a more stable performance in comparison to other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.
Clustering Algorithm Based on Crowding Niche
Institute of Scientific and Technical Information of China (English)
业宁; 董逸生
2003-01-01
A new clustering algorithm based on crowding niches is proposed in this paper. As organisms evolve, homogeneous individuals spontaneously resist heterogeneous ones. At the same time, individuals in the same class compete with each other for limited resources, and individuals with poor fitness are eliminated. We propose a clustering algorithm based on this idea. Experimental evaluation has demonstrated its efficiency.
A Heuristic Task Scheduling Algorithm for Heterogeneous Virtual Clusters
Directory of Open Access Journals (Sweden)
Weiwei Lin
2016-01-01
Cloud computing provides on-demand computing and storage services with high performance and high scalability. However, the rising energy consumption of cloud data centers has become a prominent problem. In this paper, we first introduce an energy-aware framework for task scheduling in virtual clusters. The framework consists of a task resource requirements prediction module, an energy estimate module, and a scheduler with a task buffer. Secondly, based on this framework, we propose a virtual machine power efficiency-aware greedy scheduling algorithm (VPEGS). As a heuristic algorithm, VPEGS estimates task energy by considering factors including task resource demands, VM power efficiency, and server workload before scheduling tasks in a greedy manner. We simulated a heterogeneous VM cluster and conducted experiments to evaluate the effectiveness of VPEGS. Simulation results show that VPEGS effectively reduced total energy consumption by more than 20% without producing large scheduling overheads. Using a similar heuristic approach, it outperformed Min-Min and RASA with respect to energy saving by about 29% and 28%, respectively.
A Request Distribution Algorithm for Web Server Cluster
Directory of Open Access Journals (Sweden)
Wei Zhang
2011-12-01
With the explosive growth of web-based application workloads, web server clusters face challenges in request response time. Request distribution among the servers in a web server cluster is the key to addressing this challenge, especially under heavy workloads. In this paper, we propose a new request distribution algorithm named llac (least load active cache) for the load balancing switch in a web server cluster. The goal of llac is to improve the cache hit rate and reduce response time. Packets are parsed at the IP level, and back-end servers are notified to cache hot files using link change technology, neither changing URL information nor modifying the service program. This avoids the overhead of switching between user mode and kernel mode. The load balancing switch creates a connection directly with the selected server, avoiding connection migration overhead. The policy estimates the current composite load of each server and selects the server with the least load to serve the request. It also improves the resource utilization of web servers. Experimental results show that llac achieves better performance for web applications than wrr (weighted round robin), a popular request distribution algorithm.
A clustering algorithm for sample data based on environmental pollution characteristics
Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun
2015-04-01
Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
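The three-stage process outlined in this abstract (seed a centre from the first unlabelled point, assign points by a similarity threshold, then refine in a k-Means-like way) can be sketched roughly as follows. This is not the EPC algorithm itself: the similarity function, the single refinement pass, and the rule that singleton clusters count as outliers are all assumptions made for illustration.

```python
import math


def similarity(a, b):
    # Assumed similarity: 1 / (1 + Euclidean distance), which lies in (0, 1].
    return 1.0 / (1.0 + math.dist(a, b))


def threshold_cluster(points, threshold, refine_passes=1):
    labels = [None] * len(points)
    centres = []
    # Stages 1-2: the first unlabelled point seeds a new centre, and every
    # still-unlabelled point that is similar enough joins that cluster.
    for i, p in enumerate(points):
        if labels[i] is not None:
            continue
        centres.append(p)
        k = len(centres) - 1
        labels[i] = k
        for j in range(i + 1, len(points)):
            if labels[j] is None and similarity(points[j], p) >= threshold:
                labels[j] = k
    # Stage 3: k-Means-like refinement (recompute means, then reassign).
    for _ in range(refine_passes):
        for k in range(len(centres)):
            members = [points[i] for i, lab in enumerate(labels) if lab == k]
            if members:
                centres[k] = tuple(sum(c) / len(members) for c in zip(*members))
        labels = [
            max(range(len(centres)), key=lambda k: similarity(p, centres[k]))
            for p in points
        ]
    # Assumption: a cluster with a single member is reported as an outlier.
    outliers = [i for i, lab in enumerate(labels) if labels.count(lab) == 1]
    return labels, centres, outliers


pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (20, 20)]
labels, centres, outliers = threshold_cluster(pts, threshold=0.5)
```

With this similarity function, a threshold of 0.5 corresponds to a Euclidean distance of at most 1 from the seed point.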
Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen
2017-03-01
The traditional microblog recommendation algorithm suffers from low efficiency and modest effectiveness in the era of big data. To solve these issues, this paper proposes a mixed recommendation algorithm with user clustering. The paper first introduces the situation of the microblog marketing industry. It then elaborates the user interest modeling process and detailed advertisement recommendation methods. Finally, it compares the mixed recommendation algorithm with the traditional classification algorithm and with the mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering achieves good accuracy and recall in microblog advertisement promotion.
Textural defect detect using a revised ant colony clustering algorithm
Zou, Chao; Xiao, Li; Wang, Bingwen
2007-11-01
We propose a novel method based on a revised ant colony clustering algorithm (ACCA) for textural defect detection. Our efforts focus mainly on the definition of a local irregularity measurement and the implementation of the revised ACCA. The local irregularity measurement evaluates the local textural inconsistency of each pixel against its mini-environment. In our revised ACCA, the behavior of each ant is divided into two steps: release pheromone and act. The quantity of pheromone released is proportional to the irregularity measurement; the next actions of the ants are chosen independently of each other in a stochastic way according to some evaluated heuristic knowledge. The independence of the ants implies the inherently parallel computation architecture of this algorithm. We apply the proposed method to some typical textural images with defects. From the series of pheromone distribution maps (PDM), it can be clearly seen that the pheromone distribution gradually approaches the textural defects. After some post-processing, the final distribution of pheromone demonstrates the shape and area of the defects well.
Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section
Institute of Scientific and Technical Information of China (English)
YU Yongqian; ZHAO Xiangguo; CHEN Hengyue; WANG Bin; YU Ge; WANG Guoren
2006-01-01
This paper presents an effective clustering mode and a novel clustering result evaluating mode. The clustering mode has two bounded integer-valued parameters. The evaluating mode evaluates clustering results and gives each a mark. The higher the mark a clustering result gains, the higher its quality. By organizing the two modes in different ways, we can build two clustering algorithms: SECDU (Self-Expanded Clustering Algorithm Based on Density Units) and SECDUF (Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section). SECDU enumerates all value pairs of the two parameters of the clustering mode to process the data set repeatedly and evaluates every clustering result with the evaluating mode. SECDU then outputs the clustering result that has the highest evaluating mark. By applying a hill-climbing algorithm, SECDUF improves clustering efficiency greatly. Both algorithms adapt well to data sets with different distribution features, and both can output high-quality clustering results. SECDUF tunes the parameters of the clustering mode automatically, with no human intervention throughout the whole process. In addition, SECDUF has high clustering performance.
Kim, R S J; Postman, M; Strauss, M A; Bahcall, Neta A; Gunn, J E; Lupton, R H; Annis, J; Nichol, R C; Castander, F J; Brinkmann, J; Brunner, R J; Connolly, A; Csabai, I; Hindsley, R B; Ivezic, Z; Vogeley, M S; York, D G; Kim, Rita S. J.; Kepner, Jeremy V.; Postman, Marc; Strauss, Michael A.; Bahcall, Neta A.; Gunn, James E.; Lupton, Robert H.; Annis, James; Nichol, Robert C.; Castander, Francisco J.; Brunner, Robert J.; Connolly, Andrew; Csabai, Istvan; Hindsley, Robert B.; Ivezic, Zeljko; Vogeley, Michael S.; York, Donald G.
2002-01-01
We present a comparison of three cluster finding algorithms from imaging data using Monte Carlo simulations of clusters embedded in a 25 deg^2 region of Sloan Digital Sky Survey (SDSS) imaging data: the Matched Filter (MF; Postman et al. 1996), the Adaptive Matched Filter (AMF; Kepner et al. 1999) and a color-magnitude filtered Voronoi Tessellation Technique (VTT). Among the two matched filters, we find that the MF is more efficient in detecting faint clusters, whereas the AMF evaluates the redshifts and richnesses more accurately, therefore suggesting a hybrid method (HMF) that combines the two. The HMF outperforms the VTT when using a background that is uniform, but it is more sensitive to the presence of a non-uniform galaxy background than is the VTT; this is due to the assumption of a uniform background in the HMF model. We thus find that for the detection thresholds we determine to be appropriate for the SDSS data, the performance of both algorithms is similar; we present the selection function for eac...
An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering
Institute of Scientific and Technical Information of China (English)
Taher NIKNAM; Babak AMIRI; Javad OLAMAEI; Ali AREFI
2009-01-01
The K-means algorithm is one of the most popular techniques in clustering. Nevertheless, the performance of the K-means algorithm depends highly on the initial cluster centers, and the algorithm may converge to local minima. This paper proposes a hybrid evolutionary programming based clustering algorithm, called PSO-SA, by combining particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution by SA and to increase the information exchange among particles using a mutation operator to escape local optima. Three datasets, Iris, Wisconsin Breast Cancer, and Ripley's Glass, have been considered to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only has a better response but also converges more quickly than the K-means, PSO, and SA algorithms.
An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets
Directory of Open Access Journals (Sweden)
Kang Zhang
2014-01-01
Full Text Available Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe the data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle mixed data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method which has demonstrated good performance on a wide variety of datasets. However, it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm to cluster them. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.
Combined Density-based and Constraint-based Algorithm for Clustering
Institute of Scientific and Technical Information of China (English)
CHEN Tung-shou; CHEN Rong-chang; LIN Chih-chiang; CHIU Yung-hsing
2006-01-01
We propose a new clustering algorithm that helps researchers analyze data quickly and accurately. We call this algorithm the Combined Density-based and Constraint-based Algorithm (CDC). CDC consists of two phases. In the first phase, CDC employs the idea of density-based clustering to split the original data into a number of fragmented clusters. At the same time, CDC cuts off noise and outliers. In the second phase, CDC employs the concept of K-means clustering to select a greater cluster to be the center. Then, the greater cluster merges some smaller clusters that satisfy certain constraint rules. Because clusters are merged around the center cluster, the clustering results show high accuracy. Moreover, CDC reduces the calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with those of K-means, hierarchical clustering, and the genetic clustering algorithm (GCA) proposed in 2004. Experimental results show that CDC has better performance.
Directory of Open Access Journals (Sweden)
Guohua Zou
2016-12-01
Full Text Available New medical imaging technology, such as Computed Tomography and Magnetic Resonance Imaging (MRI), has been widely used in all aspects of medical diagnosis. The purpose of these imaging techniques is to obtain various qualitative and quantitative data of the patient comprehensively and accurately, and to provide correct digital information for diagnosis, treatment planning and evaluation after surgery. MR imaging has a good diagnostic advantage for brain diseases. However, as the requirements for brain image definition and quantitative analysis keep increasing, better segmentation of MR brain images is necessary. The FCM (Fuzzy C-means) algorithm is widely applied in image segmentation, but it has some shortcomings, such as long computation time and poor anti-noise capability. In this paper, the Ant Colony algorithm is first used to determine the cluster centers and the number of clusters for the FCM algorithm so as to improve its running speed. Then an improved Markov random field model is used to improve the algorithm's anti-noise ability. Experimental results show that the proposed algorithm has obvious advantages in image segmentation speed and segmentation quality.
The IR Luminosity Functions of Rich Clusters
Bai, Lei; Rieke, Marcia J; Christlein, Daniel; Zabludoff, Ann I
2008-01-01
We present MIPS observations of the cluster A3266. About 100 spectroscopic cluster members have been detected at 24 micron. The IR luminosity function in A3266 is very similar to that in the Coma cluster down to the detection limit L_IR~10^43 ergs/s, suggesting a universal form of the bright end IR LF for local rich clusters with M~10^15 M_sun. The shape of the bright end of the A3266-Coma composite IR LF is not significantly different from that of nearby field galaxies, but the fraction of IR-bright galaxies (SFR > 0.2M_sun/yr) in both clusters increases with cluster-centric radius. The decrease of the blue galaxy fraction toward the high density cores only accounts for part of the trend; the fraction of red galaxies with moderate SFRs (0.2 < SFR < 1 M_sun/yr) also decreases with increasing galaxy density. These results suggest that for the IR bright galaxies, nearby rich clusters are distinguished from the field by a lower star-forming galaxy fraction, but not by a change in L*_IR. The composite IR LF...
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
Energy Efficient Backoff Hierarchical Clustering Algorithms for Multi-Hop Wireless Sensor Networks
Institute of Scientific and Technical Information of China (English)
Jun Wang; Yong-Tao Cao; Jun-Yuan Xie; Shi-Fu Chen
2011-01-01
Compared with flat routing protocols, clustering is a fundamental performance improvement technique in wireless sensor networks, which can increase network scalability and lifetime. In this paper, we integrate the multi-hop technique with a backoff-based clustering algorithm to organize sensors. By using an adaptive backoff strategy, the algorithm not only realizes load balance among sensor nodes, but also achieves a fairly uniform cluster head distribution across the network. Simulation results also demonstrate that our algorithm is more energy-efficient than classical ones. Our algorithm is also easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.
Extension of K-Means Algorithm for clustering mixed data | Onuodu ...
African Journals Online (AJOL)
Extension of K-Means Algorithm for clustering mixed data. ... In this work, a new hybrid method has been proposed which extends K-means algorithm to categorical domain and mixed-type ...
Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm
DEFF Research Database (Denmark)
Grotkjær, Thomas; Winther, Ole; Regenberg, Birgitte
2006-01-01
Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods...... analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset....... The method is flexible and it is possible to find consensus clusters from different clustering algorithms. Thus, the algorithm can be used as a framework to test in a quantitative manner the homogeneity of different clustering algorithms. We compare the method with a number of state-of-the-art clustering...
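The co-occurrence matrix mentioned in this abstract can be illustrated with a short sketch. It assumes each clustering run is given as a list of integer labels, one per item; the function name `co_occurrence` is hypothetical, and the step that turns the matrix into final consensus clusters is not shown.

```python
def co_occurrence(runs):
    """Fraction of runs in which items i and j received the same label.

    `runs` is a list of label lists of equal length, one list per
    clustering run. Label values need not match between runs, because
    only equality of labels within a run is used.
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


runs = [
    [0, 0, 1, 1],  # run 1
    [1, 1, 0, 0],  # run 2: same partition as run 1, labels swapped
    [0, 1, 1, 1],  # run 3: item 1 moves to the other group
]
matrix = co_occurrence(runs)
```

Entries close to 1 mark pairs that are grouped together in nearly every run, so the matrix can serve as a similarity measure for a final clustering step.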
Cardiac mitochondria exhibit dynamic functional clustering
Directory of Open Access Journals (Sweden)
Felix Tobias Kurz
2014-09-01
Full Text Available Multi-oscillatory behavior of mitochondrial inner membrane potential ΔΨm in self-organized cardiac mitochondrial networks can be triggered by metabolic or oxidative stress. Spatio-temporal analyses of cardiac mitochondrial networks have shown that mitochondria are heterogeneously organized in synchronously oscillating clusters in which the mean cluster frequency and size are inversely correlated, thus suggesting a modulation of cluster frequency through local inter-mitochondrial coupling. In this study, we propose a method to examine the mitochondrial network's topology through quantification of its dynamic local clustering coefficients. Individual mitochondrial ΔΨm oscillation signals were identified for each cardiac myocyte and cross-correlated with all network mitochondria using previously described methods (Kurz et al., 2010). Time-varying inter-mitochondrial connectivity, defined for mitochondria in the whole network whose signals are at least 90% correlated at any given time point, allowed considering functional local clustering coefficients. It is shown that mitochondrial clustering in isolated cardiac myocytes changes dynamically and is significantly higher than for random mitochondrial networks that are constructed using the Erdős-Rényi model based on the same sets of vertices. The network's time-averaged clustering coefficient for cardiac myocytes was found to be 0.500 ± 0.051 (N=9) versus 0.061 ± 0.020 for random networks. Our results demonstrate that cardiac mitochondria constitute a network with dynamically connected constituents whose topological organization is prone to clustering. Cluster partitioning in networks of coupled oscillators has been observed in scale-free and chaotic systems and is therefore in good agreement with previous models of cardiac mitochondrial networks (Aon et al., 2008).
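The quantities named in this abstract can be made concrete with a small sketch: a graph is built by thresholding a correlation matrix at 0.9, local clustering coefficients are computed per vertex, and an Erdős-Rényi random graph on the same vertex set is generated for comparison. The function names and the choice of edge probability for the random graph are assumptions; the study's signal processing and time-resolved analysis are not reproduced.

```python
import random


def threshold_graph(corr, cutoff=0.9):
    """Connect vertices i != j whose correlation is at least `cutoff`."""
    n = len(corr)
    return [[i != j and corr[i][j] >= cutoff for j in range(n)] for i in range(n)]


def local_clustering(adj):
    """Local clustering coefficient per vertex: the fraction of pairs of
    neighbours that are themselves connected (0.0 if fewer than 2 neighbours)."""
    n = len(adj)
    coeffs = []
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and adj[i][j]]
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(
            1 for a in range(k) for b in range(a + 1, k) if adj[nbrs[a]][nbrs[b]]
        )
        coeffs.append(2.0 * links / (k * (k - 1)))
    return coeffs


def erdos_renyi(n, p, seed=0):
    """Random graph on n vertices; each edge is present with probability p."""
    rng = random.Random(seed)
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i][j] = adj[j][i] = True
    return adj


corr = [
    [1.00, 0.95, 0.95, 0.92],
    [0.95, 1.00, 0.95, 0.10],
    [0.95, 0.95, 1.00, 0.10],
    [0.92, 0.10, 0.10, 1.00],
]
coeffs = local_clustering(threshold_graph(corr))
mean_coeff = sum(coeffs) / len(coeffs)
random_coeffs = local_clustering(erdos_renyi(len(corr), p=0.5))
```

A network-level value such as the one reported above would be the mean of these per-vertex coefficients, averaged again over time points.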
MST-BASED CLUSTERING TOPOLOGY CONTROL ALGORITHM FOR WIRELESS SENSOR NETWORKS
Institute of Scientific and Technical Information of China (English)
Cai Wenyu; Zhang Meiyan
2010-01-01
In this paper, we propose a novel clustering topology control algorithm named Minimum Spanning Tree (MST)-based Clustering Topology Control (MCTC) for Wireless Sensor Networks (WSNs), which uses a hybrid approach to adjust sensor nodes' transmission power in two-tiered hierarchical WSNs. The MCTC algorithm employs a one-hop Maximum Energy & Minimum Distance (MEMD) clustering algorithm to decide clustering status. Each cluster exchanges information among its own Cluster Members (CMs) locally and then delivers the information to the Cluster Head (CH). Moreover, CHs exchange information with one another and afterwards transmit the aggregated information to the base station. The intra-cluster topology control scheme uses MST to decide the CMs' transmission radius; similarly, the inter-cluster topology control scheme applies MST to decide the CHs' transmission radius. Since the intra-cluster topology control is a fully distributed approach and the inter-cluster topology control is a purely centralized approach performed by the base station, the MCTC algorithm is a hybrid clustering topology control algorithm and can obtain a scalable topology and strong connectivity guarantees simultaneously. As a result, the network topology is reduced by the MCTC algorithm so that network energy efficiency is improved. The simulation results verify that MCTC outperforms traditional topology control schemes such as LMST, DRNG and MEMD in terms of average node degree, average node power radius and network lifetime.
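How an MST can determine a transmission radius may be easier to see in code. The sketch below builds a Euclidean minimum spanning tree with Prim's algorithm and sets each node's radius to its longest incident tree edge. That radius rule is an assumption (the abstract only says MST is used to decide the radius), and the function names are hypothetical.

```python
import math


def mst_edges(nodes):
    """Prim's algorithm on the complete Euclidean graph over `nodes`.

    Returns a list of (u, v, length) tuples forming a minimum spanning tree.
    """
    n = len(nodes)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(nodes[u], nodes[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def transmission_radii(nodes):
    """Assumed rule: a node must reach its farthest neighbour in the MST."""
    radii = [0.0] * len(nodes)
    for u, v, d in mst_edges(nodes):
        radii[u] = max(radii[u], d)
        radii[v] = max(radii[v], d)
    return radii


cluster = [(0, 0), (1, 0), (3, 0), (3, 4)]
radii = transmission_radii(cluster)
```

Because every node can reach its tree neighbours, the resulting topology stays connected while each node transmits no farther than the tree requires.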
Identifying prototypical components in behaviour using clustering algorithms.
Directory of Open Access Journals (Sweden)
Elke Braun
Full Text Available Quantitative analysis of animal behaviour is a requirement to understand the task solving strategies of animals and the underlying control mechanisms. The identification of repeatedly occurring behavioural components is thereby a key element of a structured quantitative description. However, the complexity of most behaviours makes the identification of such behavioural components a challenging problem. We propose an automatic and objective approach for determining and evaluating prototypical behavioural components. Behavioural prototypes are identified using clustering algorithms and finally evaluated with respect to their ability to represent the whole behavioural data set. The prototypes allow for a meaningful segmentation of behavioural sequences. We applied our clustering approach to identify prototypical movements of the head of blowflies during cruising flight. The results confirm the previously established saccadic gaze strategy by the set of prototypes being divided into either predominantly translational or rotational movements, respectively. The prototypes reveal additional details about the saccadic and intersaccadic flight sections that could not be unravelled so far. Successful application of the proposed approach to behavioural data shows its ability to automatically identify prototypical behavioural components within a large and noisy database and to evaluate these with respect to their quality and stability. Hence, this approach might be applied to a broad range of behavioural and neural data obtained from different animals and in different contexts.
Quadratic Interpolation Algorithm for Minimizing Tabulated Function
Directory of Open Access Journals (Sweden)
E. A. Youness
2008-01-01
Full Text Available Problem statement: The problem of finding the minimum value of an objective function when only some of its values are known arises in many practical fields. Quadratic interpolation algorithms are well-known tools for dealing with this kind of problem. These algorithms are concerned with the polynomial space in which the objective function is approximated. Approach: In this study we approximated the objective function by a one-dimensional quadratic polynomial. This approach saved the time and effort needed to find the best point at which the objective is minimized. Results: The quadratic polynomial in each step of the proposed algorithm accelerates convergence to the best value of the objective function without taking into account all points of the interpolation set. Conclusion: Any n-dimensional problem of finding the minimal value of a function given by some of its values can be converted to a one-dimensional problem that is easier to deal with.
Parallel Genetic Algorithms with Dynamic Topology using Cluster Computing
Directory of Open Access Journals (Sweden)
ADAR, N.
2016-08-01
Full Text Available A parallel genetic algorithm (PGA) conducts a distributed meta-heuristic search by employing genetic algorithms on more than one subpopulation simultaneously. PGAs migrate a number of individuals between subpopulations over generations. The layout that facilitates the interactions of the subpopulations is called the topology. Static migration topologies have been widely incorporated into PGAs. In this article, a PGA with a dynamic migration topology (D-PGA) is proposed. D-PGA generates a new migration topology in every epoch based on the average fitness values of the subpopulations. The D-PGA has been tested against ring and fully connected migration topologies in a Beowulf Cluster. The D-PGA has outperformed the ring migration topology with comparable communication cost and has provided competitive or better results than a fully connected migration topology with significantly lower communication cost. PGA convergence behaviors have been analyzed in terms of the diversities within and between subpopulations. Conventional diversity can be considered as the diversity within a subpopulation. A new concept of permeability has been introduced to measure the diversity between subpopulations. It is shown that the success of the proposed D-PGA can be attributed to maintaining a high level of permeability while preserving diversity within subpopulations.
A Heuristic Clustering Algorithm for Mining Communities in Signed Networks
Institute of Scientific and Technical Information of China (English)
Bo Yang; Da-You Liu
2007-01-01
A signed network is an important kind of complex network, which includes both positive relations and negative relations. Communities of a signed network are defined as the groups of vertices within which positive relations are dense and between which negative relations are also dense. Being able to identify communities of signed networks is helpful for the analysis of such networks. Hitherto many algorithms for detecting network communities have been developed. However, most of them are designed exclusively for networks including only positive relations and are not suitable for signed networks. So the problem of mining communities of signed networks quickly and correctly has not been solved satisfactorily. In this paper, we propose a heuristic algorithm to address this issue. Compared with major existing methods, our approach has three distinct features. First, it is very fast, with a roughly linear running time with respect to network size. Second, it exhibits a good clustering capability and in particular can work well with complex networks without well-defined community structures. Finally, it is insensitive to its built-in parameters and requires no prior knowledge.
IMPROVING THE CLUSTER PERFORMANCE BY COMBINING PSO AND K-MEANS ALGORITHM
Directory of Open Access Journals (Sweden)
G. Komarasamy
2011-04-01
Full Text Available Clustering is a technique that divides data objects into groups based on information found in the data that describes the objects and their relationships. This paper describes how to improve clustering performance by combining Particle Swarm Optimization (PSO) and the K-means algorithm. The PSO algorithm converges successfully during the initial stages of a global search, but around the global optimum the search process becomes very slow. In contrast, the K-means algorithm can achieve faster convergence to the optimum solution. Unlike the K-means method, the new algorithm does not require the number of clusters to be specified before clustering, and it is able to find the locally optimal number of clusters during the clustering process. In each iteration, the inertia weight is changed based on the current iteration and the best fitness. The experimental results show the better performance of the new algorithm on different data sets.
A new-style clustering algorithm based on swarm intelligent theory
Institute of Scientific and Technical Information of China (English)
CHEN Zhuo; LIU Xiang-shuang
2007-01-01
Traditional clustering algorithms generally have some problems, such as sensitivity to initial parameters, difficulty in finding the optimal clustering result, and questionable clustering validity. In this paper, an FSM and a mathematical model of a new-style clustering algorithm based on swarm intelligence are provided. In this algorithm, the clustering main body moves in a three-dimensional space and has the abilities of memory, communication, analysis, judgment and coordinating information. Experimental results confirm that this algorithm has many merits, such as being insensitive to the order of the data and capable of dealing with exceptional, high-dimensional or complicated data. The algorithm can be used in the fields of Web mining, incremental clustering, economic analysis, pattern recognition, document classification and so on.
Function Optimization Based on Quantum Genetic Algorithm
Directory of Open Access Journals (Sweden)
Ying Sun
2014-01-01
Full Text Available Optimization methods are important in engineering design and application. The quantum genetic algorithm, which combines quantum algorithms with genetic algorithms, has the characteristics of good population diversity, rapid convergence and good global search capability. A novel quantum genetic algorithm is proposed, called the Variable-boundary-coded Quantum Genetic Algorithm (vbQGA), in which qubit chromosomes are collapsed into variable-boundary-coded chromosomes instead of binary-coded chromosomes. Therefore much shorter chromosome strings can be obtained. The method of encoding and decoding chromosomes is described first; then a new adaptive selection scheme for the angle parameters used for the rotation gate is put forward based on the core ideas and principles of quantum computation. Eight typical functions are optimized to evaluate the effectiveness and performance of vbQGA against the standard Genetic Algorithm (sGA) and the Genetic Quantum Algorithm (GQA). The simulation results show that vbQGA is significantly superior to sGA in all aspects and outperforms GQA in robustness and solving velocity, especially for multidimensional and complicated functions.
Directory of Open Access Journals (Sweden)
Noha Negm
2013-06-01
Full Text Available Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and reliability of text mining applications. Some recent algorithms address the problem of the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). It introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT) instead of the Apriori algorithm. The algorithm uses a new methodology for generating frequent termsets by building the multi-tire hash table while scanning the documents only once. To avoid hash collisions, the Multi-Tire technique is utilized in the proposed hashing algorithm. Based on the generated frequent termsets, the documents are partitioned, and clustering occurs by grouping the partitions through the descriptive keywords. By using the MTHFT algorithm, the scanning cost and computational cost are reduced; moreover, performance is considerably increased and the clustering process is sped up. The CWDHFT approach improved accuracy, scalability and efficiency when compared with existing clustering algorithms such as Bisecting K-means and FIHC.
HYBRID APPROACH FOR OPTIMAL CLUSTER HEAD SELECTION IN WSN USING LEACH AND MONKEY SEARCH ALGORITHMS
Directory of Open Access Journals (Sweden)
T. SHANKAR
2017-02-01
Full Text Available Wireless Sensor Networks (WSNs) are being widely used with low-cost, low-power, multifunction sensors based on the development of wireless communication, which has enabled a wide variety of new applications. In a WSN, the main concern is that each node contains a limited-power battery and is constrained in energy consumption; hence energy and lifetime are of paramount importance. To achieve high energy efficiency and prolong network lifetime in WSNs, clustering techniques have been widely adopted. The proposed algorithm is a hybridization of the well-known Low-Energy Adaptive Clustering Hierarchy (LEACH) algorithm with a distinctive Monkey Search (MS) algorithm, which is an optimization algorithm used for optimal cluster head selection. The proposed hybrid algorithm exhibits high throughput, high residual energy and improved lifetime. The proposed hybrid algorithm is compared with the well-known cluster-based protocols for WSNs, namely LEACH and the monkey search algorithm, individually.
Select and Cluster: A Method for Finding Functional Networks of Clustered Voxels in fMRI
DonGiovanni, Danilo
2016-01-01
Extracting functional connectivity patterns among cortical regions in fMRI datasets is a challenge stimulating the development of effective data-driven or model based techniques. Here, we present a novel data-driven method for the extraction of significantly connected functional ROIs directly from the preprocessed fMRI data without relying on a priori knowledge of the expected activations. This method finds spatially compact groups of voxels which show a homogeneous pattern of significant connectivity with other regions in the brain. The method, called Select and Cluster (S&C), consists of two steps: first, a dimensionality reduction step based on a blind multiresolution pairwise correlation by which the subset of all cortical voxels with significant mutual correlation is selected and the second step in which the selected voxels are grouped into spatially compact and functionally homogeneous ROIs by means of a Support Vector Clustering (SVC) algorithm. The S&C method is described in detail. Its performance assessed on simulated and experimental fMRI data is compared to other methods commonly used in functional connectivity analyses, such as Independent Component Analysis (ICA) or clustering. S&C method simplifies the extraction of functional networks in fMRI by identifying automatically spatially compact groups of voxels (ROIs) involved in whole brain scale activation networks. PMID:27656202
Ortiz, Juan F; Rokas, Antonis
2017-01-01
Closely spaced clusters of tandemly duplicated genes (CTDGs) contribute to the diversity of many phenotypes, including chemosensation, snake venom, and animal body plans. CTDGs have traditionally been identified subjectively as genomic neighborhoods containing several gene duplicates in close proximity; however, CTDGs are often highly variable with respect to gene number, intergenic distance, and synteny. This lack of formal definition hampers the study of CTDG evolutionary dynamics and the discovery of novel CTDGs in the exponentially growing body of genomic data. To address this gap, we developed a novel homology-based algorithm, CTDGFinder, which formalizes and automates the identification of CTDGs by examining the physical distribution of individual members of families of duplicated genes across chromosomes. Application of CTDGFinder accurately identified CTDGs for many well-known gene clusters (e.g., Hox and beta-globin gene clusters) in the human, mouse and 20 other mammalian genomes. Differences between previously annotated gene clusters and our inferred CTDGs were due to the exclusion of nonhomologs that have historically been considered parts of specific gene clusters, the inclusion or absence of genes between the CTDGs and their corresponding gene clusters, and the splitting of certain gene clusters into distinct CTDGs. Examination of human genes showing tissue-specific enhancement of their expression by CTDGFinder identified members of several well-known gene clusters (e.g., cytochrome P450s and olfactory receptors) and revealed that they were unequally distributed across tissues. By formalizing and automating CTDG identification, CTDGFinder will facilitate understanding of CTDG evolutionary dynamics, their functional implications, and how they are associated with phenotypic diversity. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e
A Novel Distributed Clustering Algorithm for Mobile Ad-hoc Networks
Directory of Open Access Journals (Sweden)
Sahar Adabi
2008-01-01
Full Text Available This paper proposes a new Distributed Score Based Clustering Algorithm (DSBCA) for Mobile Ad-hoc Networks (MANETs). In MANETs, selecting suitable nodes in clusters as cluster heads is very important. The proposed clustering algorithm considers the Battery Remaining, Number of Neighbors, Number of Members, and Stability in order to calculate each node's score with a linear algorithm. After each node calculates its score independently, the neighbors of the node must be notified about it. Each node then selects the neighbor with the highest score to be its cluster head; therefore, the selection of cluster heads is performed in a distributed manner with the most recent information about the current status of neighbor nodes. The proposed algorithm was compared with the Weighted Clustering Algorithm and the Distributed Weighted Clustering Algorithm in terms of number of clusters, number of re-affiliations, lifespan of nodes in the system, end-to-end throughput and overhead. The simulation results showed that the proposed algorithm achieved its goals.
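The score-and-select step described in this abstract can be illustrated with a minimal sketch. The abstract only states that the score is linear in the four listed quantities; the weights, the normalization of every input to [0, 1], the tie-breaking rule, and all names below (Node, WEIGHTS, score, choose_cluster_head) are assumptions made for illustration, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    battery: float    # remaining battery, assumed normalized to [0, 1]
    neighbors: float  # number of neighbors, assumed normalized to [0, 1]
    members: float    # number of members, assumed normalized to [0, 1]
    stability: float  # stability estimate, assumed normalized to [0, 1]


# Hypothetical weights: the abstract does not give the coefficients.
WEIGHTS = {"battery": 0.4, "neighbors": 0.2, "members": 0.1, "stability": 0.3}


def score(node):
    """Linear combination of the four metrics named in the abstract."""
    return (WEIGHTS["battery"] * node.battery
            + WEIGHTS["neighbors"] * node.neighbors
            + WEIGHTS["members"] * node.members
            + WEIGHTS["stability"] * node.stability)


def choose_cluster_head(neighbor_nodes):
    """Pick the neighbor with the highest score (ties: lowest id, an assumption)."""
    return max(neighbor_nodes, key=lambda n: (score(n), -n.node_id))


a = Node(node_id=1, battery=0.9, neighbors=0.5, members=0.2, stability=0.8)
b = Node(node_id=2, battery=0.3, neighbors=0.9, members=0.9, stability=0.4)
head = choose_cluster_head([a, b])
```

Because every node evaluates only the scores advertised by its own neighbors, the selection needs no central coordinator, which matches the distributed behavior the abstract describes.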
User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.
Gordon, Michael D.
1991-01-01
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
A Cluster Algorithm for the 2-D SU(3) × SU(3) Chiral Model
Ji, Da-ren; Zhang, Jian-bo
1996-07-01
To extend the cluster algorithm to SU(N) × SU(N) chiral models, a variant of Wolff's cluster algorithm is proposed and tested for the 2-dimensional SU(3) × SU(3) chiral model. The results show that the new method can reduce the critical slowing down in the SU(3) × SU(3) chiral model.
Clustering of protein domains for functional and evolutionary studies
Directory of Open Access Journals (Sweden)
Long Paul F
2009-10-01
Full Text Available Abstract Background The number of protein family members defined by DNA sequencing is usually much larger than those characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering. Results An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families. Conclusion The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
Lowest-ID with Adaptive ID Reassignment: A Novel Mobile Ad-Hoc Networks Clustering Algorithm
Gavalas, Damianos; Konstantopoulos, Charalampos; Mamalis, Basilis
2011-01-01
Clustering is a promising approach for building hierarchies and simplifying the routing process in mobile ad-hoc network environments. The main objective of clustering is to identify suitable node representatives, i.e. cluster heads (CHs), to store routing and topology information and maximize cluster stability. Traditional clustering algorithms suggest CH election exclusively based on node IDs or location information and involve frequent broadcasting of control packets, even when network topology remains unchanged. More recent works take into account additional metrics (such as energy and mobility) and optimize initial clustering. However, in many situations (e.g. in relatively static topologies) the re-clustering procedure is hardly ever invoked; hence initially elected CHs soon reach battery exhaustion. Herein, we introduce an efficient distributed clustering algorithm that uses both mobility and energy metrics to provide stable cluster formations. CHs are initially elected based on the time and cost-efficien...
Combinatorial Clustering Algorithm of Quantum-Behaved Particle Swarm Optimization and Cloud Model
Directory of Open Access Journals (Sweden)
Mi-Yuan Shan
2013-01-01
Full Text Available We propose a combinatorial clustering algorithm of cloud model and quantum-behaved particle swarm optimization (COCQPSO) to solve the stochastic problem. The algorithm employs a novel probability model as well as a permutation-based local search method. We set the parameters of COCQPSO based on the design of experiment. In a comprehensive computational study, we scrutinize the performance of COCQPSO on a set of widely used benchmark instances. Benchmarking the combinatorial clustering algorithm against state-of-the-art algorithms shows that its performance compares very favorably. The fuzzy combinatorial optimization algorithm of cloud model and quantum-behaved particle swarm optimization (FCOCQPSO) in vague sets (IVSs) is more expressive than the other fuzzy sets. Finally, numerical examples show the remarkable clustering effectiveness of the COCQPSO and FCOCQPSO clustering algorithms.
A Heuristic Clustering Algorithm for Intrusion Detection Based on Information Entropy
Institute of Scientific and Technical Information of China (English)
(no author listed)
2006-01-01
This paper studies the clustering problem for intrusion detection using the theory of information entropy. It shows that the clustering problem for exact intrusion detection based on information entropy is NP-complete; therefore, a heuristic algorithm to solve the clustering problem for intrusion detection is designed. The algorithm is incremental and can deal with databases containing large numbers of connection records from the internet.
A Cluster Maintenance Algorithm Based on Relative Mobility for Mobile Ad Hoc Network Management
Institute of Scientific and Technical Information of China (English)
SHEN Zhong; CHANG Yilin; ZHANG Xin
2005-01-01
The dynamic topology of mobile ad hoc networks makes network management significantly more challenging than in wireline networks. The traditional Client/Server (Manager/Agent) management paradigm cannot work well in such a dynamic environment, while the hierarchical network management architecture based on clustering is more feasible. Although the movement of nodes makes the cluster structure changeable and introduces new challenges for network management, mobility is a relative concept. A node with high relative mobility is more prone to unstable behavior than a node with less relative mobility; thus the relative mobility of a node can be used to predict future node behavior. This paper presents the cluster availability, which provides a quantitative measurement of cluster stability. Furthermore, a cluster maintenance algorithm based on cluster availability is proposed. The simulation results show that, compared to the Minimum ID clustering algorithm, our algorithm successfully alleviates the influence caused by node mobility and makes network management more efficient.
Andryani, Diyah Septi; Bustamam, Alhadi; Lestari, Dian
2017-03-01
Clustering aims to classify different patterns into groups called clusters. In this clustering method, we use n-mer frequencies to calculate the distance matrix, which is considered more accurate than using DNA alignment. The clustering results can be used to discover biologically important sub-sections and groups of genes. Many clustering methods have been developed, and hard clustering methods are considered less accurate than fuzzy clustering methods, especially for data containing outliers. Among fuzzy clustering methods, fuzzy c-means is one of the best known for its accuracy and simplicity. Fuzzy c-means clustering uses a membership function variable, which refers to how likely a data point is to be a member of a cluster, and works by minimizing an objective function. The parameter of the membership function used as a weighting factor is also called the fuzzifier. In this study we implement hybrid clustering using fuzzy c-means and a divisive algorithm, which can improve the accuracy of cluster membership compared to a traditional partitional approach alone. Fuzzy c-means is used in the first step to find the partition results. The divisive algorithm then runs in the second step to find sub-clusters and the dendrogram of the phylogenetic tree. The best number of clusters is determined using the minimum value of the Davies-Bouldin Index (DBI) of the cluster results. The results show that the method introduced in this paper is better than other partitioning methods. We found 3 clusters with a DBI value of 1.126628 at the first step of clustering. Moreover, the DBI values after the second step of clustering are always smaller than those obtained using the first step only. This indicates that the hybrid approach in this study produces better cluster results in terms of DBI values.
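Since this abstract selects the number of clusters by the minimum Davies-Bouldin Index, a small sketch of that index may help. It uses the common textbook definition (mean distance to the centroid as cluster scatter, Euclidean distance between centroids); the abstract does not say which variant the authors used, and the function names and toy data are illustrative only.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(clusters):
    """Davies-Bouldin index of a list of clusters; lower means better separation.

    Assumes at least two clusters with distinct centroids.
    """
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance of a cluster's points to its own centroid.
    scatter = [sum(math.dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


# Toy data: compact, well-separated clusters versus elongated, closer ones.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 4)], [(3, 0), (3, 4)]]
dbi_tight = davies_bouldin(tight)
dbi_loose = davies_bouldin(loose)
```

Comparing candidate partitions and keeping the one with the smallest index is the selection rule the abstract refers to.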
Parallelization of the Wolff single-cluster algorithm
Kaupužs, J.; Rimšāns, J.; Melnik, R. V. N.
2010-02-01
A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models and its effectiveness depends on the specific application at hand. The applicability of the developed methodology is discussed in the context of the applications, where a sophisticated shuffling scheme is used to generate pseudorandom numbers of high quality, and an iterative method is applied to find the critical temperature of the 3D Ising model with great accuracy. For the lattice with linear size L=1024, we have reached a speedup of about 1.79 times on two processors and about 2.67 times on four processors, as compared to the serial code. According to our estimation, a speedup of about three times on four processors is reachable for the O(n) models with n≥2. Furthermore, the application of the developed OpenMP code allows us to simulate larger lattices due to the greater operative (shared) memory available.
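The reported speedups can be put in context with a short calculation. The sketch below computes parallel efficiency (speedup divided by processor count) and, as an assumption not made in the abstract, the parallel fraction that Amdahl's law would imply for each figure. The function names are illustrative.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def amdahl_parallel_fraction(speedup, processors):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) to solve for p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


# Speedups quoted in the abstract for L = 1024.
reported = {2: 1.79, 4: 2.67}
for n, s in reported.items():
    print(n, round(parallel_efficiency(s, n), 4),
          round(amdahl_parallel_fraction(s, n), 4))
```

Amdahl's law is only a rough model here; the abstract itself notes that effectiveness depends on the specific application.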
Using Clustering Algorithms to Identify Brown Dwarf Characteristics
Choban, Caleb
2016-06-01
Brown dwarfs are stars that are not massive enough to sustain core hydrogen fusion, and thus fade and cool over time. The molecular composition of brown dwarf atmospheres can be determined by observing absorption features in their infrared spectrum, which can be quantified using spectral indices. Comparing these indices to one another, we can determine what kind of brown dwarf it is, and if it is young or metal-poor. We explored a new method for identifying these subgroups through the expectation-maximization machine learning clustering algorithm, which provides a quantitative and statistical way of identifying index pairs which separate rare populations. We specifically quantified two statistics, completeness and concentration, to identify the best index pairs. Starting with a training set, we defined selection regions for young, metal-poor and binary brown dwarfs, and tested these on a large sample of L dwarfs. We present the results of this analysis, and demonstrate that new objects in these classes can be found through these methods.
Comparison and evaluation of network clustering algorithms applied to genetic interaction networks.
Hou, Lin; Wang, Lin; Berg, Arthur; Qian, Minping; Zhu, Yunping; Li, Fangting; Deng, Minghua
2012-01-01
The goal of network clustering algorithms is to detect dense clusters in a network, providing a first step towards the understanding of large-scale biological networks. With numerous recent advances in biotechnologies, large-scale genetic interactions are widely available, but there is a limited understanding of which clustering algorithms may be most effective. In order to address this problem, we conducted a systematic study to compare and evaluate six clustering algorithms in analyzing genetic interaction networks, and investigated influencing factors in choosing algorithms. The algorithms considered in this comparison include hierarchical clustering, topological overlap matrix, bi-clustering, Markov clustering, Bayesian discriminant analysis based community detection, and variational Bayes approach to modularity. Both experimentally identified and synthetically constructed networks were used in this comparison. The accuracy of the algorithms is measured by the Jaccard index in comparing predicted gene modules with benchmark gene sets. The results suggest that the choice differs according to the network topology and evaluation criteria. Hierarchical clustering proved best at predicting protein complexes; Bayesian discriminant analysis based community detection proved best under epistatic miniarray profile (EMAP) datasets; the variational Bayes approach to modularity was noticeably better than the other algorithms in the genome-scale networks.
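The Jaccard index used for scoring here is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows; the best-match aggregation and the gene names are assumptions for illustration, since the abstract does not describe how modules were paired with benchmark sets.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here for two empty sets
    return len(a & b) / len(a | b)


def best_match_scores(predicted_modules, benchmark_sets):
    """For each predicted module, its highest Jaccard index against any benchmark set."""
    return [max(jaccard(m, b) for b in benchmark_sets) for m in predicted_modules]


# Hypothetical gene identifiers.
predicted = [{"g1", "g2", "g3"}, {"g7", "g8"}]
benchmark = [{"g2", "g3", "g4"}, {"g7", "g8"}]
scores = best_match_scores(predicted, benchmark)
```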
PERFORMANCE OF K-MEANS CLUSTERING AND BIRD FLOCKING ALGORITHM FOR GROUPING THE WEB LOG FILES
Directory of Open Access Journals (Sweden)
R. SUGUNA
2012-10-01
Full Text Available Data mining is the process of analyzing interesting patterns and knowledge from different perspectives and summarizing them into useful information from large amounts of data. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Vast amounts of unlabeled data can be grouped using clustering or classification algorithms. Cluster analysis or clustering is the task of assigning a set of objects into groups called clusters, so that the objects in the same cluster are more similar to each other than to those in other clusters. Many researchers have evaluated the performance of the familiar K-means clustering algorithm and attempted to improve its efficiency. This paper analyzes the performance of the K-means clustering algorithm against a biologically based algorithm called the Bird flocking algorithm for grouping web logs. Web logs are unformatted text files which contain information regarding the user's browser details. The proposed system takes web log files as input and groups the web sites based on the interest level of the users. The performance is evaluated in terms of number of clusters, CPU utilization time and accuracy.
CLUSTAG & WCLUSTAG: Hierarchical Clustering Algorithms for Efficient Tag-SNP Selection
Ao, Sio-Iong
More than 6 million single nucleotide polymorphisms (SNPs) in the human genome have been genotyped by the HapMap project. Although only a proportion of these SNPs are functional, all can be considered as candidate markers for indirect association studies to detect disease-related genetic variants. The complete screening of a gene or a chromosomal region is nevertheless an expensive undertaking for association studies. A key strategy for improving the efficiency of association studies is to select a subset of informative SNPs, called tag SNPs, for analysis. In this chapter, hierarchical clustering algorithms are proposed for efficient tag SNP selection.
Improved FIFO Scheduling Algorithm Based on Fuzzy Clustering in Cloud Computing
Directory of Open Access Journals (Sweden)
Jian Li
2017-02-01
Full Text Available In cloud computing, some large tasks may occupy too many resources and some small tasks may wait for a long time under the First-In-First-Out (FIFO) scheduling algorithm. To reduce tasks' waiting time, we propose a task scheduling algorithm based on fuzzy clustering algorithms. We construct a task model and a resource model, analyze tasks' preferences, and then classify resources with fuzzy clustering algorithms. Based on the parameters of cloud tasks, the algorithm calculates resource expectation and assigns tasks to different resource clusters, so the complexity of resource selection is decreased. As a result, the algorithm reduces tasks' waiting time and improves resource utilization. The experimental results show that the proposed algorithm shortens the execution time of tasks and increases resource utilization.
Bitter, Ingmar; Brown, John E.; Brickman, Daniel; Summers, Ronald M.
2004-04-01
The presented method significantly reduces the time necessary to validate a computed tomographic colonography (CTC) computer aided detection (CAD) algorithm of colonic polyps applied to a large patient database. As the algorithm is being developed on Windows PCs and our target, a Beowulf cluster, is running on Linux PCs, we made the application dual-platform compatible using a single source code tree. To maintain, share, and deploy source code, we used CVS (concurrent versions system) software. We built the libraries from their sources for each operating system. Next, we made the CTC CAD algorithm dual-platform compatible and validated that both Windows and Linux produced the same results. Eliminating system dependencies was mostly achieved using the Qt programming library, which encapsulates most of the system-dependent functionality in order to present the same interface on either platform. Finally, we wrote scripts to execute the CTC CAD algorithm in parallel. Running hundreds of simultaneous copies of the CTC CAD algorithm on a Beowulf cluster computing network enables execution in less than four hours on our entire collection of over 2400 CT scans, as compared to a month on a single PC. As a consequence, our complete patient database can be processed daily, boosting research productivity. Large-scale validation of a computer aided polyp detection algorithm for CT colonography using cluster computing significantly improves the round trip time of algorithm improvement and revalidation.
Karjee, Jyotirmoy
2011-01-01
Objective: The main objective of this paper is to construct a distributed clustering algorithm based upon spatial data correlation among sensor nodes and to evaluate data accuracy for each distributed cluster at its respective cluster head node. Design Procedure/Approach: We find that, due to the deployment of a high density of sensor nodes in the sensor field, spatial data are highly correlated among sensor nodes in the spatial domain. Based on the high data correlation among sensor nodes, we propose a non-overlapping irregular distributed clustering algorithm with different cluster sizes to collect the most accurate or precise data at the cluster head node for each respective distributed cluster. To collect the most accurate data at the cluster head node for each distributed cluster in the sensor field, we propose a Data accuracy model and compare the results with the Information accuracy model. Finding: Simulation results show that our proposed Data accuracy model collects more accurate data and gives better performance than Informati...
Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm
Mitra, Sunanda; Pemmaraju, Surya
1992-01-01
Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded, is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of the Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and familiarity with the behavior of the Tethered Satellite System.
Institute of Scientific and Technical Information of China (English)
CHEN Yunkai; LU Zhengding; LI Ruixuan; LI Yuhua; SUN Xiaolin
2006-01-01
Considering the constant growth of data in large databases such as wire transfer databases, incremental clustering algorithms play an increasingly important role in Data Mining (DM). However, few of the traditional clustering algorithms can both handle categorical data and explain their output clearly. Based on the idea of dynamic clustering, an incremental conceptual clustering algorithm is proposed in this paper, which introduces the Semantic Core Tree (SCT) to deal with large volumes of categorical wire transfer data for detecting money laundering. In addition, a rule generation algorithm is presented to express the clustering result in the form of knowledge. When this idea is applied in financial data mining, the efficiency of searching for the characteristics of money laundering data is improved.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.
Directory of Open Access Journals (Sweden)
Liping Sun
2014-01-01
Full Text Available An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.
Implementation of Clustering Algorithms for real datasets in Medical Diagnostics using MATLAB
Directory of Open Access Journals (Sweden)
B. Venkataramana
2017-03-01
Full Text Available In the medical field, diagnosing a disease requires samples, which are analyzed by a doctor or a pharmacist. As the number of patients increases, the number of samples also increases, and more time is required to analyze the samples to decide the stage of the disease. Analyzing each sample requires a skilled person. The samples can instead be classified by applying clustering algorithms to them. Data clustering is considered the most important raw data analysis method used in data mining technology. Most clustering techniques have proved their efficiency in many applications such as decision making systems, medical sciences, earth sciences etc. Partition-based clustering is one of the main approaches in clustering. There are various data clustering algorithms, and every algorithm has its own advantages and disadvantages. This work reports the classification performance of three such widely used algorithms, namely the K-means (KM), Fuzzy c-means (FCM) and Fuzzy Possibilistic c-Means (FPCM) clustering algorithms. To analyze these algorithms, three known data sets from the UCI machine learning repository are used: thyroid, liver and wine. The efficiency of the clustering output is compared in terms of classification performance, i.e., percentage of correctness. The experimental results show that K-means and FCM give the same performance for the liver data, and FCM and FPCM give the same performance for the thyroid and wine data. FPCM has more efficient classification performance on all the given data sets.
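What separates fuzzy c-means from K-means in this comparison is the soft membership each point receives. The sketch below shows the standard FCM membership formula, u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)), for one point given fixed centers. The fuzzifier default m = 2 and the handling of a point that coincides with a center are assumptions; the abstract does not report the settings used, and the MATLAB implementation it mentions is not reproduced here.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (values sum to 1)."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # Point sits exactly on one or more centers: share membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# A point at distance 1 from the first center and distance 2 from the second.
u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
```

With m = 2 the point above receives membership 0.8 in the nearer cluster and 0.2 in the farther one, whereas K-means would assign it wholly to the nearer cluster.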
Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm
Frisca; Bustamam, Alhadi; Siswantining, Titin
2017-03-01
Clustering is a data analysis method that aims to group data with similar characteristics together. Spectral clustering is one of the most popular modern clustering algorithms. As an effective clustering technique, the spectral clustering method emerged from the concepts of spectral graph theory. The spectral clustering method needs a partitioning algorithm. Partitioning methods include PAM, SOM, fuzzy c-means, and k-means. According to research done by Capital and Choudhury in 2013, the k-means algorithm provides better accuracy than the PAM algorithm when using Euclidean distance. So in this paper we use k-means as our partitioning algorithm. The major advantage of spectral clustering is in reducing data dimension, in this case reducing the dimension of a large microarray dataset. A microarray is a small chip made of a glass plate containing thousands or even tens of thousands of kinds of genes in DNA fragments derived from doubling cDNA. Microarray data are widely used to detect cancer, for example carcinoma, in which cancer cells express abnormalities in their genes. The purpose of this research is to place data with high similarity in the same group and data with low similarity in different groups. In this research, the carcinoma microarray data comprise 7457 genes. The result of partitioning using the k-means algorithm is two clusters.
Genetic algorithm based two-mode clustering of metabolomics data
Hageman, J.A.; Berg, R.A. van den; Westerhuis, J.A.; Werf, M.J. van der; Smilde, A.K.
2008-01-01
Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods and more specifically two-mode clustering methods are excellent tools for analyzing this type of data. Two-mode clustering metho
Decomposition of Data Mining Algorithms into Unified Functional Blocks
Directory of Open Access Journals (Sweden)
Ivan Kholod
2016-01-01
Full Text Available The present paper describes the method of creating data mining algorithms from unified functional blocks. This method splits algorithms into independently functioning blocks. These blocks must have unified interfaces and implement pure functions. The method allows us to create new data mining algorithms from existing blocks and improves the existing algorithms by optimizing single blocks or the whole structure of the algorithms. This becomes possible due to a number of important properties inherent in pure functions and hence functional blocks.
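The idea of assembling an algorithm from pure functional blocks with a unified interface can be sketched as follows. Every block here takes an immutable state tuple and returns a new one, so blocks can be chained, repeated, or swapped independently. The block names, the (data, centers, labels) state layout, and the one-dimensional k-means example are illustrative assumptions, not the decomposition defined in the paper.

```python
from functools import reduce


def compose(*blocks):
    """Chain blocks left to right; each block maps a state tuple to a new state tuple."""
    def run(state):
        return reduce(lambda s, block: block(s), blocks, state)
    return run


def repeat(block, times):
    """Apply the same block a fixed number of times."""
    return compose(*([block] * times))


def assign(state):
    """Block 1: label each value with the index of its nearest center."""
    data, centers, _ = state
    labels = tuple(min(range(len(centers)), key=lambda k: abs(x - centers[k]))
                   for x in data)
    return (data, centers, labels)


def update(state):
    """Block 2: move each center to the mean of its members (unchanged if empty)."""
    data, centers, labels = state
    new_centers = []
    for k, old in enumerate(centers):
        members = [x for x, lab in zip(data, labels) if lab == k]
        new_centers.append(sum(members) / len(members) if members else old)
    return (data, tuple(new_centers), labels)


# A 1-D k-means built purely by composing the two blocks.
kmeans_1d = repeat(compose(assign, update), 5)
initial = ((1.0, 2.0, 9.0, 10.0), (0.0, 5.0), ())
result = kmeans_1d(initial)
```

Because no block mutates its input, replacing `assign` with a different distance rule or optimizing `update` in isolation leaves the rest of the pipeline untouched, which is the property the abstract attributes to pure functions.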
A highly efficient multi-core algorithm for clustering extremely large datasets
Directory of Open Access Journals (Sweden)
Kraus Johann M
2010-04-01
Full Text Available Abstract Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer.
Directory of Open Access Journals (Sweden)
Simon Fong
2014-01-01
Full Text Available Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario.
A New Cooperative Algorithm Based on PSO and K-Means for Data Clustering
Directory of Open Access Journals (Sweden)
Mehdi Sargolzaei
2012-01-01
Full Text Available Problem statement: Data clustering has been applied in multiple fields such as machine learning, data mining, wireless sensor networks and pattern recognition. One of the most famous clustering approaches is K-means, which has been used effectively in many clustering problems, but this algorithm has some drawbacks such as convergence to local optima and sensitivity to initial points. Approach: The Particle Swarm Optimization (PSO) algorithm is one of the swarm intelligence algorithms and is applied to determine the optimal cluster centers. In this study, a cooperative algorithm based on PSO and k-means is presented. Result: The proposed algorithm utilizes both the global search ability of PSO and the local search ability of k-means. The proposed algorithm, together with PSO, PSO with Contraction Factor (CF-PSO), k-means and the KPSO hybrid algorithm, has been used for clustering six datasets, and their efficiencies are compared with each other. Conclusion: Experimental results show that the proposed algorithm has an acceptable efficiency and robustness.
Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario.
Institute of Scientific and Technical Information of China (English)
CHU Shuchuan; John F. Roddick
2003-01-01
In this paper, a cluster generation algorithm for vector quantization using a tabu search approach with simulated annealing is proposed. The main idea of this algorithm is to use the tabu search approach to generate non-local moves for the clusters and apply the simulated annealing technique to select the current best solution, thus improving the cluster generation and reducing the mean squared error. Preliminary experimental results demonstrate that the proposed approach is superior to the tabu search approach with the Generalised Lloyd algorithm.
Scaling up the DBSCAN Algorithm for Clustering Large Spatial Databases Based on Sampling Technique
Institute of Scientific and Technical Information of China (English)
(no author listed)
2001-01-01
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
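The "sampling outside DBSCAN" idea in the abstract above can be illustrated with a small sketch. It is not the authors' SDBSCAN: the abstract gives no details, so the uniform random sample, the nearest-sample labelling rule, and the names `dbscan` and `sampled_dbscan` are all assumptions made for illustration.

```python
import math
import random

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1               # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:       # j is a core point: keep expanding
                queue.extend(nb)
    return labels

def sampled_dbscan(points, eps, min_pts, sample_size, seed=0):
    """Cluster a random sample, then label every point by its nearest sampled point.

    Assumed rule (not from the abstract): a point takes the label of the closest
    sampled point if that point lies within eps, and is treated as noise otherwise.
    """
    rng = random.Random(seed)
    chosen = rng.sample(range(len(points)), sample_size)
    sample = [points[i] for i in chosen]
    sample_labels = dbscan(sample, eps, min_pts)
    labels = []
    for p in points:
        d, lab = min((math.dist(p, s), l) for s, l in zip(sample, sample_labels))
        labels.append(lab if d <= eps else -1)
    return labels
```

Only the sample pays the quadratic neighbourhood cost of DBSCAN here; every other point needs a single pass over the sample, which is where the saving for large databases would come from under these assumptions.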
A Clustering-Based Automatic Transfer Function Design for Volume Visualization
Directory of Open Access Journals (Sweden)
Tianjin Zhang
2016-01-01
Full Text Available The two-dimensional transfer functions (TFs) designed based on the intensity-gradient magnitude (IGM) histogram are effective tools for the visualization and exploration of 3D volume data. However, traditional design methods usually depend on repeated trial and error. We propose a novel method for the automatic generation of transfer functions by performing the affinity propagation (AP) clustering algorithm on the IGM histogram. Compared with previous clustering algorithms that were employed in volume visualization, the AP clustering algorithm has much faster convergence speed and can achieve more accurate clustering results. In order to obtain meaningful clustering results, we introduce two similarity measurements: IGM similarity and spatial similarity. These two similarity measurements can effectively bring the voxels of the same tissue together and differentiate the voxels of different tissues so that the generated TFs can assign different optical properties to different tissues. Before performing the clustering algorithm on the IGM histogram, we propose to remove noisy voxels based on the spatial information of voxels. Our method does not require users to input the number of clusters, and the classification and visualization process is automatic and efficient. Experiments on various datasets demonstrate the effectiveness of the proposed method.
Hierarchical trie packet classification algorithm based on expectation-maximization clustering
Bi, Xia-an; Zhao, Junxia
2017-01-01
With the development of computer network bandwidth, packet classification algorithms which are able to deal with large-scale rule sets are in urgent need. Among the existing algorithms, researches on packet classification algorithms based on hierarchical trie have become an important packet classification research branch because of their widely practical use. Although hierarchical trie is beneficial to save large storage space, it has several shortcomings such as the existence of backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, this paper uses the formalization method to deal with the packet classification problem by means of mapping the rules and data packets into a two-dimensional space. Secondly, this paper uses expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, and thereby diversified clusters are formed. Thirdly, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper respectively conducts simulation experiments and real-environment experiments to compare the performances of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm. PMID:28704476
Clustering dynamic textures with the hierarchical EM algorithm for modeling video.
Mumtaz, Adeel; Coviello, Emanuele; Lanckriet, Gert R G; Chan, Antoni B
2013-07-01
Dynamic texture (DT) is a probabilistic generative model, defined over space and time, that represents a video as the output of a linear dynamical system (LDS). The DT model has been applied to a wide variety of computer vision problems, such as motion segmentation, motion classification, and video registration. In this paper, we derive a new algorithm for clustering DT models that is based on the hierarchical EM algorithm. The proposed clustering algorithm is capable of both clustering DTs and learning novel DT cluster centers that are representative of the cluster members in a manner that is consistent with the underlying generative probabilistic model of the DT. We also derive an efficient recursive algorithm for sensitivity analysis of the discrete-time Kalman smoothing filter, which is used as the basis for computing expectations in the E-step of the HEM algorithm. Finally, we demonstrate the efficacy of the clustering algorithm on several applications in motion analysis, including hierarchical motion clustering, semantic motion annotation, and learning bag-of-systems (BoS) codebooks for dynamic texture recognition.
Anandakrishnan, Ramu; Onufriev, Alexey
2008-03-01
In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We show that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application, the computation of the average charge of ionizable amino acids in proteins, is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
Novel density-based and hierarchical density-based clustering algorithms for uncertain data.
Zhang, Xianchao; Liu, Han; Zhang, Xiaotong
2017-09-01
Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing
An Improved K-Means Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
尹成祥; 张宏军; 张睿; 綦秀利; 王彬
2014-01-01
To address the excessive number of iterations caused by the random selection of initial centroids in the typical K-Means algorithm, a method is proposed that optimizes the initial centroids by cutting the data set into k segments according to distance and selecting one center in each segment as an initial centroid for the iterative computation. To find the optimal number of clusters k, a new validity function called the clustering index is defined as the sum of clustering density and clustering significance, and optimizing this index is used to search for the best k in the interval [1, n]. Simulation experiments on the IRIS data set show that the proposed algorithm converges in markedly fewer iterations and that the value of k found is close to the actual value, which verifies the validity of the algorithm.
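The segment-based initialization described in this abstract might look roughly like the sketch below. The abstract does not say which distance orders the points or how the center of a segment is chosen, so two assumptions are made here: points are ordered by distance from the coordinate-wise minimum corner of the data, and each segment contributes its mean. The name `segment_init` is hypothetical.

```python
import math

def segment_init(points, k):
    """Pick k initial centroids by cutting distance-ordered points into k segments.

    Assumptions (not stated in the abstract): points are ordered by Euclidean
    distance from the coordinate-wise minimum corner, and each segment is
    represented by its mean. Requires len(points) >= k.
    """
    dim = len(points[0])
    corner = [min(p[d] for p in points) for d in range(dim)]
    ordered = sorted(points, key=lambda p: math.dist(p, corner))
    n = len(ordered)
    centroids = []
    for i in range(k):
        # Integer arithmetic keeps the k segments contiguous and non-empty.
        segment = ordered[i * n // k:(i + 1) * n // k]
        centroids.append([sum(p[d] for p in segment) / len(segment) for d in range(dim)])
    return centroids

# Two obvious 1-D groups yield one starting centroid per group.
print(segment_init([[0], [1], [2], [10], [11], [12]], 2))  # [[1.0], [11.0]]
```

Because the starting centroids are already spread across the data, the subsequent K-Means iterations have less distance to travel, which is consistent with the reduced iteration count the abstract reports.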
Constructing a graph of connections in clustering algorithm of complex objects
Directory of Open Access Journals (Sweden)
Татьяна Шатовская
2015-05-01
Full Text Available The article describes the results of modifying the Chameleon algorithm. This hierarchical multi-level algorithm consists of several phases: graph construction, coarsening, partitioning and recovery. Various approaches and algorithms can be used in each phase. The main aim of the work is to study the clustering quality on different data sets using combinations of algorithms at the different stages, and to improve the construction stage by optimizing the choice of k when building the k-nearest-neighbor graph.
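The graph-construction phase mentioned above builds a k-nearest-neighbor graph over the objects. A minimal brute-force sketch follows; it assumes Euclidean distance and the hypothetical name `knn_graph`, and it does not reproduce the article's procedure for choosing k.

```python
import math

def knn_graph(points, k):
    """Return {i: [(j, distance), ...]} linking each point to its k nearest neighbours."""
    graph = {}
    for i, p in enumerate(points):
        # Sort all other points by distance to p and keep the closest k.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [(j, d) for d, j in dists[:k]]
    return graph

graph = knn_graph([(0, 0), (1, 0), (5, 0), (6, 0)], k=1)
print(graph[0], graph[2])  # [(1, 1.0)] [(3, 1.0)]
```

A small k yields a sparse graph that can split genuine clusters, while a large k blurs their boundaries, which is why the choice of k is worth optimizing.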
A scalable and practical one-pass clustering algorithm for recommender system
Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali
2015-12-01
K-Means clustering-based recommendation algorithms have been proposed with the claim that they increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate incremental updates as new data arrive, making them unsuitable for dynamic environments. Following this line of research, a new clustering algorithm called One-Pass is proposed, which is simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.
Two Parallel Swendsen-Wang Cluster Algorithms Using Message-Passing Paradigm
Lin, Shizeng
2008-01-01
In this article, we present two different parallel Swendsen-Wang Cluster (SWC) algorithms using the message-passing interface (MPI). One is based on the Master-Slave Parallel Model (MSPM) and the other is based on the Data-Parallel Model (DPM). A speedup of 24 with 40 processors and 16 with 37 processors is achieved with the DPM and MSPM respectively. The speedup of both algorithms at different temperatures and system sizes is carefully examined both experimentally and theoretically, and a comparison of their efficiency is made. In the last section, based on these two parallel SWC algorithms, two parallel probability changing cluster (PCC) algorithms are proposed.
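The speedup figures quoted above can be turned into parallel efficiency, using the standard definition of efficiency as speedup divided by processor count. The helper name `parallel_efficiency` is only illustrative.

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency: the fraction of ideal linear speedup actually achieved."""
    return speedup / processors

# Figures quoted in the abstract: DPM reaches 24x on 40 processors,
# MSPM reaches 16x on 37 processors.
dpm = parallel_efficiency(24, 40)
mspm = parallel_efficiency(16, 37)
print(f"DPM: {dpm:.2f}, MSPM: {mspm:.2f}")  # DPM: 0.60, MSPM: 0.43
```

On these numbers the data-parallel model uses its processors noticeably more efficiently than the master-slave model.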
Infrared Luminosity Function of the Coma Cluster
Bai, L; Rieke, M J; Hinz, J L; Kelly, D M; Blaylock, M; Bai, Lei; Rieke, George H.; Rieke, Marcia J.; Hinz, Joannah L.; Kelly, Douglas M.; Blaylock, Myra
2006-01-01
Using mid-IR and optical data, we deduce the total infrared (IR) luminosities of galaxies in the Coma cluster and present their infrared luminosity function (LF). The shape of the overall Coma IR LF does not show significant differences from the IR LFs of the general field, which indicates the general independence of global galaxy star formation on environment up to densities $\sim 40$ times greater than in the field (we cannot test such independence above $L_{ir} \approx 10^{44}\ {\rm ergs\ s}^{-1}$). However, a shallower faint end slope and a smaller $L_{ir}^{*}$ are found in the core region (where the densities are still higher) compared to the outskirt region of the cluster, and most of the brightest IR galaxies are found outside of the core region. The IR LF in the NGC 4839 group region does not show any unique characteristics. By integrating the IR LF, we find a total star formation rate in the cluster of about 97.0 $M_{\odot}\ {\rm yr}^{-1}$. We also studied the contributions of early- and late-type galaxie...
Directory of Open Access Journals (Sweden)
Bohui Zhu
2013-01-01
Full Text Available This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, the raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and the waveform of the ECG signal is detected; then, features are extracted from the ECG signal to cluster different types of arrhythmias by the IEMMC algorithm. Three performance evaluation indicators, namely sensitivity, specificity, and accuracy, are used to assess the effect of the IEMMC method for ECG arrhythmias. Compared with the K-means and iterSVR algorithms, the IEMMC algorithm shows better performance not only in clustering results but also in global search ability and convergence ability, which demonstrates its effectiveness for the detection of ECG arrhythmias.
A fast SVM training algorithm based on the set segmentation and k-means clustering
Institute of Scientific and Technical Information of China (English)
YANG Xiaowei; LIN Daying; HAO Zhifeng; LIANG Yanchun; LIU Guirong; HAN Xu
2003-01-01
At present, training algorithms for support vector machines (SVM) are an important topic in the field of machine learning. It is a challenging task to improve the efficiency of the algorithm without reducing the generalization performance of the SVM. To face this challenge, a new SVM training algorithm based on set segmentation and k-means clustering is presented in this paper. The new idea is to divide the original training data into many subsets, cluster each subset using k-means clustering, and finally train the SVM on the new data set formed by the cluster centroids. Considering that decomposition algorithms such as SVMlight are one of the major methods for solving support vector machines, SVMlight is used in our experiments. Simulations on different types of problems show that the proposed method can efficiently solve not only large linear classification problems but also large nonlinear ones.
A Load Balancing Algorithm Based on Maximum Entropy Methods in Homogeneous Clusters
Directory of Open Access Journals (Sweden)
Long Chen
2014-10-01
Full Text Available In order to solve the problems of ill-balanced task allocation, long response time, low throughput rate and poor performance when the cluster system is assigning tasks, we introduce the concept of entropy in thermodynamics into load balancing algorithms. This paper proposes a new load balancing algorithm for homogeneous clusters based on the Maximum Entropy Method (MEM). By calculating the entropy of the system and using the maximum entropy principle to ensure that each scheduling and migration step follows the increasing tendency of the entropy, the system can reach a load-balanced state as soon as possible, shorten the task execution time and enable high performance. The results of simulation experiments show that, compared with traditional algorithms, this algorithm reaches load balance in the homogeneous cluster system faster and to a greater extent. It also offers a new line of thought for solving the load balancing problem of homogeneous cluster systems.
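The principle of steering each scheduling decision toward higher entropy can be sketched as follows. This is only an illustration under stated assumptions: the paper's actual entropy definition is not given in the abstract, so Shannon entropy over each node's share of the total load is used here, and `load_entropy` and `assign_task` are hypothetical names.

```python
import math

def load_entropy(loads):
    """Shannon entropy of the load shares; it is largest when all nodes are equally loaded."""
    total = sum(loads)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in loads if l > 0)

def assign_task(loads, cost):
    """Return the index of the node whose receiving the task gives the highest entropy."""
    best_node, best_entropy = None, -1.0
    for i in range(len(loads)):
        trial = loads[:i] + [loads[i] + cost] + loads[i + 1:]
        h = load_entropy(trial)
        if h > best_entropy:
            best_node, best_entropy = i, h
    return best_node

# With loads 5, 1 and 3, a task of cost 2 goes to the lightly loaded middle node.
print(assign_task([5, 1, 3], 2))  # 1
```

Since the entropy of the load shares peaks at a uniform distribution, greedily maximizing it pushes the cluster toward an even load, which matches the behavior the abstract describes.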
Solving the Capacitated Vehicle Routing Problem Based on Improved Ant-clustering Algorithm
Directory of Open Access Journals (Sweden)
Zhang Jiashan
2015-01-01
Full Text Available Capacitated vehicle routing problems (CVRP) are NP-hard. Most approaches can solve only small-scale case studies to optimality, and they are time-consuming. To overcome this limitation, this paper presents a novel three-phase heuristic approach for the capacitated vehicle routing problem. The first phase aims to identify sets of cost-effective feasible clusters through an improved ant-clustering algorithm, in which an adaptive strategy is adopted. The second phase assigns clusters to vehicles and sequences them on each tour. The third phase uses a genetic algorithm to order the nodes within clusters for every tour. The simulation indicates that the algorithm attains high-quality results in a short time.
Kernel Clustering with a Differential Harmony Search Algorithm for Scheme Classification
Directory of Open Access Journals (Sweden)
Yu Feng
2017-01-01
Full Text Available This paper presents a kernel fuzzy clustering with a novel differential harmony search algorithm to coordinate with the diversion scheduling scheme classification. First, we employed a self-adaptive solution generation strategy and differential evolution-based population update strategy to improve the classical harmony search. Second, we applied the differential harmony search algorithm to the kernel fuzzy clustering to help the clustering method obtain better solutions. Finally, the combination of the kernel fuzzy clustering and the differential harmony search is applied for water diversion scheduling in East Lake. A comparison of the proposed method with other methods has been carried out. The results show that the kernel clustering with the differential harmony search algorithm has good performance to cooperate with the water diversion scheduling problems.
Improved algorithm for calculating the Chandrasekhar function
Jablonski, A.
2013-02-01
Theoretical models of electron transport in condensed matter require an effective source of the Chandrasekhar H(x,omega) function. A code providing the H(x,omega) function has to be both accurate and very fast. The current revision of the code published earlier [A. Jablonski, Comput. Phys. Commun. 183 (2012) 1773] decreased the running time, averaged over different pairs of arguments x and omega, by a factor of more than 20. The decrease of the running time in the range of small values of the argument x, less than 0.05, is even more pronounced, reaching a factor of 30. The accuracy of the current code is not affected, and is typically better than 12 decimal places. New version program summary. Program title: CHANDRAS_v2. Catalogue identifier: AEMC_v2_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEMC_v2_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 976. No. of bytes in distributed program, including test data, etc.: 11416. Distribution format: tar.gz. Programming language: Fortran 90. Computer: Any computer with a Fortran 90 compiler. Operating system: Windows 7, Windows XP, Unix/Linux. RAM: 0.7 MB. Classification: 2.4, 7.2. Catalogue identifier of previous version: AEMC_v1_0. Journal reference of previous version: Comput. Phys. Commun. 183 (2012) 1773. Does the new version supersede the old program: Yes. Nature of problem: An attempt has been made to develop a subroutine that calculates the Chandrasekhar function with high accuracy, of at least 10 decimal places. Simultaneously, this subroutine should be very fast. Both requirements stem from the theory of electron transport in condensed matter. Solution method: Two algorithms were developed, each based on a different integral representation of the Chandrasekhar function. The final algorithm is edited by mixing these two
Directory of Open Access Journals (Sweden)
Dong Yumin
2014-01-01
Full Text Available A quantum optimization scheme for task scheduling in network cluster servers is proposed. We explore the distribution theory of the energy field in quantum mechanics and, in particular, apply it to data clustering. We compare the quantum optimization method with the genetic algorithm (GA), ant colony optimization (ACO) and the simulated annealing algorithm (SAA). We also demonstrate its validity and rationality through simulation and experiment.
A Novel Image Fusion Algorithm for Visible and PMMW Images based on Clustering and NSCT
Xiong Jintao; Xie Weichao; Yang Jianyu; Fu Yanlong; Hu Kuan; Zhong Zhibin
2016-01-01
Aiming at the fusion of visible and Passive Millimeter Wave (PMMW) images, a novel algorithm based on clustering and NSCT (Nonsubsampled Contourlet Transform) is proposed. It takes advantages of the particular ability of PMMW image in presenting metal target and uses the clustering algorithm for PMMW image to extract the potential target regions. In the process of fusion, NSCT is applied to both input images, and then the decomposition coefficients on different scale are combined using differ...
Directory of Open Access Journals (Sweden)
D. A. Viattchenin
2009-01-01
Full Text Available A method for constructing a subset of labeled objects which is used in a heuristic algorithm of possible clusterization with partial training is proposed in the paper. The method is based on data preprocessing by the heuristic algorithm of possible clusterization using a transitive closure of a fuzzy tolerance. Method efficiency is demonstrated by way of an illustrative example.
A Chinese Web Page Clustering Algorithm Based on the Suffix Tree
Institute of Scientific and Technical Information of China (English)
YANG Jian-wu
2004-01-01
In this paper, an improved algorithm, named STC-I, is proposed for Chinese Web page clustering based on Chinese language characteristics. It adopts a new unit choice principle and a novel suffix tree construction policy. The experimental results show that the new algorithm keeps the advantages of STC and is better than STC in precision and speed when both are used to cluster Chinese Web pages.
A Coupled User Clustering Algorithm Based on Mixed Data for Web-Based Learning Systems
Directory of Open Access Journals (Sweden)
Ke Niu
2015-01-01
Full Text Available In traditional Web-based learning systems, a few user clustering algorithms have been introduced to compensate for insufficient analysis of learning behaviors and a lack of personalized study guides. When analyzing behaviors with these algorithms, researchers generally focus on continuous data and easily neglect discrete data, both of which are generated by online learning actions. Moreover, there are implicit coupled interactions among the data, which the introduced algorithms frequently ignore. As a result, a mass of significant information that could improve clustering accuracy is neglected. To solve these issues, we propose a coupled user clustering algorithm for Web-based learning systems that takes into account both discrete and continuous data, as well as intracoupled and intercoupled interactions of the data. The experimental results in this paper demonstrate the superior performance of the proposed algorithm.
Institute of Scientific and Technical Information of China (English)
Wu Naixing; Liao Jianxin; Zhu Xiaomin
2006-01-01
Based on the system features of a softswitch-based heterogeneous clustered media server, this paper proposes a limited resource vector load-balancing algorithm. The purpose of the algorithm is to balance the load of the clusters by utilizing all system resources effectively and to avoid violent fluctuations in system performance. Extensive simulations on a Petri net model of the load balancing system are conducted, and the algorithm is compared with some traditional algorithms on balancing ability under heterogeneity, system throughput, request response time and performance stability. The simulation results show that the algorithm achieves higher system performance and has an excellent ability to deal with the heterogeneity of the clustered media server.
Davis, Jack B. A.; Shayeghi, Armin; Horswell, Sarah L.; Johnston, Roy L.
2015-08-01
A new open-source parallel genetic algorithm, the Birmingham parallel genetic algorithm, is introduced for the direct density functional theory global optimisation of metallic nanoparticles. The program utilises a pool genetic algorithm methodology for the efficient use of massively parallel computational resources. The scaling capability of the Birmingham parallel genetic algorithm is demonstrated through its application to the global optimisation of iridium clusters with 10 to 20 atoms, a catalytically important system with interesting size-specific effects. This is the first study of its type on Iridium clusters of this size and the parallel algorithm is shown to be capable of scaling beyond previous size restrictions and accurately characterising the structures of these larger system sizes. By globally optimising the system directly at the density functional level of theory, the code captures the cubic structures commonly found in sub-nanometre sized Ir clusters. Electronic supplementary information
Genetic Algorithms Applied to Multi-Class Clustering for Gene Expression Data
Institute of Scientific and Technical Information of China (English)
Haiyan Pan; Jun Zhu; Danfu Han
2003-01-01
A hybrid genetic algorithm (GA)-based clustering schema (HGACLUS), which incorporates the merits of simulated annealing, was described for finding an optimal or near-optimal set of medoids. The schema maximized clustering success by achieving internal cluster cohesion and external cluster isolation. The performance of HGACLUS and other methods was compared using simulated data and open microarray gene-expression datasets. HGACLUS was generally found to be more accurate and robust than the other methods discussed in this paper, as judged by the exact validation strategy and the explicit cluster number.
Clustering Algorithm As A Planning Support Tool For Rural Electrification Optimization
Directory of Open Access Journals (Sweden)
Ronaldo Pornillosa Parreno Jr
2015-08-01
Full Text Available Abstract In this study, a clustering algorithm was developed to optimize electrification plans by screening and grouping potential customers to be supplied with electricity. The algorithm provides a different approach to the clustering problem: it combines conceptual and distance-based clustering algorithms to analyze potential clusters using a spanning tree with the shortest possible edge weight, and creates final cluster trees based on a test of inconsistency for the edges. The clustering criteria consist of a commonly used distance measure with the addition of household information as the basis for the ability-to-pay (ATP) value. The combination of these two parameters resulted in more significant and realistic clusters, since a distance measure alone could not account for the effect of household characteristics when screening for the most sensible groupings of households. In addition, the implications of varying geographical features were incorporated in the algorithm by using a routing index across the locations of the households. This new approach to connecting the households in an area was applied in an actual case study of one village, or barangay, that was not yet energized. The clustering algorithm generated cluster trees which could become the theoretical basis for power utilities to plan the initial network arrangement of electrification. Scenario analysis conducted on the two strategies of clustering the households provided different alternatives for optimizing the cost of electrification. Furthermore, the benefits associated with the two strategies formulated from the two scenarios were evaluated using the benefit-cost (BC) ratio to determine which is more economically advantageous. The results of the study showed that the clustering algorithm proved effective in solving the electrification optimization problem and serves its purpose as a planning support tool which can facilitate electrification in rural areas and achieve cost-effectiveness.
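The spanning-tree idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it uses Euclidean distance only (no ATP value or routing index), and the inconsistency test is a simplifying assumption in which an edge is cut when its weight exceeds `factor` times the mean tree edge weight. The names `mst_clusters` and `factor` are hypothetical.

```python
import math


def mst_clusters(points, factor=2.0):
    """Group points by cutting 'inconsistent' edges of a minimum spanning tree.

    Assumption: an edge is inconsistent when its weight exceeds
    factor * (mean weight of all tree edges). This is a simplification,
    not the test used in the paper.
    """
    n = len(points)
    if n == 0:
        return []

    # Prim's algorithm on the complete Euclidean graph.
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []  # (weight, u, v) for every tree edge
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u

    mean_w = sum(w for w, _, _ in edges) / len(edges) if edges else 0.0
    kept = [(a, b) for w, a, b in edges if w <= factor * mean_w]

    # Union-find over the kept edges gives the final cluster trees.
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for a, b in kept:
        root[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


households = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(mst_clusters(households))
```

In this toy input the single long edge joining the two groups of households is removed, leaving two cluster trees.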
A novel functional module detection algorithm for protein-protein interaction networks
Directory of Open Access Journals (Sweden)
Zhang Aidong
2006-12-01
Full Text Available Abstract Background The sparse connectivity of protein-protein interaction data sets makes identification of functional modules challenging. The purpose of this study is to critically evaluate a novel clustering technique for clustering and detecting functional modules in protein-protein interaction networks, termed STM. Results STM selects representative proteins for each cluster and iteratively refines clusters based on a combination of the signal transduced and graph topology. STM is found to be effective at detecting clusters with a diverse range of interaction structures that are significant on measures of biological relevance. The STM approach is compared to six competing approaches including the maximum clique, quasi-clique, minimum cut, betweenness cut and Markov Clustering (MCL) algorithms. The clusters obtained by each technique are compared for enrichment of biological function. STM generates larger clusters and the clusters identified have p-values that are approximately 125-fold better than the other methods on biological function. An important strength of STM is that the percentage of proteins that are discarded to create clusters is much lower than the other approaches. Conclusion STM outperforms competing approaches and is capable of effectively detecting both densely and sparsely connected, biologically relevant functional modules with fewer discards.
A Hybrid Distributed Mutual Exclusion Algorithm for Cluster-Based Systems
Directory of Open Access Journals (Sweden)
Moharram Challenger
2013-01-01
Full Text Available Distributed mutual exclusion is a fundamental problem which arises in various systems such as grid computing, mobile ad hoc networks (MANETs), and distributed databases. Reducing key metrics such as the message count per critical section (CS) and the delay between two CS entrances, known as the synchronization delay, is a great challenge for this problem. Various algorithms use either permission-based or token-based protocols. Token-based algorithms offer better communication costs and synchronization delay. Raymond's and Suzuki-Kasami's algorithms are well-known token-based ones. Raymond's algorithm needs only O(log2(N)) messages per CS, and Suzuki-Kasami's algorithm needs just one message delivery time between two CS entrances. Nevertheless, each algorithm is weak in the other metric: synchronization delay and message complexity, respectively. In this work, a new hybrid algorithm is proposed which gains from the powerful aspects of both algorithms. Raysuz's algorithm (the proposed algorithm) uses a clustered graph and executes Suzuki-Kasami's algorithm within clusters and Raymond's algorithm between clusters. This leads to better message complexity than that of pure Suzuki-Kasami's algorithm and better synchronization delay than that of pure Raymond's algorithm, resulting in an overall efficient DMX algorithm.
Ansari, Elnaz Saberi; Eslahchi, Changiz; Pezeshk, Hamid; Sadeghi, Mehdi
2014-09-01
Decomposition of structural domains is an essential task in classifying protein structures, predicting protein function, and many other proteomics problems. As the number of known protein structures in PDB grows exponentially, the need for accurate automatic domain decomposition methods becomes more essential. In this article, we introduce a bottom-up algorithm for assigning protein domains using a graph theoretical approach. This algorithm is based on a center-based clustering approach. For constructing initial clusters, members of an independent dominating set for the graph representation of a protein are considered as the centers. A distance matrix is then defined for these clusters. To obtain final domains, these clusters are merged using the compactness principle of domains and a method similar to the neighbor-joining algorithm considering some thresholds. The thresholds are computed using a training set consisting of 50 protein chains. The algorithm is implemented using C++ language and is named ProDomAs. To assess the performance of ProDomAs, its results are compared with seven automatic methods, against five publicly available benchmarks. The results show that ProDomAs outperforms other methods applied on the mentioned benchmarks. The performance of ProDomAs is also evaluated against 6342 chains obtained from ASTRAL SCOP 1.71. ProDomAs is freely available at http://www.bioinf.cs.ipm.ir/software/prodomas. © 2014 Wiley Periodicals, Inc.
An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO
Directory of Open Access Journals (Sweden)
Jian Zhang
2014-01-01
Full Text Available To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes the vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses the Xie-Beni index as the cluster validity measure and automatically finds the optimal cluster number within a specific range, with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.
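The Xie-Beni index named above is a ratio of compactness to separation, and lower values indicate a better partition. A minimal sketch of the standard definition follows, assuming a fuzzifier `m` of 2 by default; it is not taken from the SP-FCM paper, and the function name `xie_beni` and its argument layout are illustrative assumptions.

```python
def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni validity index (standard form, lower is better).

    points:      list of n coordinate tuples
    centers:     list of c coordinate tuples (c >= 2)
    memberships: memberships[i][k] is the degree to which point k
                 belongs to cluster i (assumed layout)
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    n = len(points)
    c = len(centers)
    # Compactness: membership-weighted squared distances to the centers.
    compactness = sum(
        memberships[i][k] ** m * sq_dist(points[k], centers[i])
        for i in range(c)
        for k in range(n)
    )
    # Separation: smallest squared distance between two distinct centers.
    separation = min(
        sq_dist(centers[i], centers[j])
        for i in range(c)
        for j in range(i + 1, c)
    )
    return compactness / (n * separation)


pts = [(0.0,), (1.0,), (9.0,), (10.0,)]
crisp = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(xie_beni(pts, [(0.5,), (9.5,)], crisp))
```

Evaluating the index for each candidate cluster number in a range and keeping the minimum is one common way to select the number of clusters.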
Clustered K nearest neighbor algorithm for daily inflow forecasting
Akbari, M.; Van Overloop, P.J.A.T.M.; Afshar, A.
2010-01-01
Instance based learning (IBL) algorithms are a common choice among data driven algorithms for inflow forecasting. They are based on the similarity principle and prediction is made by the finite number of similar neighbors. In this sense, the similarity of a query instance is estimated according to
A Hybrid Algorithm for Optimizing Multi-Modal Functions
Institute of Scientific and Technical Information of China (English)
Li Qinghua; Yang Shida; Ruan Youlin
2006-01-01
A new genetic algorithm based on musical performance is presented. Its novelty is that it mimics the musical process of searching for a perfect state of harmony, which greatly increases its robustness and gives it a new meaning. Combining the advantages of the new genetic algorithm, the simplex algorithm and tabu search, a hybrid algorithm is proposed. To verify its effectiveness, the hybrid algorithm is applied to some typical numerical function optimization problems which are poorly solved by traditional genetic algorithms. The experimental results show that the hybrid algorithm is fast and reliable.
Directory of Open Access Journals (Sweden)
Jing Chen
2015-06-01
Full Text Available This study takes the concept of food logistics distribution as its starting point. With the aim of optimizing food logistics distribution routes, it analyzes the route optimization model, explains the genetic algorithm, and discusses the optimization of food logistics distribution routes based on a genetic and cluster scheme algorithm.
GenClust: A genetic algorithm for clustering gene expression data
Directory of Open Access Journals (Sweden)
Raimondi Alessandra
2005-12-01
Full Text Available Abstract Background Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.
GenClust: a genetic algorithm for clustering gene expression data.
Di Gesú, Vito; Giancarlo, Raffaele; Lo Bosco, Giosué; Raimondi, Alessandra; Scaturro, Davide
2005-12-07
Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.
The Loop-Cluster Algorithm for the Case of the 6 Vertex Model
Evertz, H G
1993-01-01
We present the loop algorithm, a new type of cluster algorithm that we recently introduced for the F model. Using the framework of Kandel and Domany, we show how to generalize the algorithm to the arrow flip symmetric 6 vertex model. We propose the principle of least possible freezing as the guide to choosing the values of free parameters in the algorithm. Finally, we briefly discuss the application of our algorithm to simulations of quantum spin systems. In particular, all necessary information is provided for the simulation of spin-$\frac{1}{2}$ Heisenberg and $XXZ$ models.
CMA: an efficient index algorithm of clustering supporting fast retrieval of large image databases
Institute of Scientific and Technical Information of China (English)
(no author listed)
2005-01-01
To realize content-based retrieval of large image databases, an efficient index and retrieval scheme is required. This paper proposes a clustering-based index algorithm called CMA, which supports fast retrieval of large image databases. CMA takes advantage of k-means and self-adaptive algorithms. It is simple and works without any user interaction. There are two main stages in this algorithm. In the first stage, it classifies the images in a database into several clusters and automatically obtains the necessary parameters for the next stage, the k-means iteration. The CMA algorithm is tested on a large database of more than ten thousand images and compared with the k-means algorithm. Experimental results show that the algorithm is effective in both precision and retrieval time.
Cahyaningrum, Rosalia D.; Bustamam, Alhadi; Siswantining, Titin
2017-03-01
Microarray technology has become one of the essential tools in the life sciences for observing gene expression levels, including the expression of genes in people with carcinoma. Carcinoma is a cancer that forms in epithelial tissue. Such data can be analyzed to identify hereditary gene expression and to build classifications that can be used to improve the diagnosis of carcinoma. Microarray data usually have such a large dimension that most methods require a long computing time to do the grouping. Therefore, this study uses the spectral clustering method, which can work with any object and reduces the dimension. Spectral clustering is a method based on the spectral decomposition of a matrix that represents the data in the form of a graph. After the data dimensions are reduced, the data are partitioned. One well-known partitioning method is Partitioning Around Medoids (PAM), which minimizes the objective function by iteratively exchanging non-medoid points with medoid points until convergence. The objective of this research is to implement spectral clustering and the PAM partitioning algorithm to obtain groups of 7457 genes with carcinoma based on their similarity values. The result of this study is two groups of genes with carcinoma.
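The PAM swap step described above can be sketched in a few lines of Python. This is a simplified illustration rather than the study's implementation: it takes a precomputed distance matrix, starts from the first `k` points as medoids (a naive initialization chosen here for brevity), and omits the spectral dimension-reduction stage entirely. The names `pam` and `total_cost` are hypothetical.

```python
def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Greedy PAM-style swapping on a square distance matrix.

    Assumption: the first k points serve as the initial medoids.
    A medoid is exchanged for a non-medoid whenever that lowers the
    total cost, and the loop stops when no swap helps.
    """
    n = len(dist)
    medoids = list(range(k))
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
    # Assign each point to the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels, cost


values = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in values] for a in values]
medoids, labels, cost = pam(dist, 2)
print(sorted(medoids), labels, cost)
```

Because every accepted swap strictly lowers the cost and there are finitely many medoid sets, the loop terminates, although only at a local optimum.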
Homogeneous clusters over India using probability density function of daily rainfall
Kulkarni, Ashwini
2017-07-01
The Indian landmass has been divided into homogeneous clusters by applying cluster analysis to the probability density function of a century-long time series of daily summer monsoon (June through September) rainfall at 357 grids over India, each of approximately 100 km × 100 km. The analysis gives five clusters over the Indian landmass; only cluster 5 is a contiguous region, and all other clusters are dispersed, which confirms the erratic behavior of daily rainfall over India. The area-averaged seasonal rainfall over cluster 5 has a very strong relationship with Indian summer monsoon rainfall; also, the rainfall variability over this region is modulated by the most important mode of the climate system, i.e., El Nino Southern Oscillation (ENSO). This cluster could be considered representative of the entire Indian landmass for examining monsoon variability. The two-sample Kolmogorov-Smirnov test supports that the cumulative distribution functions of daily rainfall over cluster 5 and India as a whole do not differ significantly. The clustering algorithm is also applied to two time epochs, 1901-1975 and 1976-2010, to examine possible changes in the clusters in a recent warming period. The clusters are drastically different in the two time periods. They are more dispersed in the recent period, implying a more erroneous distribution of daily rainfall in the recent period.
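The two-sample Kolmogorov-Smirnov test mentioned above compares empirical cumulative distribution functions through their largest vertical gap. A minimal sketch of that statistic follows; it computes only the distance D, not a p-value, and the function name `ks_statistic` and the sample values are illustrative assumptions rather than data from the study.

```python
def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, so that
        # i/len(a) and j/len(b) are the empirical CDFs evaluated at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


# Hypothetical daily rainfall values (mm) for two regions.
region = [0.0, 2.5, 4.0, 7.5, 12.0]
country = [0.0, 3.0, 4.5, 8.0, 11.0]
print(ks_statistic(region, country))
```

A small D is consistent with the two distributions not differing significantly; a full test would compare D against the critical value for the two sample sizes.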
Chen, S; Mulgrew, B; Grant, P M
1993-01-01
The application of a radial basis function network to digital communications channel equalization is examined. It is shown that the radial basis function network has an identical structure to the optimal Bayesian symbol-decision equalizer solution and, therefore, can be employed to implement the Bayesian equalizer. The training of a radial basis function network to realize the Bayesian equalization solution can be achieved efficiently using a simple and robust supervised clustering algorithm. During data transmission a decision-directed version of the clustering algorithm enables the radial basis function network to track a slowly time-varying environment. Moreover, the clustering scheme provides an automatic compensation for nonlinear channel and equipment distortion. Computer simulations are included to illustrate the analytical results.
Coordinating Clusters: A Cross Sectoral Study of Cluster Organization Functions in The Netherlands
Directory of Open Access Journals (Sweden)
Philipp J.P. Garbade
2013-02-01
Full Text Available The present paper aims at answering the question of how cluster organization functions are implemented in a high‐tech, a medium to high‐tech and a low to medium‐tech cluster. Data were collected through semi‐structured interviews from three clusters in the Netherlands: an agri‐food cluster (as an example of a low to medium‐tech cluster), a green biotech cluster (medium to high‐tech) and a high‐tech cluster. Concerning the cluster organization functions, a number of similarities were found. For all three clusters it can be concluded that the network support function is considered to be very important. Sector independence can further be found concerning the innovation process support function, specifically regarding the promotion of the region as an attractive living and working area for highly qualified employees. The results also show a number of clear differences among the investigated clusters. Only in the low‐to‐medium tech agri‐food cluster was there a clear need for internationalization support for SMEs to reach foreign markets. Only in the green biotech cluster was the demand articulation focused on the region where the cluster is based, which stands in contrast to the highly international orientation of the member companies. Only in the high‐tech innovation cluster was technology road mapping extensively used. This powerful tool, developed to align the innovation process at the company and sector level, impacted further on the execution of the demand articulation/network formation support functions, and could also be helpful for the green biotech and the agri‐food clusters. Throughout the paper, cluster categorization schemes other than the tech level are also applied, giving insights into their limitations and how to possibly deal with them in inter‐sectoral cluster comparison research.
Scheme for Implementing Quantum Search Algorithm in a Cluster State Quantum Computer
Institute of Scientific and Technical Information of China (English)
ZHANG Da-Li; WANG Yan-Hui; ZHANG Yong
2008-01-01
Using a cluster state and single-qubit measurements, one can perform one-way quantum computation. Here we give a detailed scheme for realizing a modified Grover search algorithm using measurements on a cluster state. We give the measurement pattern for the cluster-state realization of the algorithm and estimate the number of measurements needed for its implementation. It is found that $O(2^{3n/2} n^2)$ single-qubit measurements are required for its realization in a cluster-state quantum computer.
Cluster Based Hybrid Niche Mimetic and Genetic Algorithm for Text Document Categorization
Directory of Open Access Journals (Sweden)
A. K. Santra
2011-09-01
Full Text Available An efficient cluster based hybrid niche mimetic and genetic algorithm for text document categorization to improve the retrieval rate of relevant document fetching is addressed. The proposal minimizes the processing of structuring the document with better feature selection using hybrid algorithm. In addition restructuring of feature words to associated documents gets reduced, in turn increases document clustering rate. The performance of the proposed work is measured in terms of cluster objects accuracy, term weight, term frequency and inverse document frequency. Experimental results demonstrate that it achieves very good performance on both feature selection and text document categorization, compared to other classifier methods.
Empirical relations between static and dynamic exponents for Ising model cluster algorithms
Coddington, Paul D.; Baillie, Clive F.
1992-02-01
We have measured the autocorrelations for the Swendsen-Wang and the Wolff cluster update algorithms for the Ising model in two, three, and four dimensions. The data for the Wolff algorithm suggest that the autocorrelations are linearly related to the specific heat, in which case the dynamic critical exponent is $z_{\mathrm{int},E}^{\mathrm{W}} = \alpha/\nu$. For the Swendsen-Wang algorithm, scaling the autocorrelations by the average maximum cluster size gives either a constant or a logarithm, which implies that $z_{\mathrm{int},E}^{\mathrm{SW}} = \beta/\nu$ for the Ising model.
Empirical relations between static and dynamic exponents for Ising model cluster algorithms
Energy Technology Data Exchange (ETDEWEB)
Coddington, P.D. (Department of Physics, Syracuse University, Syracuse, New York 13244 (United States)); Baillie, C.F. (Department of Physics, University of Colorado, Boulder, Colorado 80309 (United States))
1992-02-17
We have measured the autocorrelations for the Swendsen-Wang and the Wolff cluster update algorithms for the Ising model in two, three, and four dimensions. The data for the Wolff algorithm suggest that the autocorrelations are linearly related to the specific heat, in which case the dynamic critical exponent is $z_{\mathrm{int},E}^{\mathrm{W}} = \alpha/\nu$. For the Swendsen-Wang algorithm, scaling the autocorrelations by the average maximum cluster size gives either a constant or a logarithm, which implies that $z_{\mathrm{int},E}^{\mathrm{SW}} = \beta/\nu$ for the Ising model.
AN IMPROVED ALGORITHM FOR SUPERVISED FUZZY C-MEANS CLUSTERING OF REMOTELY SENSED DATA
Institute of Scientific and Technical Information of China (English)
(no author listed)
2000-01-01
This paper describes an improved algorithm for fuzzy c-means clustering of remotely sensed data, by which the degree of fuzziness of the resultant classification is decreased compared with that of a conventional algorithm; that is, the classification accuracy is increased. This is achieved by incorporating covariance matrices at the level of individual classes rather than assuming a global one. Empirical results from a fuzzy classification of an Edinburgh suburban land cover confirmed the improved performance of the new algorithm for fuzzy c-means clustering, in particular when fuzziness is also accommodated in the assumed reference data.
Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C. S.
1985-01-01
The cluster correlation function $\xi_c(r)$ is compared with the particle correlation function $\xi(r)$ in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, $\xi_c$ and $\xi$ are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of $\xi_c$ increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), $\xi_c$ is steeper than $\xi$, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of $\xi_c$ found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales.
Energy Technology Data Exchange (ETDEWEB)
Barnes, J.; Dekel, A.; Efstathiou, G.; Frenk, C.S.
1985-08-01
The cluster correlation function $\xi_c(r)$ is compared with the particle correlation function $\xi(r)$ in cosmological N-body simulations with a wide range of initial conditions. The experiments include scale-free initial conditions, pancake models with a coherence length in the initial density field, and hybrid models. Three N-body techniques and two cluster-finding algorithms are used. In scale-free models with white noise initial conditions, $\xi_c$ and $\xi$ are essentially identical. In scale-free models with more power on large scales, it is found that the amplitude of $\xi_c$ increases with cluster richness; in this case the clusters give a biased estimate of the particle correlations. In the pancake and hybrid models (with n = 0 or 1), $\xi_c$ is steeper than $\xi$, but the cluster correlation length exceeds that of the points by less than a factor of 2, independent of cluster richness. Thus the high amplitude of $\xi_c$ found in studies of rich clusters of galaxies is inconsistent with white noise and pancake models and may indicate a primordial fluctuation spectrum with substantial power on large scales. 30 references.
Fodeh, Samah J; Lazenby, Mark; Bai, Mei; Ercolano, Elizabeth; Murphy, Terrence; McCorkle, Ruth
2013-10-01
Symptoms and subsequent functional impairment have been associated with the biological processes of disease, including the interaction between disease and treatment in a measurement model of symptoms. However, hitherto cluster analysis has primarily focused on symptoms. This study among patients within 100 days of diagnosis with advanced cancer explored whether self-reported physical symptoms and functional impairments formed clusters at the time of diagnosis. We applied cluster analysis to self-reported symptoms and activities of daily living of 111 patients newly diagnosed with advanced gastrointestinal (GI), gynecological, head and neck, and lung cancers. Based on content expert evaluations, the best techniques and variables were identified, yielding the best solution. The best cluster solution used a K-means algorithm and cosine similarity and yielded five clusters of physical as well as emotional symptoms and functional impairments. Cancer site formed the predominant organizing principle of composition for each cluster. The top five symptoms and functional impairments in each cluster were Cluster 1 (GI): outlook, insomnia, appearance, concentration, and eating/feeding; Cluster 2 (GI): appetite, bowel, insomnia, eating/feeding, and appearance; Cluster 3 (gynecological): nausea, insomnia, eating/feeding, concentration, and pain; Cluster 4 (head and neck): dressing, eating/feeding, bathing, toileting, and walking; and Cluster 5 (lung): cough, walking, eating/feeding, breathing, and insomnia. Functional impairments in patients newly diagnosed with late-stage cancers behave as symptoms during the diagnostic phase. Health care providers need to expand their assessments to include both symptoms and functional impairments. Early recognition of functional changes may accelerate diagnosis at an earlier cancer stage. Copyright © 2013 U.S. Cancer Pain Relief Committee. Published by Elsevier Inc. All rights reserved.
Clustered functional MRI of overt speech production.
Sörös, Peter; Sokoloff, Lisa Guttman; Bose, Arpita; McIntosh, Anthony R; Graham, Simon J; Stuss, Donald T
2006-08-01
To investigate the neural network of overt speech production, event-related fMRI was performed in 9 young healthy adult volunteers. A clustered image acquisition technique was chosen to minimize speech-related movement artifacts. Functional images were acquired during the production of oral movements and of speech of increasing complexity (isolated vowel as well as monosyllabic and trisyllabic utterances). This imaging technique and behavioral task enabled depiction of the articulo-phonologic network of speech production from the supplementary motor area at the cranial end to the red nucleus at the caudal end. Speaking a single vowel and performing simple oral movements involved very similar activation of the cortical and subcortical motor systems. More complex, polysyllabic utterances were associated with additional activation in the bilateral cerebellum, reflecting increased demand on speech motor control, and additional activation in the bilateral temporal cortex, reflecting the stronger involvement of phonologic processing.
A randomized algorithm for two-cluster partition of a set of vectors
Kel'manov, A. V.; Khandeev, V. I.
2015-02-01
A randomized algorithm is substantiated for the strongly NP-hard problem of partitioning a finite set of vectors of Euclidean space into two clusters of given sizes according to the minimum sum-of-squared-distances criterion. It is assumed that the centroid of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The centroid of the other cluster is fixed at the origin. For an established parameter value, the algorithm finds an approximate solution of the problem in time that is linear in the space dimension and the input size of the problem for given values of the relative error and failure probability. The conditions are established under which the algorithm is asymptotically exact and runs in time that is linear in the space dimension and quadratic in the input size of the problem.
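The objective described above can be written down directly: squared distances to the mean inside the chosen cluster, plus squared norms of the remaining vectors, whose centroid is fixed at the origin. The sketch below evaluates that objective and finds the optimum by exhaustive search. Exhaustive search is swapped in purely for illustration, is exponential in the input size, and is not the randomized algorithm of the paper; the names `objective` and `brute_force` are hypothetical.

```python
from itertools import combinations


def objective(vectors, cluster):
    """Sum of squared distances for a partition into two clusters.

    cluster: set of indices forming the cluster whose centroid is its mean.
    All other vectors are measured against the origin.
    """
    dim = len(vectors[0])
    chosen = [vectors[i] for i in cluster]
    centroid = [sum(v[d] for v in chosen) / len(chosen) for d in range(dim)]
    inside = sum(
        sum((v[d] - centroid[d]) ** 2 for d in range(dim)) for v in chosen
    )
    outside = sum(
        sum(x * x for x in vectors[i])
        for i in range(len(vectors))
        if i not in cluster
    )
    return inside + outside


def brute_force(vectors, size):
    """Try every index subset of the given size (tiny inputs only)."""
    return min(
        combinations(range(len(vectors)), size),
        key=lambda c: objective(vectors, set(c)),
    )


data = [(0.1, 0.0), (-0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (4.8, 5.0)]
best = brute_force(data, 3)
print(best, objective(data, set(best)))
```

On small inputs a reference solution of this kind can serve as a check for an approximate solver.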
A Comparison of Algorithms for the Construction of SZ Cluster Catalogues
Melin, J -B; Bartelmann, M; Bartlett, J G; Betoule, M; Bobin, J; Carvalho, P; Chon, G; Delabrouille, J; Diego, J M; Harrison, D L; Herranz, D; Hobson, M; Kneissl, R; Lasenby, A N; Jeune, M Le; Lopez-Caniego, M; Mazzotta, P; Rocha, G M; Schaefer, B M; Starck, J -L; Waizmann, J -C; Yvon, D
2012-01-01
We evaluate the construction methodology of an all-sky catalogue of galaxy clusters detected through the Sunyaev-Zel'dovich (SZ) effect. We perform an extensive comparison of twelve algorithms applied to the same detailed simulations of the millimeter and submillimeter sky based on a Planck-like case. We present the results of this "SZ Challenge" in terms of catalogue completeness, purity, and astrometric and photometric reconstruction. Our results provide a comparison of a representative sample of SZ detection algorithms and highlight important issues in their application. In our study case, we show that the exact expected number of clusters remains uncertain (about a thousand cluster candidates at |b| > 20 deg with 90% purity) and that it depends on the SZ model, on the detailed sky simulations, and on the algorithmic implementation of the detection methods. We also estimate the astrometric precision of the cluster candidates, which is found to be of the order of ~2 arcmin on average, and the photometric uncertainty of...
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing
Directory of Open Access Journals (Sweden)
Luo Zhong
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce are becoming more and more complex. As a result, traditional clustering algorithms cannot handle the mass of tunnel data. To solve this problem, an improved parallel clustering algorithm based on k-means is proposed. It processes the data using MapReduce within a cloud computing environment. It can handle massive data sets and is also more efficient. Moreover, it computes the average dissimilarity degree of each cluster in order to clean abnormal data.
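As a rough illustration of how k-means maps onto the MapReduce pattern, the sketch below splits one iteration into a map step (assign each point to its nearest centroid) and a reduce step (average the points per centroid). It runs in a single process and is not the paper's implementation. `average_dissimilarity` is a hypothetical stand-in for the "average dissimilarity degree" mentioned in the abstract, whose exact definition is not given there.

```python
from collections import defaultdict

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        yield min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])), p

def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    new = list(centroids)  # a centroid with no points keeps its old position
    for key, pts in groups.items():
        new[key] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return new, groups

def kmeans(points, centroids, iterations=10):
    """Repeat the map and reduce steps a fixed number of times."""
    for _ in range(iterations):
        centroids, groups = reduce_phase(map_phase(points, centroids), centroids)
    return centroids, groups

def average_dissimilarity(groups, centroids):
    """Assumed measure: mean squared distance of each cluster's points to its centroid."""
    return {k: sum(sq_dist(p, centroids[k]) for p in pts) / len(pts)
            for k, pts in groups.items()}
```

In a real MapReduce job the map step would run on separate data partitions and the framework would handle the grouping by key. Points whose distance greatly exceeds their cluster's average could then be flagged as abnormal.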
GPU-based single-cluster algorithm for the simulation of the Ising model
Komura, Yukihiro; Okabe, Yutaka
2012-02-01
We present the GPU calculation with the common unified device architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. Proposing an algorithm for a quasi-block synchronization, we realize the Wolff single-cluster Monte Carlo simulation with CUDA. We perform parallel computations for the newly added spins in the growing cluster. As a result, the GPU calculation speed for the two-dimensional Ising model at the critical temperature with the linear size L = 4096 is 5.60 times as fast as the calculation speed on a current CPU core. For the three-dimensional Ising model with the linear size L = 256, the GPU calculation speed is 7.90 times as fast as the CPU calculation speed. The idea of quasi-block synchronization can be used not only in the cluster algorithm but also in many fields where the synchronization of all threads is required.
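For readers unfamiliar with the Wolff single-cluster update, a minimal sequential sketch follows. It is a plain CPU version for the two-dimensional Ising model with coupling J = 1 and periodic boundaries. It does not reproduce the paper's CUDA code or its quasi-block synchronization, and the function name and flat-list lattice layout are assumptions made for illustration.

```python
import math
import random

def wolff_step(spins, size, beta, rng=random):
    """One Wolff single-cluster update on a size x size periodic lattice (J = 1).

    spins is a flat list of +1/-1 values, modified in place.
    Returns the number of spins flipped (the cluster size).
    """
    p_add = 1.0 - math.exp(-2.0 * beta)  # probability of adding an aligned neighbor
    seed = rng.randrange(size * size)
    seed_spin = spins[seed]
    spins[seed] = -seed_spin  # flipping on entry also marks the site as visited
    stack = [seed]
    flipped = 1
    while stack:
        site = stack.pop()
        row, col = divmod(site, size)
        neighbors = (
            ((row + 1) % size) * size + col,
            ((row - 1) % size) * size + col,
            row * size + (col + 1) % size,
            row * size + (col - 1) % size,
        )
        for nb in neighbors:
            if spins[nb] == seed_spin and rng.random() < p_add:
                spins[nb] = -seed_spin
                stack.append(nb)
                flipped += 1
    return flipped
```

The loop over newly added spins is the part the abstract says is computed in parallel on the GPU. Here it is a simple stack traversal.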
GPU-based single-cluster algorithm for the simulation of the Ising model
Komura, Yukihiro
2011-01-01
We present the GPU calculation with the common unified device architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. Proposing an algorithm for a quasi-block synchronization, we realize the Wolff single-cluster Monte Carlo simulation with CUDA. We perform parallel computations for the newly added spins in the growing cluster. As a result, the GPU calculation speed for the two-dimensional Ising model at the critical temperature with the linear size L=4096 is 5.60 times as fast as the calculation speed on a current CPU core. For the three-dimensional Ising model with the linear size L=256, the GPU calculation speed is 7.90 times as fast as the CPU calculation speed. The idea of quasi-block synchronization can be used not only in the cluster algorithm but also in many fields where the synchronization of all threads is required.
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network
Directory of Open Access Journals (Sweden)
C. Vimalarani
2016-01-01
A Wireless Sensor Network (WSN) is a network formed from a large number of sensor nodes positioned in an application environment to monitor physical entities in a target area, for example, for temperature monitoring, water level monitoring, pressure monitoring, health care, and various military applications. Most sensor nodes are equipped with self-supported battery power, through which they can perform adequate operations and communicate with neighboring nodes. To maximize the lifetime of Wireless Sensor Networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Networks in which clustering and cluster head selection are done using the Particle Swarm Optimization (PSO) algorithm with the aim of minimizing power consumption in the WSN. The performance metrics are evaluated and the results are compared with a competitive clustering algorithm to validate the reduction in energy consumption.
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.
Vimalarani, C; Subramanian, R; Sivanandam, S N
2016-01-01
A Wireless Sensor Network (WSN) is a network formed from a large number of sensor nodes positioned in an application environment to monitor physical entities in a target area, for example, for temperature monitoring, water level monitoring, pressure monitoring, health care, and various military applications. Most sensor nodes are equipped with self-supported battery power, through which they can perform adequate operations and communicate with neighboring nodes. To maximize the lifetime of Wireless Sensor Networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Networks in which clustering and cluster head selection are done using the Particle Swarm Optimization (PSO) algorithm with the aim of minimizing power consumption in the WSN. The performance metrics are evaluated and the results are compared with a competitive clustering algorithm to validate the reduction in energy consumption.
Directory of Open Access Journals (Sweden)
Ashim Kumar Ghosh
2011-12-01
Wireless sensor nodes are used in most embedded computing applications. Multihop cluster hierarchies have been proposed for large wireless sensor networks (WSNs) that can provide scalable routing, data aggregation, and querying. The energy consumption rate of sensors in a WSN varies greatly with the protocols the sensors use for communication. In this paper we present a cluster-based routing algorithm. One of our main goals is to design an energy-efficient routing protocol that addresses the usual problems of WSNs. The efficiency of a WSN depends on the distance between each node and the base station and on the amount of data to be transferred, and the performance of clustering is greatly influenced by the selection of cluster heads, which are in charge of creating clusters and controlling member nodes. This algorithm makes the best use of nodes with a low number of cluster heads, known as super nodes. We divide the full region into four equal zones, and the centre area of the region is used to select the super node. Each zone is considered separately and may or may not be divided further, depending on the density of nodes in that zone and the capability of the super node. The algorithm forms multilayer communication; the number of layers depends on the current network load and statistics. Our algorithm is easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.
UNSUPERVISED DATA AND HISTOGRAM CLUSTERING USING INCLINED PLANES SYSTEM OPTIMIZATION ALGORITHM
Directory of Open Access Journals (Sweden)
Mohammad Hamed Mozaffari
2014-03-01
Within the last decades, clustering has gained significant recognition as one of the data mining methods, especially in the relatively new field of medical engineering for diagnosing cancer. Clustering is used to automatically group items in a database that have similar characteristics. The researchers introduce a novel and powerful algorithm known as Inclined Planes system Optimization (IPO), with the capacity to overcome clustering problems. In the proposed method, each agent of the algorithm represents the centroids of the clusters, and the number of centroids is selected automatically in each time interval (unsupervised clustering). Clustering is evaluated with the Davies-Bouldin index (DBi) as a measure of cluster validity. The researchers compare known algorithms on a series of databases from various studies to demonstrate the power and capability of the proposed method. These datasets are popular in pattern recognition and vary in space dimension. The method's performance was also tested on standard images as a dataset. The results show a significant advantage of the method over other algorithms.
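The Davies-Bouldin index named in this abstract has a standard definition: for each cluster, take the worst-case ratio of combined within-cluster scatter to between-centroid distance, then average over clusters. Lower values indicate more compact, better-separated clusters. The sketch below follows that standard definition with Euclidean distances. It is not taken from the paper, and the function names are illustrative.

```python
import math

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def davies_bouldin(clusters):
    """clusters: list of non-empty lists of points (tuples); needs at least two clusters."""
    centers = [centroid(c) for c in clusters]
    # Scatter: average distance from each point to its own cluster centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, centers)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k
```

An optimizer that also chooses the number of centroids can use this value as its fitness function. The index is undefined when two centroids coincide.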
The Mass Function of Nearby Galaxy Clusters
Biviano, A; Giuricin, G; Mardirossian, F; Mezzetti, M
1993-01-01
We present the distribution of virial masses for nearby galaxy clusters, as obtained from a data-set of 75 clusters, each having at least 20 galaxy members with measured redshifts within 1 Abell radius. After having accounted for problems of incompleteness of the data-set, we fitted a power-law to the cluster mass distribution.
Directory of Open Access Journals (Sweden)
Lamiaa F. Ibrahim
2011-01-01
Problem statement: The process of network planning is divided into two sub-steps. The first step is determining the location of the Multi Service Access Node (MSAN). The second step is the construction of subscriber network lines from the MSAN to subscribers to satisfy optimization criteria and design constraints. Due to the complexity of this process, artificial intelligence and clustering techniques have been successfully deployed to solve many problems. The problems of the locations of MSANs, the cabling layout and the computation of optimum cable network layouts are addressed in this study. The proposed algorithm, Clustering density-Based Spatial of Applications with Noise original, minimal Spanning tree and modified Ant-Colony-Based algorithm (CBSCAN-SPANT), uses two clustering algorithms, density-based and agglomerative, with shortest-path distances, while satisfying the network constraints. The algorithm uses wired and wireless technology to serve subscriber demand and place the switches in an optimal real location. Approach: The original Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm was modified, and a new algorithm (the NetPlan algorithm) was proposed by the author in a recent work to solve the first step of the network planning problem. In the present study, the NetPlan algorithm is modified by introducing the modified Ant-Colony-Based algorithm to find the optimal path between any node and the corresponding MSAN node in the first step of the network planning process, in order to determine the nodes belonging to each cluster. The second step of the network planning process is also introduced in the present study. For each cluster, the optimal cabling layout from each MSAN to the subscriber premises is determined using Prim's algorithm, which constructs a minimal spanning tree. Results: Experimental results and analysis indicate that the
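The minimal-spanning-tree step for the cabling layout can be sketched with the textbook form of Prim's algorithm. The graph representation, node names and weights below are assumptions for illustration. The abstract does not describe the paper's data structures or cost model.

```python
import heapq

def prim_mst(graph, start):
    """graph: dict node -> dict neighbor -> weight (undirected, connected).

    Returns (list of (from, to, weight) edges, total weight), growing the tree from start.
    """
    visited = {start}
    heap = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(heap)
    edges, total = [], 0
    while heap and len(visited) < len(graph):
        w, u, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale entry: v was reached by a cheaper edge already
        visited.add(v)
        edges.append((u, v, w))
        total += w
        for nxt, w2 in graph[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (w2, v, nxt))
    return edges, total
```

Starting the tree at the MSAN node of a cluster yields a cable layout that connects every subscriber in that cluster with minimum total edge weight.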
Multidistribution Center Location Based on Real-Parameter Quantum Evolutionary Clustering Algorithm
Directory of Open Access Journals (Sweden)
Huaixiao Wang
2014-01-01
To determine the multidistribution center location and the distribution scope of each distribution center with high efficiency, the real-parameter quantum-inspired evolutionary clustering algorithm (RQECA) is proposed. RQECA is applied to choose the multidistribution center location on the basis of the conventional fuzzy C-means clustering algorithm (FCM). The combination of the real-parameter quantum-inspired evolutionary algorithm (RQIEA) and FCM can overcome the local search defect of FCM and make the optimization result independent of the choice of initial values. A comparison of FCM, clustering based on the simulated annealing genetic algorithm (CSAGA), and RQECA indicates that RQECA converges as well as CSAGA but searches more efficiently. Therefore, RQECA is more efficient for solving the multidistribution center location problem.
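The conventional fuzzy C-means step on which RQECA builds assigns each point a graded membership in every cluster. The sketch below shows only the standard FCM membership formula for a single point, with fuzzifier m. The quantum-inspired evolutionary part is not shown, and the function name is an assumption.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy C-means membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    if 0.0 in dists:
        # The point coincides with one or more centers: share full membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]
```

Because the memberships depend on the current centers, plain FCM can settle in a local optimum determined by its starting centers, which is the defect the abstract says the evolutionary search is meant to overcome.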
Energy Technology Data Exchange (ETDEWEB)
Uy, D.L.
1996-02-01
An algorithm for detection and identification of image clusters or "blobs" based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuzzifier, a binary smoothing filter, and a median filter. The processed image data are then input to the image cluster detection and identification program. The program employs the concept of an "elastic rectangle" that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C program was developed to test the algorithm. The algorithm was tested only on image data of size 8x8 with different numbers of blobs in them. The algorithm works very well in detecting and identifying image clusters.
An approximation polynomial-time algorithm for a sequence bi-clustering problem
Kel'manov, A. V.; Khamidullin, S. A.
2015-06-01
We consider a strongly NP-hard problem of partitioning a finite sequence of vectors in Euclidean space into two clusters using the criterion of the minimal sum of the squared distances from the elements of the clusters to the centers of the clusters. The center of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The center of the other cluster is fixed at the origin. Moreover, the partition is such that the difference between the indices of two successive vectors in the first cluster is bounded above and below by prescribed constants. A 2-approximation polynomial-time algorithm is proposed for this problem.
AN OPTIMIZED WEIGHT BASED CLUSTERING ALGORITHM IN HETEROGENEOUS WIRELESS SENSOR NETWORKS
Directory of Open Access Journals (Sweden)
Babu.N.V
2012-12-01
The last few years have seen increased interest in the potential use of wireless sensor networks (WSNs) in various fields such as disaster management, battlefield surveillance, and border security surveillance. In such applications, a large number of sensor nodes are deployed, which are often unattended and work autonomously. The process of dividing the network into interconnected substructures is called clustering, and the interconnected substructures are called clusters. The cluster head (CH) of each cluster acts as a coordinator within the substructure. Each CH acts as a temporary base station within its zone or cluster and also communicates with other CHs. Clustering is a key technique for extending the lifetime of a sensor network by reducing energy consumption. It can also increase network scalability. Researchers in all fields of wireless sensor networks commonly assume that nodes are homogeneous, but some nodes may have different characteristics in order to prolong the lifetime of a WSN and improve its reliability. We propose an algorithm for better cluster head selection based on weights for the different parameters that influence energy consumption, including distance from the base station as a new parameter, to reduce the number of transmissions and the energy consumed by sensor nodes. Finally, the proposed algorithm is compared with the WCA and IWCA algorithms in terms of number of clusters and energy consumption.
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream
Directory of Open Access Journals (Sweden)
Amineh Amini
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
Graph-based clustering and data visualization algorithms
Vathy-Fogarassy, Ágnes
2013-01-01
This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space. The application of graphs in clustering and visualization has several advantages. A graph of important edges (where edges characterize relations and weights represent similarities or distances) provides a compact representation of the entire complex data set. This text describes clustering and visualization methods that are able to utilize information hidden in these graphs, based on
An Efficient Clustering Algorithm for k-Anonymisation
Institute of Scientific and Technical Information of China (English)
Grigorios Loukides; Jian-Hua Shao
2008-01-01
K-anonymisation is an approach to protecting individuals from being identified from data. Good k-anonymisations should retain data utility and preserve privacy, but few methods have considered these two conflicting requirements together. In this paper, we extend our previous work on a clustering-based method for balancing data utility and privacy protection, and propose a set of heuristics to improve its effectiveness. We introduce new clustering criteria that treat utility and privacy on equal terms and propose sampling-based techniques to optimally set up its parameters. Extensive experiments show that the extended method achieves good accuracy in query answering and is able to prevent linking attacks effectively.
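The property that k-anonymisation aims for can be checked directly: every combination of quasi-identifier values must be shared by at least k records. The sketch below tests only that definition. It does not implement the clustering-based method or the utility and privacy criteria of the paper, and the record layout is a hypothetical example.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k records.

    records: list of dicts; quasi_identifiers: list of keys that could be linked
    to external data (for example a generalized age range and postcode prefix).
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(n >= k for n in counts.values())
```

A clustering-based anonymiser groups similar records and generalizes their quasi-identifiers until a check of this kind passes, while trying to lose as little information as possible.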
DYNAMIC REQUEST DISPATCHING ALGORITHM FOR WEB SERVER CLUSTER
Institute of Scientific and Technical Information of China (English)
None
2006-01-01
The overall increase in traffic on the WWW causes a disproportionate increase in client requests to popular web sites. Site administrators constantly face the requirement to improve server capacity. A web server cluster is a popular solution. It uses a group of independent servers that are managed as a single system for higher availability, easier manageability and greater scalability. Many web sites have adopted this solution. Request dispatching [1-2] is one of the core technologies used by parallel web server clusters...
Function Optimization Based on Quantum Genetic Algorithm
Ying Sun; Hegen Xiong
2014-01-01
Optimization method is important in engineering design and application. Quantum genetic algorithm has the characteristics of good population diversity, rapid convergence and good global search capability and so on. It combines quantum algorithm with genetic algorithm. A novel quantum genetic algorithm is proposed, which is called Variable-boundary-coded Quantum Genetic Algorithm (vbQGA) in which qubit chromosomes are collapsed into variable-boundary-coded chromosomes instead of binary-coded c...
Function Optimization Based on Quantum Genetic Algorithm
Ying Sun; Yuesheng Gu; Hegen Xiong
2013-01-01
Quantum genetic algorithm has the characteristics of good population diversity, rapid convergence and good global search capability and so on. It combines quantum algorithm with genetic algorithm. A novel quantum genetic algorithm is proposed, which is called variable-boundary-coded quantum genetic algorithm (vbQGA), in which qubit chromosomes are collapsed into variable-boundary-coded chromosomes instead of binary-coded chromosomes. Therefore much shorter chromosome strings can be gained. The m...
Directory of Open Access Journals (Sweden)
Silvis Remko
2008-12-01
Background: A typical step in the analysis of gene expression data is the determination of clusters of genes that exhibit similar expression patterns. Researchers are confronted with the seemingly arbitrary choice between numerous algorithms to perform cluster analysis. Results: We developed an exploratory application that benchmarks the results of clustering methods using functional annotations. In addition, a de novo DNA motif discovery algorithm is integrated in our program which identifies overrepresented DNA binding sites in the upstream DNA sequences of genes from the clusters that are indicative of sites of transcriptional control. The performance of our program was evaluated by comparing the original results of a time course experiment with the findings of our application. Conclusion: DISCLOSE assists researchers in the prokaryotic research community in systematically evaluating results of the application of a range of clustering algorithms to transcriptome data. Different performance measures make it possible to quickly and comprehensively determine the best suited clustering approach for a given dataset.
A community detection algorithm based on topology potential and spectral clustering.
Wang, Zhixiao; Chen, Zhaotong; Zhao, Ya; Chen, Shaoda
2014-01-01
Community detection is of great value for complex networks in understanding their inherent law and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. This kind of methods has two inadequacies: one is that the input matrixes they used cannot provide sufficient structural information for community detection and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information of the network. In addition, the new algorithm can automatically get the optimal community number from the local maximum potential nodes. Experiments results showed that the new algorithm gave excellent performance on artificial networks and real world networks and outperforms other community detection methods.
A Community Detection Algorithm Based on Topology Potential and Spectral Clustering
Directory of Open Access Journals (Sweden)
Zhixiao Wang
2014-01-01
Community detection is of great value for complex networks in understanding their inherent law and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. This kind of methods has two inadequacies: one is that the input matrixes they used cannot provide sufficient structural information for community detection and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes’ topology potential, which contains rich structural information of the network. In addition, the new algorithm can automatically get the optimal community number from the local maximum potential nodes. Experiments results showed that the new algorithm gave excellent performance on artificial networks and real world networks and outperforms other community detection methods.
Multi-Parameter Signal Sorting Algorithm Based on Dynamic Distance Clustering
Institute of Scientific and Technical Information of China (English)
Ai-Ling He; De-Guo Zeng; Jun Wang; Bin Tang
2009-01-01
A multi-parameter signal sorting algorithm for interleaved radar pulses in a dense emitter environment is presented. The algorithm includes two parts, pulse classification and pulse repetition interval (PRI) analysis. Firstly, we propose dynamic distance clustering (DDC) for classification. In the clustering algorithm, the multi-dimensional features of radar pulses are used for reliable classification. The similarity threshold estimation method in DDC is derived, which contributes to the efficiency of the algorithm. However, DDC is computationally expensive when there are many signal pulses. Then, in order to sort radar signals in real time, the improved DDC (IDDC) algorithm is proposed. Finally, PRI analysis is adopted to complete the process of sorting. The simulation experiments and hardware implementations show both algorithms are effective.
Xiao, Yong Liang
Molecular packing, clustering, and docking computations have been performed by empirical intermolecular energy minimization methods. The main focus of this study is finding a robust global search algorithm to solve intermolecular interaction problems, especially to apply an efficient algorithm to large-scale complex molecular systems such as drug-DNA binding or site selectivity which has increasing importance in drug design and drug discovery. Molecular packing in benzene, naphthalene, and anthracene crystals is analyzed in terms of molecular dimer interaction. Intermolecular energies of the gas dimer molecules are calculated for various intermolecular distances and orientations using empirical potential energy functions. The gas dimers are compared to pairs of molecules extracted from the observed crystal structures. Net atomic charges are obtained by the potential-derived method from 6-31G and 6-31G** level ab initio wavefunctions. A new approach using a genetic algorithm is applied to predict structures of benzene, naphthalene, and anthracene molecular clusters. The computer program GAME (genetic algorithm for minimization of energy) has been developed to obtain the global energy minimum of clusters of dimer, trimer, and tetramer molecules. This test model has been further developed to applications of molecular docking. Docking calculations of deoxyguanosine molecules to actinomycin D were performed successfully to identify the binding sites of the drug molecule, which was revealed by actinomycin D-deoxyguanosine complex from the solved x-ray crystal structure. The comparison between the evolutionary computing method and conventional local optimization methods concluded that genetic algorithms are very competitive when it comes to complex, large-scale optimization. Full power of genetic algorithms can be unveiled in computer-assisted drug design only when the difficulties of including optimized molecular conformation in the algorithm are overcome. These
CONTROLLING THE FORMATION AND FUNCTIONING OF THE CLUSTER
Directory of Open Access Journals (Sweden)
A. S. Barzenkova
2013-01-01
The article presents the author's concept of a technique for controlling the formation and functioning of a cluster, which makes it possible to objectively assess the process of cluster formation, trace the dynamics of the cluster's life and assess the main areas and indicators of cluster operation.
XML document clustering method based on quantum genetic algorithm
Institute of Scientific and Technical Information of China (English)
蒋勇; 谭怀亮; 李光文
2011-01-01
This paper mainly addresses XML clustering by combining kernel methods for pattern analysis with the quantum genetic algorithm. A new XML document clustering method based on the quantum genetic algorithm and a kernel clustering algorithm is proposed. The XML documents are first reduced, the kernel matrix of the vector space kernel is built from frequent tag sequences, and the initial clustering and cluster centers are obtained with Gaussian kernel functions. The initial population of the quantum genetic algorithm is then constructed from the initial cluster centers. A clustering corresponding to the globally optimal solution is obtained by combining the quantum genetic algorithm with the kernel clustering algorithm. The experimental results show that the proposed algorithm is superior to single methods such as the improved kernel clustering algorithm and K-means in convergence, stability and global optimality.
Directory of Open Access Journals (Sweden)
S.Praveena
2015-06-01
This paper presents a hybrid clustering algorithm and a feed-forward neural network classifier for land-cover mapping of trees, shade, buildings and roads. It starts with a single-step preprocessing procedure to make the image suitable for segmentation. The pre-processed image is segmented using the hybrid genetic-Artificial Bee Colony (ABC) algorithm, which is developed by hybridizing ABC and FCM to obtain effective segmentation of satellite images, and is then classified using a neural network. The performance of the proposed hybrid algorithm is compared with algorithms such as k-means, Fuzzy C-means (FCM), Moving K-means, the Artificial Bee Colony (ABC) algorithm, the ABC-GA algorithm, Moving KFCM and the KFCM algorithm.
CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms.
Kohlhoff, Kai J; Sosnick, Marc H; Hsu, William T; Pande, Vijay S; Altman, Russ B
2011-08-15
Data clustering techniques are an essential component of a good data analysis toolbox. Many current bioinformatics applications are inherently compute-intense and work with very large datasets. Sequential algorithms are inadequate for providing the necessary performance. For this reason, we have created Clustering Algorithms for Massively Parallel Architectures, Including GPU Nodes (CAMPAIGN), a central resource for data clustering algorithms and tools that are implemented specifically for execution on massively parallel processing architectures. CAMPAIGN is a library of data clustering algorithms and tools, written in 'C for CUDA' for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over respective CPU-based clustering algorithms and is intended as an open-source resource. New modules from the community will be accepted into the library and the layout of it is such that it can easily be extended to promising future platforms such as OpenCL. Releases of the CAMPAIGN library are freely available for download under the LGPL from https://simtk.org/home/campaign. Source code can also be obtained through anonymous subversion access as described on https://simtk.org/scm/?group_id=453. kjk33@cantab.net.
Institute of Scientific and Technical Information of China (English)
TANG Cheng-long; WANG Shi-gang; LIANG Qin-hua; XU Wei
2009-01-01
The transversal distribution of the steel strip thickness in the entry section of the cold rolling mill seriously affects the flatness and transversal thickness precision of the final products. A pattern clustering method is introduced into the steel rolling field and used for pattern recognition of the transversal distribution of the steel strip thickness. The well-known k-means clustering algorithm has the advantage of being easy to implement, but still has some drawbacks. An improved k-means clustering algorithm is presented, and the main improvements include: (1) the initial clustering points are preselected according to the density queue of data objects; and (2) Mahalanobis distance is applied instead of Euclidean distance in the actual application. Compared to the patterns obtained from the common k-means algorithm, the patterns identified by the improved algorithm show that it is well suited to pattern recognition of the transversal distribution of steel strip thickness and that it will be useful in on-line quality control systems.
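The two improvements named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D restriction, the global covariance matrix, the neighbour-count density with a `radius` parameter, and all function names are assumptions made for illustration.

```python
import math

def covariance_inverse_2d(points):
    """Inverse of the 2x2 sample covariance matrix of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    return ((syy / det, -sxy / det), (-sxy / det, sxx / det))

def mahalanobis(p, q, inv):
    """Mahalanobis distance between p and q for inverse covariance inv."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    return math.sqrt(dx * (inv[0][0] * dx + inv[0][1] * dy)
                     + dy * (inv[1][0] * dx + inv[1][1] * dy))

def density_seeds(points, k, inv, radius):
    """Pick up to k seeds in order of decreasing density (assumed here to be
    the neighbour count within `radius`), skipping points close to a seed."""
    density = [sum(1 for q in points if mahalanobis(p, q, inv) <= radius)
               for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        if all(mahalanobis(points[i], s, inv) > radius for s in seeds):
            seeds.append(points[i])
        if len(seeds) == k:
            break
    return seeds

def kmeans_mahalanobis(points, k, radius, iters=50):
    """k-means with density-ordered seeding and Mahalanobis assignment."""
    inv = covariance_inverse_2d(points)
    centers = density_seeds(points, k, inv, radius)
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centers)),
                          key=lambda c: mahalanobis(p, centers[c], inv))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels, centers
```

Because a single global covariance rescales every direction by the overall spread of the data, the sketch behaves best when the clusters are spread out in more than one direction.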
Directory of Open Access Journals (Sweden)
J Anuradha
2014-05-01
Full Text Available Attention Deficit Hyperactivity Disorder (ADHD) is a disruptive neurobehavioral disorder characterized by abnormal behavioral patterns in attention, perusing activity, acting impulsively and combined types. It is predominant among school-going children, and it is tricky to differentiate between an active child and a child with ADHD. Misdiagnosis and undiagnosed cases are very common. Behavior patterns are identified by mentors in the academic environment who lack skills in screening those children. Hence an unsupervised learning algorithm can cluster the behavioral patterns of children at school for the diagnosis of ADHD. In this paper, we propose a hierarchical clustering algorithm to partition the dataset based on attribute dependency (HCAD). HCAD forms clusters of data based on the highly dependent attributes and their equivalence relation. It is capable of handling large volumes of data with reasonably faster clustering than most of the existing algorithms. It can work on both labeled and unlabeled data sets. Experimental results reveal that this algorithm has higher accuracy in comparison to other algorithms. HCAD achieves 97% cluster purity in diagnosing ADHD. An empirical analysis of the application of HCAD to different data sets from the UCI repository is provided.
A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics
Directory of Open Access Journals (Sweden)
Tonny J. Oyana
2010-01-01
Full Text Available The central purpose of this study is to further evaluate the quality of the performance of a new algorithm. The study provides additional evidence on this algorithm, which was designed to increase the overall efficiency of the original k-means clustering technique: the Fast, Efficient, and Scalable k-means algorithm (FES-k-means). The FES-k-means algorithm uses a hybrid approach that comprises the k-d tree data structure that enhances the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and then on the actual untrained data in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, otherwise unclaimed by clustering methods alone. The benefits of this method are that it produces clusters similar to the original k-means method at a much faster rate as shown by runtime comparison data; and it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.
The Cluster Mass Function from Early SDSS Data: Cosmological Implications
Bahcall, Neta A.; Dong, Feng; Bode, Paul; Kim, Rita; Annis, James; Mckay, Timothy A.; Hansen, Sarah; Gunn, James; Ostriker, Jeremiah P.; Postman, Marc; Nichol, Robert C.; Goto, Tomotsugu; Brinkmann, Jon; Knapp, Gillian R.; Lamb, Don O.
2002-01-01
The mass function of clusters of galaxies is determined from 400 deg^2 of early commissioning imaging data of the Sloan Digital Sky Survey; ~300 clusters in the redshift range z = 0.1 - 0.2 are used. Clusters are selected using two independent selection methods: a Matched Filter and a red-sequence color magnitude technique. The two methods yield consistent results. The cluster mass function is compared with large-scale cosmological simulations. We find a best-fit cluster normalization relatio...
Hybrid Swarm Intelligence Energy Efficient Clustered Routing Algorithm for Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Rajeev Kumar
2016-01-01
Full Text Available Currently, wireless sensor networks (WSNs) are used in many applications, namely, environment monitoring, disaster management, industrial automation, and medical electronics. Sensor nodes carry many limitations like low battery life, small memory space, and limited computing capability. To make wireless sensor networks more energy efficient, swarm intelligence techniques have been applied to resolve many optimization issues in WSNs. In many existing clustering techniques an artificial bee colony (ABC) algorithm is utilized to collect information from the field periodically. Nevertheless, in event-based applications, ant colony optimization (ACO) is a good solution to enhance the network lifespan. In this paper, we combine both algorithms (i.e., ABC and ACO) and propose a new hybrid ABCACO algorithm to solve a Nondeterministic Polynomial (NP) hard and finite problem of WSNs. The ABCACO algorithm is divided into three main parts: (i) selection of the optimal number of subregions and further subregion parts, (ii) cluster head selection using the ABC algorithm, and (iii) efficient data transmission using the ACO algorithm. We use a hierarchical clustering technique for data transmission; the data is transmitted from member nodes to the subcluster heads and then from subcluster heads to the elected cluster heads based on some threshold value. Cluster heads use an ACO algorithm to discover the best route for data transmission to the base station (BS). The proposed approach is very useful in designing the framework for forest fire detection and monitoring. The simulation results show that the ABCACO algorithm enhances the stability period by 60% and also improves the goodput by 31% against LEACH and WSNCABC, respectively.
Discriminative variable selection for clustering with the sparse Fisher-EM algorithm
Bouveyron, Charles
2012-01-01
The interest in variable selection for clustering has increased recently due to the growing need for clustering high-dimensional data. Variable selection in particular eases both the clustering and the interpretation of the results. Existing approaches have demonstrated the efficiency of variable selection for clustering but turn out to be either very time consuming or not sparse enough in high-dimensional spaces. This work proposes to perform a selection of the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method has been recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through $\ell_{1}$-type penalizations. Experimental comparisons with existing approach...
Institute of Scientific and Technical Information of China (English)
无
2007-01-01
Let G = (V, E) be a complete undirected graph with vertex set V, edge set E, and edge weights l(e) satisfying the triangle inequality. The vertex set V is partitioned into clusters V1, V2, ..., Vk. The clustered traveling salesman problem (CTSP) seeks to compute the shortest Hamiltonian tour that visits all the vertices, in which the vertices of each cluster are visited consecutively. A two-level genetic algorithm (TLGA) was developed for the problem, which favors neither intra-cluster paths nor inter-cluster paths, thus realizing integrated evolutionary optimization for both levels of the CTSP. Results show that the algorithm is more effective than known algorithms. A large-scale traveling salesman problem (TSP) can be converted into a CTSP by clustering so that it can then be solved by the algorithm. Test results demonstrate that the clustering TLGA for large TSPs is more effective and efficient than the classical genetic algorithm.
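The CTSP constraint stated in this abstract (each cluster's vertices are visited consecutively on a Hamiltonian tour) can be made concrete with a small checker. This is only a sketch of the problem definition, not of the TLGA itself; the function names and the use of Euclidean coordinates as edge weights are assumptions.

```python
import math

def tour_length(tour, coords):
    """Length of the closed tour, using Euclidean edge weights."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
               for i in range(n))

def is_valid_ctsp_tour(tour, cluster_of):
    """True if the tour visits every vertex once and each cluster forms
    one contiguous block on the cyclic tour.

    cluster_of maps each vertex to its cluster label.
    """
    if sorted(tour) != sorted(cluster_of):
        return False  # not a Hamiltonian tour over the vertex set
    n = len(tour)
    # Count positions where the cluster label changes between neighbours.
    changes = sum(1 for i in range(n)
                  if cluster_of[tour[i]] != cluster_of[tour[(i + 1) % n]])
    k = len(set(cluster_of.values()))
    # With k > 1 clusters, one block per cluster gives exactly k changes.
    return changes == (k if k > 1 else 0)
```

A genetic algorithm for the CTSP could use `tour_length` as the fitness of an individual and `is_valid_ctsp_tour` to reject or repair infeasible offspring.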
An efficient clustering algorithm for partitioning Y-short tandem repeats data
Directory of Open Access Journals (Sweden)
Seman Ali
2012-10-01
Full Text Available Abstract Background: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results: Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions: The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and is considered to be linear.
A REAL-TIME C-V CLUSTERING ALGORITHM FOR WEB-MINING
Institute of Scientific and Technical Information of China (English)
Li Haiying; Zhuang Zhenquan; Li Bin; Wan Ke
2002-01-01
In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to handle the vast amounts of action data that are dynamically collected from a web site. The algorithm uses the concept of C-V to denote characteristics; at the same time it adopts two-valued [0,1] input and a self-defined vigilance parameter to design the clustering architecture. The Vector Degree of Matching (VDM) plays a key role in the clustering algorithm, as it determines the magnitude of a typical characteristic. By means of stability analysis, the classifications are confirmed to have a reliably hierarchical structure when the vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between the vigilance parameter and the classification upper limit helps to mine representative classifications of net users according to the actual web resource; the administering system can then map them to the web resource space to implement intelligent configuration effectively and rapidly.
Risk Assessment for Bridges Safety Management during Operation Based on Fuzzy Clustering Algorithm
Directory of Open Access Journals (Sweden)
Xia Hanyu
2016-01-01
Full Text Available In recent years, many large-span and large sea-crossing bridges have been built, and bridge accidents caused by improper operational management occur frequently. In order to explore better methods for risk assessment of bridge operation departments, a method based on a fuzzy clustering algorithm is selected. The implementation steps of the fuzzy clustering algorithm are described, the risk evaluation system is built, and, with Taizhou Bridge selected as an example, the quantification of risk factors is described. After that, the clustering algorithm based on fuzzy equivalence is computed in MATLAB 2010a. Finally, the Taizhou Bridge operation management departments are classified and sorted according to their degree of risk, and the safety situation of the operation departments is analyzed.
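Clustering based on fuzzy equivalence is commonly carried out by taking the max-min transitive closure of a fuzzy similarity matrix and then cutting it at a threshold. The sketch below shows that textbook procedure in Python rather than MATLAB. Whether the paper follows exactly these steps is an assumption, and the function names and the threshold `lam` are illustrative.

```python
def max_min_compose(a, b):
    """Max-min composition of two square fuzzy relation matrices."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def transitive_closure(r):
    """Square a reflexive, symmetric fuzzy relation until it stops changing.

    The result is a fuzzy equivalence relation. Exact comparison is safe
    because max and min only select values already present in the matrix.
    """
    while True:
        r2 = max_min_compose(r, r)
        if r2 == r:
            return r
        r = r2

def lambda_cut_clusters(r, lam):
    """Group items whose closure similarity is at least lam."""
    closure = transitive_closure(r)
    clusters, assigned = [], set()
    for i in range(len(closure)):
        if i in assigned:
            continue
        group = [j for j in range(len(closure)) if closure[i][j] >= lam]
        assigned.update(group)
        clusters.append(group)
    return clusters
```

Lowering `lam` from 1 towards 0 merges clusters step by step, which gives the ordering needed to classify and sort items by degree of risk.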
A Throughput-Driven Scheduling Algorithm of Differentiated Service for Web Cluster
Institute of Scientific and Technical Information of China (English)
无
2006-01-01
Request distribution is a key technology for Web cluster servers. This paper presents a throughput-driven scheduling algorithm (TDSA). The algorithm uses the throughput of the cluster back-ends to evaluate their load and employs a neural network model to predict the future load, so that the scheduling system features a self-learning capability and good adaptability to changes in load. Moreover, it separates static requests from dynamic requests to make full use of the CPU resources and takes the locality of requests into account to improve the cache hit ratio. Experimental results from the WebBench testing tool show better performance for a Web cluster server with TDSA than with traditional scheduling algorithms.
Clustering and the Three-Point Function
Jiang, Yunfeng; Kostov, Ivan; Serban, Didina
2016-01-01
We develop analytical methods for computing the structure constant for three heavy operators, starting from the recently proposed hexagon approach. Such a structure constant is a semiclassical object, with the scale set by the inverse length of the operators playing the role of the Planck constant. We reformulate the hexagon expansion in terms of multiple contour integrals and recast it as a sum over clusters generated by the residues of the measure of integration. We test the method on two examples. First, we compute the asymptotic three-point function of heavy fields at any coupling and show the result in the semiclassical limit matches both the string theory computation at strong coupling and the tree-level results obtained before. Second, in the case of one non-BPS and two BPS operators at strong coupling we sum up all wrapping corrections associated with the opposite bridge to the non-trivial operator, or the "bottom" mirror channel. We also give an alternative interpretation of the results in terms of a...
Ternary Tree and Clustering Based Huffman Coding Algorithm
Directory of Open Access Journals (Sweden)
Pushpa R. Suri
2010-09-01
Full Text Available In this study, the focus was on the use of a ternary tree instead of a binary tree. A new two-pass algorithm for encoding Huffman ternary tree codes was implemented, which finds the codeword length of each symbol using the concept of Huffman encoding. Huffman encoding is a two-pass problem. The first pass collects the letter frequencies, and that information is used to create the Huffman tree. Because char values range from -128 to 127, they need to be cast; the data were stored as unsigned chars to solve this problem, so that the range is 0 to 255. The output file is then opened and the frequency table is written to it; the input file is opened, characters are read from it, their codes are looked up, and the encoding is written to the output file. Once a Huffman code has been generated, data may be encoded simply by replacing each symbol with its code. To reduce the memory size and speed up the process of finding the codeword length for a symbol in a Huffman tree, we propose a memory-efficient data structure to represent the codeword lengths of a Huffman ternary tree.
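The codeword-length computation for a ternary Huffman tree can be sketched as follows. This is the standard heap-based ternary construction, not the memory-efficient data structure the paper proposes; the function name and the dummy-node padding are the usual textbook choices and are assumptions with respect to the paper.

```python
import heapq
from collections import Counter
from itertools import count

def ternary_huffman_lengths(data):
    """Return {symbol: codeword length} for a ternary Huffman code.

    `data` can be bytes (symbols are then ints in 0..255, matching the
    unsigned-char range) or any iterable of hashable symbols.
    """
    freq = Counter(data)  # first pass: collect symbol frequencies
    if len(freq) == 1:
        return {sym: 1 for sym in freq}
    tie = count()  # unique tie-breaker so the symbol lists are never compared
    heap = [(f, next(tie), [sym]) for sym, f in freq.items()]
    # Each merge replaces three nodes with one, so the node count must be
    # odd; pad with one zero-frequency dummy node when it is even.
    if (len(heap) - 1) % 2 != 0:
        heap.append((0, next(tie), []))
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freq}
    while len(heap) > 1:
        total, merged = 0, []
        for _step in range(3):
            f, _order, syms = heapq.heappop(heap)
            total += f
            merged.extend(syms)
        for sym in merged:
            lengths[sym] += 1  # every merged symbol moves one level deeper
        heapq.heappush(heap, (total, next(tie), merged))
    return lengths
```

Reading the input as `bytes` sidesteps the signed-char issue mentioned in the abstract, since Python byte values already lie in the range 0 to 255.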
Hocaoğlu, C; Sanderson, A C
1997-01-01
A novel genetic algorithm (GA) using minimal representation size cluster (MRSC) analysis is designed and implemented for solving multimodal function optimization problems. The problem of multimodal function optimization is framed within a hypothesize-and-test paradigm using minimal representation size (minimal complexity) for species formation and a GA. A multiple-population GA is developed to identify different species. The number of populations, thus the number of different species, is determined by the minimal representation size criterion. Therefore, the proposed algorithm reveals the unknown structure of the multimodal function when a priori knowledge about the function is unknown. The effectiveness of the algorithm is demonstrated on a number of multimodal test functions. The proposed scheme results in a highly parallel algorithm for finding multiple local minima. In this paper, a path-planning algorithm is also developed based on the MRSC_GA algorithm. The algorithm utilizes MRSC_GA for planning paths for mobile robots, piano-mover problems, and N-link manipulators. The MRSC_GA is used for generating multipaths to provide alternative solutions to the path-planning problem. The generation of alternative solutions is especially important for planning paths in dynamic environments. A novel iterative multiresolution path representation is used as a basis for the GA coding. The effectiveness of the algorithm is demonstrated on a number of two-dimensional path-planning problems.
Directory of Open Access Journals (Sweden)
A. Meenakshi
2016-08-01
Full Text Available Resource allocation is the assignment of available resources to different uses. In the context of an entire economy, resources can be assigned by different means, such as markets or central planning. Cloud computing has become a new-age technology with huge potential in enterprises and markets. Clouds make it possible to access applications and associated data from anywhere. The fundamental aim of resource allocation is to allot the available resources in the most effective manner. In the initial phase, a representative resource usage distribution for a group of nodes with identical resource usage patterns is evaluated as a resource bundle, which can easily be employed to locate a group of nodes fulfilling a standard criterion. In this paper, a clustering-based resource aggregation method, the Improved Hierarchical Agglomerative Clustering Algorithm (IHAC), is introduced to obtain a compact representation of a set of identically behaving nodes for scalability. In the subsequent phase, which concerns the dynamic resource allocation procedure, a hybrid optimization technique is introduced. The technique is devised for scheduling functions to cloud resources while considering both financial and evaluation expenses. The efficiency of the resource allocation system is assessed by means of several parameters such as reliability, reusability and certain other metrics. The optimal path choice is the outcome of the hybrid optimization approach. The technique allocates the available resources based on the optimal path.
National Research Council Canada - National Science Library
T. Velmurugan; T. Santhanam
2010-01-01
.... Clustering algorithms can be applied in many domains. Approach: In this research, the most representative algorithms K-Means and K-Medoids were examined and analyzed based on their basic approach...
KohonAnts: A Self-Organizing Ant Algorithm for Clustering and Pattern Classification
Fernandes, C; Merelo, J J; Ramos, V; Laredo, J L J
2008-01-01
In this paper we introduce a new ant-based method that takes advantage of the cooperative self-organization of Ant Colony Systems to create a naturally inspired clustering and pattern recognition method. The approach considers each data item as an ant, which moves inside a grid changing the cells it goes through, in a fashion similar to Kohonen's Self-Organizing Maps. The resulting algorithm is conceptually simpler, takes fewer free parameters than other ant-based clustering algorithms, and, after some parameter tuning, yields very good results on some benchmark problems.
Directory of Open Access Journals (Sweden)
Burhan Ergen
2014-01-01
Full Text Available This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases.
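The clustering step described here, converting a gray-level image into a binary image, can be sketched with a two-cluster k-means on pixel intensities. The Gabor wavelet enhancement is omitted, and the function names and the min/max initialization are assumptions rather than details taken from the paper.

```python
def two_means_threshold(pixels, iters=100):
    """Run 1-D k-means with k = 2 on gray levels; return the midpoint
    between the two cluster centers as a binarization threshold."""
    lo, hi = float(min(pixels)), float(max(pixels))  # initial centers
    for _ in range(iters):
        dark = [p for p in pixels if abs(p - lo) <= abs(p - hi)]
        bright = [p for p in pixels if abs(p - lo) > abs(p - hi)]
        new_lo = sum(dark) / len(dark)  # dark always holds the minimum
        new_hi = sum(bright) / len(bright) if bright else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # centers stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2

def binarize(image):
    """Map a gray-level image (list of rows) to 0/1 using the threshold."""
    flat = [p for row in image for p in row]
    t = two_means_threshold(flat)
    return [[1 if p > t else 0 for p in row] for row in image]
```

In an edge detection pipeline of the kind the abstract describes, `image` would be the edge-enhanced response rather than the raw scan, so the bright cluster corresponds to edge pixels.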
Algorithm for Multi-laser-target Tracking Based on Clustering Fusion
Institute of Scientific and Technical Information of China (English)
ZHANG Li-qun; LI Yan-jun; ZHANG Ke
2007-01-01
Multi-laser-target tracking is an important subject in the field of signal processing of laser warners. A clustering method is applied to the measurement of laser warner, and the space-time fusion for measurements in the same cluster is accomplished. Real-time tracking of multi-laser-target and real-time picking of multi-laser-signal are introduced using data fusion of the measurements. A prototype device of the algorithm is built up. The results of experiments show that the algorithm is very effective.
Ergen, Burhan
2014-01-01
This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases.
Segmentation of Mushroom and Cap width Measurement using Modified K-Means Clustering Algorithm
Directory of Open Access Journals (Sweden)
Eser Sert
2014-01-01
Full Text Available The mushroom is a commonly consumed food. Image processing is an effective way of examining visual features and detecting the size of a mushroom. We developed software to segment a mushroom in a picture and to measure the cap width of the mushroom. The K-Means clustering method, one of the most successful clustering methods, is used for this process. In our study we customized the algorithm to get the best result and tested it. In the system, the mushroom picture is first filtered, the histograms are balanced, and then segmentation is performed. The results showed that the customized algorithm performed better segmentation than the classical K-Means algorithm. Tests performed on the designed software showed that segmentation of pictures with complex backgrounds is performed with high accuracy, and the caps of 20 mushrooms were measured with a relative error of 2.281%.
Anticipation versus adaptation in Evolutionary Algorithms: The case of Non-Stationary Clustering
González, A. I.; Graña, M.; D'Anjou, A.; Torrealdea, F. J.
1998-07-01
From the technological point of view it is usually more important to ensure the ability to react promptly to changing environmental conditions than to try to forecast them. Evolutionary Algorithms were initially proposed to drive the adaptation of complex systems to varying or uncertain environments. In the general setting, the adaptive-anticipatory dilemma reduces to the placement of the interaction with the environment in the computational schema. Adaptation consists of the estimation of the proper parameters from present data in order to react to a present environment situation. Anticipation consists of the estimation from present data in order to react to a future environment situation. This duality is expressed in the Evolutionary Computation paradigm by the precise location of the consideration of present data in the computation of the individuals' fitness function. In this paper we consider several instances of Evolutionary Algorithms applied to a precise problem and perform an experiment that tests their response as anticipative and adaptive mechanisms. The non-stationary problem considered is that of Non-Stationary Clustering, more precisely the adaptive Color Quantization of image sequences. The experiment illustrates our ideas and gives some quantitative results that may support the proposition of the Evolutionary Computation paradigm for other tasks that require interaction with a Non-Stationary environment.
An efficient method of key-frame extraction based on a cluster algorithm.
Zhang, Qiang; Yu, Shao-Pei; Zhou, Dong-Sheng; Wei, Xiao-Peng
2013-12-18
This paper proposes a novel method of key-frame extraction for use with motion capture data. This method is based on an unsupervised cluster algorithm. First, the motion sequence is clustered into two classes by the similarity distance of the adjacent frames so that the thresholds needed in the next step can be determined adaptively. Second, a dynamic cluster algorithm called ISODATA is used to cluster all the frames and the frames nearest to the center of each class are automatically extracted as key-frames of the sequence. Unlike many other clustering techniques, the present improved cluster algorithm can automatically address different motion types without any need for specified parameters from users. The proposed method is capable of summarizing motion capture data reliably and efficiently. The present work also provides a meaningful comparison between the results of the proposed key-frame extraction technique and other previous methods. These results are evaluated in terms of metrics that measure reconstructed motion and the mean absolute error value, which are derived from the reconstructed data and the original data.
Robustness of "cut and splice" genetic algorithms in the structural optimization of atomic clusters
Froltsov, V.; Reuter, K.
2009-01-01
We return to the geometry optimization problem of Lennard-Jones clusters to analyze the performance dependence of 'cut and splice' genetic algorithms (GAs) on the employed population size. We generally find that admixing twinning mutation moves leads to an improved robustness of the algorithm efficiency with respect to this a priori unknown technical parameter. The resulting very stable performance of the corresponding mutation + mating GA implementation over a wide range of population sizes...
Critical slowing down of cluster algorithms for Ising models coupled to 2-d gravity
Bowick, Mark; Falcioni, Marco; Harris, Geoffrey; Marinari, Enzo
1994-02-01
We simulate single and multiple Ising models coupled to 2-d gravity using both the Swendsen-Wang and Wolff algorithms to update the spins. We study the integrated autocorrelation time and find that there is considerable critical slowing down, particularly in the magnetization. We argue that this is primarily due to the local nature of the dynamical triangulation algorithm and to the generation of a distribution of baby universes which inhibits cluster growth.
Critical Slowing Down of Cluster Algorithms for Ising Models Coupled to 2-d Gravity
Bowick, M; Harris, G; Marinari, E
1994-01-01
We simulate single and multiple Ising models coupled to 2-d gravity using both the Swendsen-Wang and Wolff algorithms to update the spins. We study the integrated autocorrelation time and find that there is considerable critical slowing down, particularly in the magnetization. We argue that this is primarily due to the local nature of the dynamical triangulation algorithm and to the generation of a distribution of baby universes which inhibits cluster growth.
KtJet: A C++ implementation of the Kt clustering algorithm
J.M. Butterworth; Couchman, J. P.; Cox, B. E.; Waugh, B. M.
2002-01-01
A C++ implementation of the Kt jet algorithm for high energy particle collisions is presented. The time performance of this implementation is comparable to the widely used Fortran implementation. Identical algorithmic functionality is provided, with a clean and intuitive user interface and additional recombination schemes. A short description of the algorithm and examples of its use are given.
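For readers unfamiliar with the algorithm itself, the inclusive kt clustering procedure can be sketched in a few lines. This is a Python illustration of the generic algorithm, not the KtJet C++ interface; the `(pt, y, phi)` tuple layout, the pt-weighted recombination scheme, and the function name are assumptions.

```python
import math

def kt_inclusive_jets(particles, r=1.0):
    """Inclusive kt clustering of particles given as (pt, y, phi) tuples.

    Distances: d_iB = pt_i^2 (beam) and
    d_ij = min(pt_i, pt_j)^2 * ((y_i - y_j)^2 + (phi_i - phi_j)^2) / r^2.
    """
    objs = [tuple(p) for p in particles]
    jets = []
    while objs:
        # Start from the smallest beam distance, then look for a smaller pair.
        best = min(range(len(objs)), key=lambda i: objs[i][0] ** 2)
        d_min, pair = objs[best][0] ** 2, (best, None)
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                dphi = math.remainder(objs[i][2] - objs[j][2], 2 * math.pi)
                dr2 = (objs[i][1] - objs[j][1]) ** 2 + dphi ** 2
                d_ij = min(objs[i][0], objs[j][0]) ** 2 * dr2 / r ** 2
                if d_ij < d_min:
                    d_min, pair = d_ij, (i, j)
        i, j = pair
        if j is None:
            jets.append(objs.pop(i))  # beam distance smallest: final jet
        else:
            a, b = objs[i], objs[j]
            pt = a[0] + b[0]
            y = (a[0] * a[1] + b[0] * b[1]) / pt  # pt-weighted rapidity
            dphi = math.remainder(b[2] - a[2], 2 * math.pi)
            phi = (a[2] + b[0] * dphi / pt) % (2 * math.pi)
            objs.pop(j)  # j > i, so remove j first to keep index i valid
            objs.pop(i)
            objs.append((pt, y, phi))
    return sorted(jets, key=lambda jet: -jet[0])
```

The double loop makes each step quadratic in the number of objects, which is fine for illustration but far from the performance a production implementation aims for.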
Directory of Open Access Journals (Sweden)
Lejiang Guo
2011-05-01
Full Text Available Wireless Sensor Networks (WSN) represent a new dimension in the field of network research. Cluster algorithms can significantly reduce the energy consumption of wireless sensor networks and prolong the network lifetime. This paper uses neurons to describe WSN nodes and constructs a neural network model for WSN. The neural network model includes three aspects: a WSN node neuron model, a WSN node control model and a WSN node connection model. Building on the framework of cluster algorithms for wireless sensor networks, this paper presents a weighted-average cluster-head selection algorithm based on an improved Genetic Optimization, which makes the node weights directly related to the decision-making predictions. The algorithm consists of two stages: single-parent evolution and population evolution. The initial population is formed in the single-parent evolution stage by using a gene pool; the algorithm then continues to the further evolution process, and finally the best solution is generated and saved in the population. The simulation results illustrate that the new algorithm has a high convergence speed and good global searching capacity. It effectively balances the network energy consumption, improves the network life-cycle, ensures the communication quality and provides a certain theoretical foundation for applications of neural networks.
A MODIFIED ANT-BASED TEXT CLUSTERING ALGORITHM WITH SEMANTIC SIMILARITY MEASURE
Institute of Scientific and Technical Information of China (English)
Haoxiang XIA; Shuguang WANG; Taketoshi YOSHIDA
2006-01-01
Ant-based text clustering is a promising technique that has attracted great research attention. This paper attempts to improve the standard ant-based text-clustering algorithm in two dimensions. On the one hand, an ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to provide a more accurate assessment of the similarity between documents. On the other, the ant behavior model is modified to pursue better algorithmic performance. In particular, the ant movement rule is adjusted so as to direct a laden ant toward a dense area of the same type of items as the item the ant is carrying, and to direct an unladen ant toward an area that contains an item dissimilar to the surrounding items within its Moore neighborhood. Using WordNet as the base ontology for assessing the semantic similarity between documents, the proposed algorithm is tested with a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that the proposed algorithm performs better than the standard ant-based text-clustering algorithm and the k-means algorithm.
A harmony search algorithm for clustering with feature selection
Directory of Open Access Journals (Sweden)
Carlos Cobos
2010-01-01
Full Text Available This article presents a new clustering algorithm called IHSK, which is able to select features with linear order of complexity. The algorithm is inspired by the combination of the harmony search and K-means algorithms. For feature selection, the concept of variability was used together with a heuristic method that penalizes the presence of dimensions with a low probability of contributing to the current solution. The algorithm was tested with synthetic and real datasets, obtaining promising results.
Gieles, M; Bastian, N; Stein, I; Gieles, Mark; Larsen, Soeren; Bastian, Nate; Stein, Ilaan
2005-01-01
We introduce a method to relate a possible truncation of the star cluster mass function at the high mass end to the shape of the cluster luminosity function (LF). We compare the observed LFs of five galaxies containing young star clusters with synthetic cluster population models with varying initial conditions. The LFs of the SMC, the LMC and NGC 5236 are characterized by a power-law behavior N dL ~ L^-a dL, with a mean exponent of a = 2.0 +/- 0.2. This can be explained by a cluster population formed with a constant cluster formation rate, in which the maximum cluster mass per logarithmic age bin is determined by the size-of-sample effect and therefore increases with log(age/yr). The LFs of NGC 6946 and M51 are better described by a double power-law distribution or a Schechter function. When a cluster population has a mass function that is truncated below the limit given by the size-of-sample effect, the total LF shows a bend at the magnitude of the maximum mass, with the age of the oldest cluster in the populati...
Gaur, Pallavi; Chaturvedi, Anoop
2017-07-22
The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.
Structures of Adatom Clusters on Ag(111) Surface by Genetic Algorithm
Institute of Scientific and Technical Information of China (English)
SUN Zhi-Hua; LIU Qing-Wei; LI Yu-Fen; ZHUANG Jun
2004-01-01
We study the structures of Ag adatom clusters supported on the metal Ag(111) surface using the genetic algorithm (GA). The atomic interactions are modelled by the surface-embedded-atom method. The lowest-energy structures of adatom clusters with sizes n = 3-20 are obtained, in which n = 7, 10, 12, 14, 16, 19 are the magic numbers. Furthermore, we give a series of structures with energies close to the lowest energy (the lower-energy isomers), and the structure features are studied in detail. Except for some magic clusters and small clusters, every configuration of adatom clusters generally has two distinct adsorption ways, so the isomers always appear in pairs.
Distributed Clustering Algorithm to Explore Selection Diversity in Wireless Sensor Networks
Kong, Hyung-Yun; Asaduzzaman, Hyung-Yun
This paper presents a novel cross-layer approach to explore selection diversity for distributed clustering based wireless sensor networks (WSNs) by selecting a proper cluster-head. We develop and analyze an instantaneous channel state information (CSI) based cluster-head selection algorithm for a distributed, dynamic and randomized clustering based WSN. The proposed cluster-head selection scheme is also random and capable of distributing energy use among the nodes in the network. We present an analytical approach to evaluate the energy efficiency and system lifetime of our proposal. Analysis shows that, in a Rayleigh fading environment, the proposed scheme outperforms the performance of an additive white Gaussian noise (AWGN) channel. This proposal also outperforms the existing cooperative diversity protocols in terms of system lifetime and implementation complexity.
An improved K-means clustering algorithm in agricultural image segmentation
Cheng, Huifeng; Peng, Hui; Liu, Shanmei
Image segmentation is the first important step to image analysis and image processing. In this paper, according to color crops image characteristics, we firstly transform the color space of image from RGB to HIS, and then select proper initial clustering center and cluster number in application of mean-variance approach and rough set theory followed by clustering calculation in such a way as to automatically segment color component rapidly and extract target objects from background accurately, which provides a reliable basis for identification, analysis, follow-up calculation and process of crops images. Experimental results demonstrate that improved k-means clustering algorithm is able to reduce the computation amounts and enhance precision and accuracy of clustering.
Directory of Open Access Journals (Sweden)
MOSTAFA BAGHOURI
2014-06-01
Full Text Available Improving the lifetime of a heterogeneous wireless sensor network (WSN) is an important task because the sensor nodes have limited energy resources. The best way to improve WSN lifetime is to use clustering-based algorithms, in which each cluster is managed by a leader called the Cluster Head (CH). Every other node must communicate with this CH to send its sensed data. The nodes nearest the base station must also send their data to their leaders, which causes a loss of energy. In this paper, we propose a new approach to improve a threshold distributed energy efficient clustering (TDEEC) protocol for heterogeneous wireless sensor networks by excluding the nodes closest to the base station from the clustering process. We show by simulation in MATLAB that the proposed approach clearly increases the number of received packet messages and prolongs the lifetime of the network compared to the TDEEC protocol.
Nakayama, Hiromasa
2006-01-01
We give an algorithm to compute the local $b$ function. In this algorithm, we use the Mora division algorithm in the ring of differential operators and an approximate division algorithm in the ring of differential operators with power series coefficient.
An Effective Tri-Clustering Algorithm Combining Expression Data with Gene Regulation Information
Directory of Open Access Journals (Sweden)
Ao Li
2009-04-01
Full Text Available Motivation: Bi-clustering algorithms aim to identify sets of genes sharing similar expression patterns across a subset of conditions. However direct interpretation or prediction of gene regulatory mechanisms may be difficult as only gene expression data is used. Information about gene regulators may also be available, most commonly about which transcription factors may bind to the promoter region and thus control the expression level of a gene. Thus a method to integrate gene expression and gene regulation information is desirable for clustering and analyzing. Methods: By incorporating gene regulatory information with gene expression data, we define regulated expression values (REV as indicators of how a gene is regulated by a specific factor. Existing bi-clustering methods are extended to a three dimensional data space by developing a heuristic TRI-Clustering algorithm. An additional approach named Automatic Boundary Searching algorithm (ABS is introduced to automatically determine the boundary threshold. Results: Results based on incorporating ChIP-chip data representing transcription factor-gene interactions show that the algorithms are efficient and robust for detecting tri-clusters. Detailed analysis of the tri-cluster extracted from yeast sporulation REV data shows genes in this cluster exhibited significant differences during the middle and late stages. The implicated regulatory network was then reconstructed for further study of defined regulatory mechanisms. Topological and statistical analysis of this network demonstrated evidence of significant changes of TF activities during the different stages of yeast sporulation, and suggests this approach might be a general way to study regulatory networks undergoing transformations.
Newton Algorithms for Analytic Rotation: An Implicit Function Approach
Boik, Robert J.
2008-01-01
In this paper implicit function-based parameterizations for orthogonal and oblique rotation matrices are proposed. The parameterizations are used to construct Newton algorithms for minimizing differentiable rotation criteria applied to "m" factors and "p" variables. The speed of the new algorithms is compared to that of existing algorithms and to…
Image Segmentation Algorithm Based on Spectral Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
张权; 胡玉兰
2012-01-01
Because the spectral clustering algorithm gives unsatisfactory results for image segmentation, an improved Nyström algorithm for spectral-clustering image segmentation is studied, which improves the segmentation quality. The algorithm first pre-processes the image, transforming the data space of the image distribution. It then computes the distance matrix among the data in the selected sample space and the distance matrix between the samples and the remaining data, and converts them into similarity matrices. Next, the similarity matrix is orthogonalized and eigendecomposed, and K-Means clustering is performed. Finally, the clustering results are post-processed. The effectiveness of the algorithm is verified by experimental results.
Lee, Chongdeuk; Jeong, Taegwon
2011-01-01
Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as the node distribution found in dynamic properties due to mobility and flat structures and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than that of the other existing mechanisms.
A spectral scheme for Kohn–Sham density functional theory of clusters
Energy Technology Data Exchange (ETDEWEB)
Banerjee, Amartya S., E-mail: baner041@umn.edu; Elliott, Ryan S., E-mail: relliott@umn.edu; James, Richard D., E-mail: james@umn.edu
2015-04-15
Starting from the observation that one of the most successful methods for solving the Kohn–Sham equations for periodic systems – the plane-wave method – is a spectral method based on eigenfunction expansion, we formulate a spectral method designed towards solving the Kohn–Sham equations for clusters. This allows for efficient calculation of the electronic structure of clusters (and molecules) with high accuracy and systematic convergence properties without the need for any artificial periodicity. The basis functions in this method form a complete orthonormal set and are expressible in terms of spherical harmonics and spherical Bessel functions. Computation of the occupied eigenstates of the discretized Kohn–Sham Hamiltonian is carried out using a combination of preconditioned block eigensolvers and Chebyshev polynomial filter accelerated subspace iterations. Several algorithmic and computational aspects of the method, including computation of the electrostatics terms and parallelization are discussed. We have implemented these methods and algorithms into an efficient and reliable package called ClusterES (Cluster Electronic Structure). A variety of benchmark calculations employing local and non-local pseudopotentials are carried out using our package and the results are compared to the literature. Convergence properties of the basis set are discussed through numerical examples. Computations involving large systems that contain thousands of electrons are demonstrated to highlight the efficacy of our methodology. The use of our method to study clusters with arbitrary point group symmetries is briefly discussed.
Cluster algorithm for two-dimensional U(1) lattice gauge theory
Sinclair, R.
1992-03-01
We use gauge fixing to rewrite the two-dimensional U(1) pure gauge model with Wilson action and periodic boundary conditions as a nonfrustrated XY model on a closed chain. The Wolff single-cluster algorithm is then applied, eliminating critical slowing down of topological modes and Polyakov loops.
New and Old Jet Clustering Algorithms for Electron-Positron Events
Moretti, S; Sjöstrand, Torbjörn; Moretti, Stefano; Lönnblad, Leif; Sjöstrand, Torbjörn
1998-01-01
Over the years, many jet clustering algorithms have been proposed for the analysis of hadronic final states in $e^+e^-$ annihilations. These have somewhat different emphasis and are therefore more or less suited for various applications. We here review some of the most used and compare them from a theoretical and experimental point of view.
A genetic algorithm using hyper-quadtrees for low-dimensional K-means clustering.
Laszlo, Michael; Mukherjee, Sumitra
2006-04-01
The k-means algorithm is widely used for clustering because of its computational efficiency. Given n points in d-dimensional space and the number of desired clusters k, k-means seeks a set of k cluster centers so as to minimize the sum of the squared Euclidean distance between each point and its nearest cluster center. However, the algorithm is very sensitive to the initial selection of centers and is likely to converge to partitions that are significantly inferior to the global optimum. We present a genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k. The set of centers is represented using a hyper-quadtree constructed on the data. This representation is exploited in our GA to generate an initial population of good centers and to support a novel crossover operation that selectively passes good subsets of neighboring centers from parents to offspring by swapping subtrees. Experimental results indicate that our GA finds the global optimum for data sets with known optima and finds good solutions for large simulated data sets.
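The objective and the sensitivity to initial centers described above can be illustrated with a minimal sketch. This is plain Lloyd-style k-means in pure Python, not the hyper-quadtree GA of the paper; the function names (`sq_dist`, `sse`, `lloyd`) and the six toy points are assumptions chosen only for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def sse(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def lloyd(points, centers, iters=20):
    """Alternate nearest-center assignment and centroid update."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # An empty group keeps its previous center.
        centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers


# Hypothetical toy data: three well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (5.0, 20.0), (5.0, 21.0)]

# One initial center per pair versus two initial centers in the same pair.
good = lloyd(points, [(0.0, 0.0), (10.0, 0.0), (5.0, 20.0)])
bad = lloyd(points, [(0.0, 0.0), (0.0, 1.0), (5.0, 20.0)])

print(sse(points, good), sse(points, bad))
```

With the second set of initial centers the iteration settles on a partition whose objective is far larger than that of the first, which is the kind of inferior local optimum that motivates searching over centers rather than relying on a single initialization.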
Wu, Tin-Yu; Chang, Tse; Chu, Teng-Hao
2017-02-01
Many data mining approaches use an Artificial Neural Network (ANN), but training an ANN raises several problems, such as the number of labeled samples, the training time and performance, and the number of hidden layers and the choice of transfer function. If the comparison results are not as expected, it cannot be clearly determined which dimension causes the deviation. The main reason is that an ANN learns its results by modifying weights: training does not improve the original image feature extraction algorithm, but only steers the output toward the correct value through the weights. To address these problems, this paper puts forward a method to assist the image data analysis of an ANN. Normally, a parameter is set as the value used to extract feature vectors when processing an image; we treat this parameter as a weight. The experiment uses the values extracted from Speeded Up Robust Features (SURF) feature points as the basis for training. Because SURF extracts different feature points according to these values, we perform an initial semi-supervised clustering on them and use Modified Fuzzy K-Nearest Neighbors (MFKNN) for training and classification. Unknown images are not matched by a complete one-to-one comparison; only group centroids are compared, mainly to save time, and the retrieved results are then observed and analyzed. In short, the method clusters and classifies image feature points, assigns values to groups with a high error rate to produce new feature points, and puts them into the input layer of the ANN for training; finally, a comparative analysis is made with Back-Propagation Neural Network (BPN) of Genetic Algorithm-Artificial Neural Network
An effective trust-based recommendation method using a novel graph clustering algorithm
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
A cluster finding algorithm based on the multi-band identification of red-sequence galaxies
Oguri, Masamune
2014-01-01
We present a new algorithm, CAMIRA, to identify clusters of galaxies in wide-field imaging survey data. We base our algorithm on the stellar population synthesis model to predict colours of red-sequence galaxies at a given redshift for an arbitrary set of bandpass filters, with additional calibration using a sample of spectroscopic galaxies to improve the accuracy of the model prediction. We run the algorithm on ~11960 deg^2 of imaging data from the Sloan Digital Sky Survey (SDSS) Data Release 8 to construct a catalogue of 71743 clusters in the redshift range 0.1
Gkaitatzis, Stamatios; The ATLAS collaboration
2016-01-01
In this paper the performance of the 2D pixel clustering algorithm developed for the Input Mezzanine card of the ATLAS Fast TracKer system is presented. Fast TracKer is an approved ATLAS upgrade that has the goal to provide a complete list of tracks to the ATLAS High Level Trigger for each level-1 accepted event, at up to 100 kHz event rate with a very small latency, in the order of 100 µs. The Input Mezzanine card is the input stage of the Fast TracKer system. Its role is to receive data from the silicon detector and perform real time clustering, thus to reduce the amount of data propagated to the subsequent processing levels with minimal information loss. We focus on the most challenging component on the Input Mezzanine card, the 2D clustering algorithm executed on the pixel data. We compare two different implementations of the algorithm. The first is one called the ideal one which searches clusters of pixels in the whole silicon module at once and calculates the cluster centroids exploiting the whole avai...
Gkaitatzis, Stamatios; The ATLAS collaboration; Annovi, Alberto; Kordas, Kostantinos
2016-01-01
In this paper the performance of the 2D pixel clustering algorithm developed for the Input Mezzanine card of the ATLAS Fast TracKer system is presented. Fast TracKer is an approved ATLAS upgrade that has the goal to provide a complete list of tracks to the ATLAS High Level Trigger for each level-1 accepted event, at up to 100 kHz event rate with a very small latency, in the order of 100µs. The Input Mezzanine card is the input stage of the Fast TracKer system. Its role is to receive data from the silicon detector and perform real time clustering, thus to reduce the amount of data propagated to the subsequent processing levels with minimal information loss. We focus on the most challenging component on the Input Mezzanine card, the 2D clustering algorithm executed on the pixel data. We compare two different implementations of the algorithm. The first is one called the ideal one which searches clusters of pixels in the whole silicon module at once and calculates the cluster centroids exploiting the whole avail...
BMI optimization by using parallel UNDX real-coded genetic algorithm with Beowulf cluster
Handa, Masaya; Kawanishi, Michihiro; Kanki, Hiroshi
2007-12-01
This paper deals with a global optimization algorithm for Bilinear Matrix Inequalities (BMIs) based on the Unimodal Normal Distribution Crossover (UNDX) GA. First, by analyzing the structure of the BMIs, the existence of typical difficult structures is confirmed. Then, in order to improve the performance of the algorithm, and based on the results of the problem structure analysis and the characteristic properties of BMIs, we propose an algorithm that uses a primary search direction with relaxed Linear Matrix Inequality (LMI) convex estimation. Moreover, for these algorithms we propose two types of evaluation methods for GA individuals, based on LMI calculation, that take the characteristic properties of BMIs into further account. In addition, in order to reduce computational time, we propose a parallelization of the RCGA algorithm using the Master-Worker paradigm with a cluster computing technique.
Performance evaluation of simple linear iterative clustering algorithm on medical image processing.
Cong, Jinyu; Wei, Benzheng; Yin, Yilong; Xi, Xiaoming; Zheng, Yuanjie
2014-01-01
Simple Linear Iterative Clustering (SLIC) algorithm is increasingly applied to different kinds of image processing because of its excellent perceptually meaningful characteristics. In order to better meet the needs of medical image processing and provide technical reference for SLIC on the application of medical image segmentation, two indicators of boundary accuracy and superpixel uniformity are introduced with other indicators to systematically analyze the performance of SLIC algorithm, compared with Normalized cuts and Turbopixels algorithm. The extensive experimental results show that SLIC is faster and less sensitive to the image type and the setting superpixel number than other similar algorithms such as Turbopixels and Normalized cuts algorithms. And it also has a great benefit to the boundary recall, the robustness of fuzzy boundary, the setting superpixel size and the segmentation performance on medical image segmentation.
Unsupervised unstained cell detection by SIFT keypoint clustering and self-labeling algorithm.
Muallal, Firas; Schöll, Simon; Sommerfeldt, Björn; Maier, Andreas; Steidl, Stefan; Buchholz, Rainer; Hornegger, Joachim
2014-01-01
We propose a novel unstained cell detection algorithm based on unsupervised learning. The algorithm utilizes the scale invariant feature transform (SIFT), a self-labeling algorithm, and two clustering steps in order to achieve high performance in terms of time and detection accuracy. Unstained cell imaging is dominated by phase contrast and bright field microscopy. Therefore, the algorithm was assessed on images acquired using these two modalities. Five cell lines having in total 37 images and 7250 cells were considered for the evaluation: CHO, L929, Sf21, HeLa, and Bovine cells. The obtained F-measures were between 85.1 and 89.5. Compared to the state-of-the-art, the algorithm achieves very close F-measure to the supervised approaches in much less time.
Tramacere, A; Dubath, P; Kneib, J -P; Courbin, F
2016-01-01
We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of patterns of local maxima, through an iterative algorithm which associates each pixel to the closest local maximum. Our main classification goal is to separate elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional ones related to segments of spiral arms. We use two different supervised ensemble classification algorithms, Random Forest,...
Institute of Scientific and Technical Information of China (English)
Xiang Gao; Yintang Yang; Duan Zhou
2010-01-01
An effective algorithm based on signal coverage of effective communication and local energy-consumption saving strategy is proposed for the application in wireless sensor networks. This algorithm consists of two sub-algorithms. One is the multi-hop partition subspaces clustering algorithm for ensuring local energy-balanced consumption ascribed to the deployment from another algorithm of distributed locating deployment based on efficient communication coverage probability (DLD-ECCP). DLD-ECCP makes use of the characteristics of Markov chain and probabilistic optimization to obtain the optimum topology and number of sensor nodes. Through simulation, the relative data demonstrate the advantages of the proposed approaches on saving hardware resources and energy consumption of networks.
Minimum mutual information based level set clustering algorithm for fast MRI tissue segmentation.
Dai, Shuanglu; Man, Hong; Zhan, Shu
2015-01-01
Accurate and accelerated MRI tissue recognition is a crucial preprocessing for real-time 3d tissue modeling and medical diagnosis. This paper proposed an information de-correlated clustering algorithm implemented by variational level set method for fast tissue segmentation. The key idea is to design a local correlation term between original image and piecewise constant into the variational framework. The minimized correlation will then lead to de-correlated piecewise regions. Firstly, by introducing a continuous bounded variational domain describing the image, a probabilistic image restoration model is assumed to modify the distortion. Secondly, regional mutual information is introduced to measure the correlation between piecewise regions and original images. As a de-correlated description of the image, piecewise constants are finally solved by numerical approximation and level set evolution. The converged piecewise constants automatically clusters image domain into discriminative regions. The segmentation results show that our algorithm performs well in terms of time consuming, accuracy, convergence and clustering capability.
MixSim : An R Package for Simulating Data to Study Performance of Clustering Algorithms
Directory of Open Access Journals (Sweden)
Volodymyr Melnykov
2012-11-01
Full Text Available The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of MixSim, there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.
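The definition of pairwise overlap as a sum of two misclassification probabilities can be made concrete with a small worked case. The sketch below is in Python rather than R and does not call MixSim; it assumes the simplest setting of two univariate Gaussian components with equal mixing weights and a shared standard deviation, where the decision boundary is the midpoint between the means. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pairwise_overlap(mu1, mu2, sigma):
    """Overlap of two equal-weight 1D Gaussians that share sigma.

    Assumption: with equal weights and equal variances, a point is assigned
    to the nearer mean, so each misclassification probability is the tail
    mass beyond the midpoint between the two means.
    """
    half_gap = abs(mu1 - mu2) / (2.0 * sigma)
    miss_1_as_2 = normal_cdf(-half_gap)  # drawn from component 1, labeled 2
    miss_2_as_1 = normal_cdf(-half_gap)  # drawn from component 2, labeled 1
    return miss_1_as_2 + miss_2_as_1


for gap in (1.0, 2.0, 4.0):
    print(gap, round(pairwise_overlap(0.0, gap, 1.0), 4))
```

In this simplified setting the overlap shrinks as the means move apart, which is how a target overlap value can act as a dial for clustering difficulty. The general multivariate case with unequal weights and covariances has no such closed form and is what the package computes.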
Using genetic algorithm based fuzzy adaptive resonance theory for clustering analysis
Institute of Scientific and Technical Information of China (English)
LIU Bo; WANG Yong; WANG Hong-jian
2006-01-01
In the field of clustering applications, the fuzzy adaptive resonance theory system has been widely applied. However, three parameters of fuzzy adaptive resonance theory need to be adjusted manually to obtain better clustering. This takes much testing time and does not guarantee the best result. The genetic algorithm is an optimizing mathematical search technique based on the principles of natural selection and genetic recombination. So, to automate the process of choosing the fuzzy adaptive resonance theory parameters, an approach incorporating the genetic algorithm and the fuzzy adaptive resonance theory neural network has been applied. The best clustering result can then be obtained. Experiments show that the most appropriate parameters of fuzzy adaptive resonance theory can be obtained effectively by this approach.
A REAL-TIME C-V CLUSTERING ALGORITHM FOR WEB-MINING
Institute of Scientific and Technical Information of China (English)
LiHaiying; ZuangZhenquan; et al.
2002-01-01
In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to handle the vast action data that are dynamically collected from a web site. The algorithm uses the concept of the C-V to denote characteristics; it also adopts two-valued [0,1] input and a self-defined vigilance parameter to design the clustering architecture. The Vector Degree of Matching (VDM) plays a key role in the clustering algorithm, as it determines the magnitude of a typical characteristic. Stability analysis confirms that the classifications have a reliably hierarchical structure when the vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between the vigilance parameter and the classification upper limit helps mine representative classifications of net users according to the actual web resources; the administering system can then map them to the web resource space to implement intelligent configuration effectively and rapidly.
Institute of Scientific and Technical Information of China (English)
Mohammed A.M. Ibrahim; Lu Xinda; M. SaifMokbel
2005-01-01
The rapid growth of interconnected high-performance workstations has produced a new computing paradigm called cluster-of-workstations computing. In these systems, the load-balancing problem is a serious impediment to achieving good performance. The main concern of this paper is the implementation of a dynamic load-balancing algorithm, asynchronous Round Robin (ARR), for balancing the workload of a parallel depth-first-search tree computation on a Cluster of Heterogeneous Workstations (COW). Many algorithms in artificial intelligence and other areas of computer science are based on depth-first search in implicitly defined trees. For these algorithms a load-balancing scheme is required that can evenly distribute parts of an irregularly shaped tree over the workstations with minimal interprocessor communication and without prior knowledge of the tree's shape. The ARR algorithm needs only minimal interprocessor communication, and only when necessary, and it runs under MPI (Message Passing Interface), which allows parallel execution on a heterogeneous SUN cluster of workstations. The program code is written in C and executed under the UNIX operating system (Solaris version).
Directory of Open Access Journals (Sweden)
Tcha Hong
2008-01-01
Full Text Available Abstract Background Previous studies of genome-wide expression patterns show that a certain percentage of genes are cell cycle regulated. The expression data have been analyzed in a number of different ways to identify cell cycle dependent genes. In this study, we pose the hypothesis that cell cycle dependent genes can be considered as oscillating systems with a rhythm, i.e. systems producing response signals with period and frequency. Therefore, we are motivated to apply the theory of multivariate phase synchronization for clustering cell cycle specific genome-wide expression data. Results We propose a strategy to find groups of genes according to the specific biological process by analyzing cell cycle specific gene expression data. To evaluate the proposed method, we use the modified Kuramoto model, which is a phase governing equation that provides the long-term dynamics of globally coupled oscillators. With this equation, we simulate two groups of expression signals, and the simulated signals from each group share their own common rhythm. Then, the simulated expression data are mixed with randomly generated expression data to be used as the input data set to the algorithm. Using these simulated expression data, it is shown that the algorithm is able to identify expression signals that are involved in the same oscillating process. We also evaluate the method with yeast cell cycle expression data. It is shown that the output clusters of the proposed algorithm include genes that are closely associated with each other by sharing significant Gene Ontology terms of biological process and/or having relatively many known biological interactions. Therefore, the evaluation analysis indicates that the method is able to identify expression signals according to the specific biological process. Our evaluation analysis also indicates that some portion of the output of the proposed algorithm is not obtainable by the traditional clustering algorithm with
Determination of Activation Functions in A Feedforward Neural Network by using Genetic Algorithm
Directory of Open Access Journals (Sweden)
Oğuz ÜSTÜN
2009-03-01
Full Text Available In this study, the activation functions of all layers of a multilayered feedforward neural network are determined by using a genetic algorithm. The main criterion of a neural network's efficiency is how closely it approximates the desired output with the same number of nodes and connection weights. One of the important factors determining this performance is the choice of a proper activation function. In classical neural network design, a network is built by choosing one of the generally known activation functions. In the presented study, a table of activation functions has been generated, and the ideal activation function for each node is chosen from this table by the genetic algorithm. Two-dimensional regression problem clusters are used to compare the performance of the classical static neural network and the genetic-algorithm-based neural network. Test results reveal that the proposed method has a high approximation capacity.
AN EFFICIENT UE CLUSTER HEAD SELECTION ALGORITHM IN WIRELESS SENSOR NETWORKS AND CELLULAR NETWORKS
Institute of Scientific and Technical Information of China (English)
Shan Lianhai; Ouyang Yuling; Yuan Zhi; Fang Weidong; Hu Honglin
2013-01-01
Wireless Sensor Networks (WSNs) have been applied in many different areas. Energy-efficient algorithms and protocols have become one of the most challenging issues for WSNs. Many researchers have focused on developing energy-efficient clustering algorithms for WSNs, but less research has addressed the mobile User Equipment (UE) acting as a Cluster Head (CH) for data transmission between cellular networks and WSNs. In this paper, we propose a cellular-assisted UE CH selection algorithm for the WSN, which considers several parameters to choose the optimal UE gateway CH. We analyze the energy cost of data transmission from a sensor node to the next node or gateway and calculate the whole-system energy cost for a WSN. Simulation results show that better system performance, in terms of system energy cost and WSN lifetime, can be achieved by using interactive optimization with cellular networks.
A Novel Image Fusion Algorithm for Visible and PMMW Images based on Clustering and NSCT
Directory of Open Access Journals (Sweden)
Xiong Jintao
2016-01-01
Full Text Available Aiming at the fusion of visible and Passive Millimeter Wave (PMMW) images, a novel algorithm based on clustering and NSCT (Nonsubsampled Contourlet Transform) is proposed. It takes advantage of the particular ability of PMMW images to present metal targets and applies a clustering algorithm to the PMMW image to extract the potential target regions. In the fusion process, NSCT is applied to both input images, and the decomposition coefficients on different scales are then combined using different rules. Finally, the fused image is obtained by taking the inverse NSCT of the fused coefficients. Several methodologies are used to evaluate the fusion results. Experiments demonstrate the superiority of the proposed algorithm for metal target detection compared to the wavelet transform and the Laplace transform.
Risk analysis of dam based on artificial bee colony algorithm with fuzzy c-means clustering
Energy Technology Data Exchange (ETDEWEB)
Li, Haojin; Li, Junjie; Kang, Fei
2011-05-15
Risk analysis is a method which has been incorporated into infrastructure engineering. Fuzzy c-means clustering (FCM) is a simple, fast, and widely used method, but it can induce errors because it is sensitive to initialization. The aim of this paper was to propose a new method for risk analysis using an artificial bee colony algorithm (ABC) with FCM. This new technique is first explained and then applied in three experiments. Results demonstrated that the combined artificial bee colony fuzzy c-means clustering algorithm (ABCFCM) overcomes the FCM issue, since it is not sensitive to initialization, and the experiments showed that it is more accurate than FCM. This paper provides a new tool for risk analysis which can be used for risk prioritization and for reinforcing dangerous dams in a more scientific way.
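As a rough illustration of the FCM step that the abstract describes as initialization-sensitive, the following is a minimal sketch of plain fuzzy c-means on one-dimensional data. It is not the ABCFCM method of the paper; the function name `fuzzy_c_means`, the fuzzifier `m = 2`, and the toy data are assumptions made for the example.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means on 1-D data; returns (centers, memberships)."""
    rng = random.Random(seed)
    # Random initial centers drawn from the data: this is the step FCM is sensitive to.
    centers = rng.sample(points, c)
    u = []
    for _ in range(iters):
        # Membership update: u[i][k] is proportional to d(i, k) ** (-2 / (m - 1)).
        u = []
        for x in points:
            d = [abs(x - v) + 1e-12 for v in centers]  # small offset avoids division by zero
            w = [dk ** (-2.0 / (m - 1.0)) for dk in d]
            total = sum(w)
            u.append([wk / total for wk in w])
        # Center update: mean of the data weighted by u ** m.
        centers = [
            sum((u[i][k] ** m) * points[i] for i in range(len(points)))
            / sum(u[i][k] ** m for i in range(len(points)))
            for k in range(c)
        ]
    return centers, u

# Toy data (made up): two well-separated groups around 1 and 5.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
centers, u = fuzzy_c_means(data, c=2)
```

A global optimizer such as a bee colony search would, in a hybrid of this kind, replace or guide the random choice of the initial centers; that part is not shown here.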
Usage of Clustering Algorithm to Segment Image into Simply Connected Domains
Directory of Open Access Journals (Sweden)
S. V. Belim
2015-01-01
Full Text Available The article suggests a method of image segmentation into simply connected domains based on color. Pixels of the original image are represented as points in a five-dimensional space comprising three color and two spatial coordinates. The points are normalized so that no coordinate dominates. The set of points is associated with a weighted complete graph whose vertices are the points of the five-dimensional space and whose edge weights are the Euclidean distances between the points. To solve the clustering task, a minimum spanning tree of the graph is built. For clustering, the tree is separated into sub-trees by removing some edges. Each sub-tree is a simply connected domain in the original image. To improve the speed of the algorithm and reduce memory usage, a greedy algorithm is used to build the minimum spanning tree. The edges to be removed are sought on a plot of the length of each added edge versus the sequence number at which the greedy algorithm adds it to the tree. The desired edges are detected as maxima on this plot. This search is based on the assumption that a transition to an adjacent cluster adds a longer edge than the edges within a cluster. Segmentation into clusters is iterative. At each step the bigger clusters are divided into smaller ones, so a hierarchy of clusters can be built. A computer experiment was carried out using different images. The suggested method lacks the disadvantages of the most common method, k-means, and can separate domains with different colors but the same intensity. There is also no need to specify the number of clusters. Instead, a segmentation depth is chosen, and the number of clusters is then determined automatically. The suggested method also lacks the disadvantages of image edge detection: it is sufficient to find one point of an image edge to separate two domains. A distinctive feature of
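The minimum-spanning-tree idea in this abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: it builds the tree with Prim's greedy algorithm and simply cuts the longest edges, whereas the abstract detects the edges to cut as maxima of edge length versus insertion order. The function name `mst_clusters`, the parameter `n_cuts`, and the sample pixels are hypothetical.

```python
import math

def mst_clusters(points, n_cuts):
    """Prim's MST over a complete Euclidean graph, then cut the n_cuts longest edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge from the tree to each vertex
    parent = [-1] * n
    best[0] = 0.0
    edges = []                 # (length, u, v) in the order Prim adds them
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Keep all but the n_cuts longest edges; connected components are the clusters.
    kept = sorted(edges)[: len(edges) - n_cuts]
    label = list(range(n))

    def find(i):
        while label[i] != i:
            label[i] = label[label[i]]   # path halving
            i = label[i]
        return i

    for _, a, b in kept:
        label[find(a)] = find(b)
    return [find(i) for i in range(n)]

# Made-up normalized pixels: (r, g, b, x, y). Three dark pixels, two bright ones.
pixels = [(0, 0, 0, 0.0, 0.0), (0, 0, 0, 0.1, 0.0), (0, 0, 0, 0.2, 0.0),
          (1, 1, 1, 0.5, 0.5), (1, 1, 1, 0.6, 0.5)]
labels = mst_clusters(pixels, n_cuts=1)
```

The dense O(n^2) loop is only practical for small inputs; a real image would need a sparser neighborhood graph.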
Energy Technology Data Exchange (ETDEWEB)
Dong, Feng; Pierpaoli, Elena; Gunn, James E.; Wechsler, Risa H.
2007-10-29
We present a modified adaptive matched filter algorithm designed to identify clusters of galaxies in wide-field imaging surveys such as the Sloan Digital Sky Survey (SDSS). The cluster-finding technique is fully adaptive to imaging surveys with spectroscopic coverage, multicolor photometric redshifts, no redshift information at all, and any combination of these within one survey. It works with high efficiency in multi-band imaging surveys where photometric redshifts can be estimated with well-understood error distributions. Tests of the algorithm on realistic mock SDSS catalogs suggest that the detected sample is $\approx 85\%$ complete and over 90% pure for clusters with masses above $1.0 \times 10^{14} h^{-1} M_\odot$ and redshifts up to $z = 0.45$. The errors of the cluster redshifts estimated by the maximum likelihood method are shown to be small (typically less than 0.01) over the whole redshift range, with photometric redshift errors typical of those found in the Sloan survey. Inside the spherical radius corresponding to a galaxy overdensity of $\Delta = 200$, we find the derived cluster richness $\Lambda_{200}$ to be a roughly linear indicator of the virial mass $M_{200}$, which well recovers the relation between total luminosity and cluster mass of the input simulation.
Wang, Z O; Zhu, T
2000-01-01
This paper presents an efficient recursive learning algorithm for improving the generalization performance of radial basis function (RBF) neural networks. The approach combines rival penalized competitive learning (RPCL) [Xu, L., Krzyzak, A. & Oja, E. (1993). Rival penalized competitive learning for clustering analysis, RBF net and curve detection, IEEE Transactions on Neural Networks, 4, 636-649] and regularized least squares (RLS) to provide an efficient and powerful procedure for constructing a minimal RBF network that generalizes very well. The RPCL selects the number of hidden units of the network and adjusts the centers, while the RLS constructs the parsimonious network and estimates the connection weights. For the RLS we derive a simple recursive algorithm, which needs no matrix calculation and so largely reduces the computational cost. This combined algorithm significantly enhances the generalization performance and the real-time capability of RBF networks. Simulation results on three different problems demonstrate much better generalization performance of the present algorithm compared with other existing similar algorithms.
Improved particle swarm clustering algorithm based on diversity feedback
Institute of Scientific and Technical Information of China (English)
时红军
2012-01-01
The traditional particle swarm clustering algorithm easily falls into local optima, and its clustering accuracy is not high. To address these problems, an improved particle swarm clustering algorithm is proposed. It is based on a new inertia weight function and introduces a beta-distribution mutation driven by diversity feedback to maintain the diversity of the population. Test results on the Iris data show that, under the same conditions, the improved particle swarm clustering algorithm achieves better clustering accuracy than the traditional particle swarm clustering algorithm.
Directory of Open Access Journals (Sweden)
Татьяна Борисовна Шатовская
2015-03-01
Full Text Available This work discusses the results of a modified Chameleon algorithm. Hierarchical multilevel algorithms consist of several stages: building the graph, coarsening, partitioning, and recovering. The main aim of the article is to explore clustering quality for different data sets with different combinations of algorithms at the different stages. A further aim is to improve the construction phase by optimizing the choice of k when building the k-nearest-neighbor graph.
Komura, Yukihiro; Okabe, Yutaka
2016-03-01
We present new versions of sample CUDA programs for the GPU computing of the Swendsen-Wang multi-cluster spin flip algorithm. In this update, we add the method of GPU-based cluster-labeling algorithm without the use of conventional iteration (Komura, 2015) to those programs. For high-precision calculations, we also add a random-number generator in the cuRAND library. Moreover, we fix several bugs and remove the extra usage of shared memory in the kernel functions.
Nuclear clustering in the energy density functional approach
Energy Technology Data Exchange (ETDEWEB)
Ebran, J.-P., E-mail: jean-paul.ebran@cea.fr [CEA,DAM,DIF, F-91297 Arpajon (France); Khan, E. [Institut de Physique Nucléaire, Université Paris-Sud CEA, IN2P3 CNRS, F-91406 Orsay Cedex (France); Nikšić, T.; Vretenar, D. [Physics Department, Faculty of Science, University of Zagreb, 10000 Zagreb (Croatia)
2015-10-15
Nuclear Energy Density Functionals (EDFs) are a microscopic tool of choice, extensively used over the whole nuclear chart to successfully describe the properties of atomic nuclei ensuing from their quantum liquid nature. In the last decade, they have also proved their ability to deal with the cluster phenomenon, shedding new light on its fundamental understanding by treating both the quantum liquid and cluster aspects of nuclei on an equal footing. Such a unified microscopic description based on nucleonic degrees of freedom makes it possible to tackle the question of the origin of the cluster phenomenon and emphasizes the intrinsic mechanisms leading to the emergence of clusters in nuclei.
A fast quantum algorithm for the affine Boolean function identification
Younes, Ahmed
2015-02-01
The Bernstein-Vazirani algorithm (the one-query algorithm) can identify a completely specified linear Boolean function with certainty using a single query to the oracle. The first aim of the paper is to show that if the provided Boolean function is affine, then one more query to the oracle (the two-query algorithm) is required to identify the affinity of the function with certainty. The second aim of the paper is to show that if the provided Boolean function is incompletely defined, then the one-query and the two-query algorithms can be used as bounded-error quantum polynomial algorithms to identify certain classes of incompletely defined linear and affine Boolean functions, respectively, with probability of success at least 2/3.
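Why an affine function needs a second query can be seen in a small classical simulation of the Bernstein-Vazirani circuit. The sketch below assumes f(x) = (s · x) XOR b and simulates the single quantum query with a Walsh-Hadamard transform; using f(0) as the second query is one simple choice made here for illustration and is not necessarily the construction used in the paper. The names `walsh_hadamard` and `identify_affine` are hypothetical.

```python
def walsh_hadamard(amps):
    """Unnormalized fast Walsh-Hadamard transform of a list of length 2**n."""
    a = list(amps)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def identify_affine(f, n):
    """Return (s, b) with f(x) = parity(s & x) XOR b, assuming f is affine on n bits."""
    size = 1 << n
    # Query 1 (simulated quantum query): phase oracle on the uniform superposition.
    phases = [(-1) ** f(x) for x in range(size)]
    spectrum = walsh_hadamard(phases)
    # For an affine f all weight sits on index s, with amplitude (-1)**b * 2**n.
    s = max(range(size), key=lambda k: abs(spectrum[k]))
    # That sign is a global phase a measurement cannot see, so a second query gives b = f(0).
    b = f(0)
    return s, b

# Example: s = 0b101, b = 1 on 3 bits.
f = lambda x: (bin(x & 0b101).count("1") + 1) % 2
result = identify_affine(f, 3)
```

The simulation costs exponential classical time; it only illustrates where the constant b hides in the output state.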
A priori data-driven multi-clustered reservoir generation algorithm for echo state network.
Directory of Open Access Journals (Sweden)
Xiumin Li
Full Text Available Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when an ESN is used in environments with sufficient a priori data available. Accordingly, an a priori data-driven multi-cluster reservoir generation algorithm is proposed. The a priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; a new reservoir replaces the previous one only if it has a better evaluation performance. The final reservoir is obtained when its evaluation score reaches the preset requirement. Prediction experiments on the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and the time window size for optimal performance. The information entropy of the reservoir reaches its maximum when the ESN attains the greatest precision.
Scalable fault tolerant algorithms for linear-scaling coupled-cluster electronic structure methods.
Energy Technology Data Exchange (ETDEWEB)
Leininger, Matthew L.; Nielsen, Ida Marie B.; Janssen, Curtis L.
2004-10-01
By means of coupled-cluster theory, molecular properties can be computed with an accuracy often exceeding that of experiment. The high-degree polynomial scaling of the coupled-cluster method, however, remains a major obstacle in the accurate theoretical treatment of mainstream chemical problems, despite tremendous progress in computer architectures. Although it has long been recognized that this super-linear scaling is non-physical, the development of efficient reduced-scaling algorithms for massively parallel computers has not been realized. We here present a locally correlated, reduced-scaling, massively parallel coupled-cluster algorithm. A sparse data representation for handling distributed, sparse multidimensional arrays has been implemented along with a set of generalized contraction routines capable of handling such arrays. The parallel implementation entails a coarse-grained parallelization, reducing interprocessor communication and distributing the largest data arrays but replicating as many arrays as possible without introducing memory bottlenecks. The performance of the algorithm is illustrated by several series of runs for glycine chains using a Linux cluster with an InfiniBand interconnect.
Fuzzy-Logic Based Distributed Energy-Efficient Clustering Algorithm for Wireless Sensor Networks
Zhang, Ying; Wang, Jun; Han, Dezhi; Wu, Huafeng; Zhou, Rundong
2017-01-01
Owing to its high energy efficiency and scalability, the clustering routing algorithm has been widely used in wireless sensor networks (WSNs). In order to gather information more efficiently, each sensor node transmits data to the Cluster Head (CH) to which it belongs by multi-hop communication. However, multi-hop communication within the cluster causes excessive energy consumption at the relay nodes closer to the CH. These nodes' energy is consumed more quickly than that of the farther nodes, which harms the load balance of the whole network. Therefore, we propose an energy-efficient distributed clustering algorithm based on a fuzzy approach with non-uniform distribution (EEDCF). During CH election, we take the nodes' energies, the nodes' degrees, and the neighbor nodes' residual energies into consideration as the input parameters. In addition, we use the Takagi, Sugeno and Kang (TSK) fuzzy model instead of the traditional method as our inference system to make the quantitative analysis more reasonable. In our scheme, each sensor node calculates its probability of becoming a CH with the help of the fuzzy inference system in a distributed way. The experimental results indicate that the EEDCF algorithm is better than some current representative methods in terms of data transmission, energy consumption, and network lifetime. PMID:28671641
Fuzzy-Logic Based Distributed Energy-Efficient Clustering Algorithm for Wireless Sensor Networks.
Zhang, Ying; Wang, Jun; Han, Dezhi; Wu, Huafeng; Zhou, Rundong
2017-07-03
Owing to its high energy efficiency and scalability, the clustering routing algorithm has been widely used in wireless sensor networks (WSNs). In order to gather information more efficiently, each sensor node transmits data to the Cluster Head (CH) to which it belongs by multi-hop communication. However, multi-hop communication within the cluster causes excessive energy consumption at the relay nodes closer to the CH. These nodes' energy is consumed more quickly than that of the farther nodes, which harms the load balance of the whole network. Therefore, we propose an energy-efficient distributed clustering algorithm based on a fuzzy approach with non-uniform distribution (EEDCF). During CH election, we take the nodes' energies, the nodes' degrees, and the neighbor nodes' residual energies into consideration as the input parameters. In addition, we use the Takagi, Sugeno and Kang (TSK) fuzzy model instead of the traditional method as our inference system to make the quantitative analysis more reasonable. In our scheme, each sensor node calculates its probability of becoming a CH with the help of the fuzzy inference system in a distributed way. The experimental results indicate that the EEDCF algorithm is better than some current representative methods in terms of data transmission, energy consumption, and network lifetime.
Application of K-Means Algorithm for Cluster Analysis on Poverty of Provinces in Indonesia
Directory of Open Access Journals (Sweden)
Albert Verasius Dian Sano
2016-06-01
Full Text Available The objective of this study is to apply cluster analysis, also known as clustering, to poverty data of provinces all over Indonesia. The problem is that decision makers involved in poverty problems, such as the central government, local governments, and non-government organizations, need a tool to support decision-making related to social welfare. The method used in the cluster analysis is the k-means algorithm. The data used in this study were drawn from Badan Pusat Statistik (BPS), the Central Bureau of Statistics, for 2014. The cluster analysis used characteristics such as the absolute poverty of each province, the relative number or percentage of poverty of each province, and the poverty depth index of each province in Indonesia. The results are presented visually as groupings of the clusters' members. The cluster analysis can be used to identify the poverty chart of all provinces of Indonesia more quickly and efficiently. The results of such identification can be used by policy makers interested in eradicating the problems associated with poverty and welfare distribution in Indonesia, ranging from government organizations and non-governmental organizations to private organizations.
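The k-means step named in this abstract can be sketched with Lloyd's algorithm. The rows below are made-up, already normalized triples standing in for (absolute poverty, poverty percentage, poverty depth index); they are not BPS figures, and the function name `k_means` and the naive initialization are assumptions of the example.

```python
import math

def k_means(rows, k, iters=50):
    """Lloyd's algorithm; rows are equal-length numeric tuples. Returns (labels, centers)."""
    centers = [rows[i] for i in range(k)]   # naive initialization: the first k rows
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each row joins its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(r, centers[c])) for r in rows]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:   # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Hypothetical normalized features for four provinces.
rows = [(0.1, 0.1, 0.1), (0.9, 0.8, 0.9), (0.2, 0.1, 0.2), (0.8, 0.9, 0.8)]
labels, centers = k_means(rows, k=2)
```

Because the three indicators are measured on very different scales, some normalization before clustering is usually needed; otherwise the absolute head count would dominate the distance.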
Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs
Choi, Woo-Yong; Chatterjee, Mainak
2015-03-01
With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, the performance of the multipolling methods based on random sequencing still needs to be enhanced. To that end, we propose a real-time cluster-based multipolling sequencing algorithm that eliminates more than 90% of the polling overhead, particularly when the dynamic search algorithm fails to derive the multipolling sequence in real time.
Density-based cluster algorithms for the identification of core sets
Lemke, Oliver; Keller, Bettina G.
2016-10-01
The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
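To make the density-based idea concrete, here is a minimal textbook DBSCAN sketch, one of the three algorithms the abstract compares. It is not the authors' code and omits the hierarchical procedure and the metric-matrix monitoring they describe; the parameter names `eps` and `min_pts` and the toy points are assumptions. Dense groups play the role of candidate core sets, and sparse points are left unassigned.

```python
import math

def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one label per point, with -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # former noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:    # j is itself a core point, so keep expanding
                queue.extend(nb)
    return labels

# Two dense groups and one isolated point (made-up coordinates).
pts = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
labels = dbscan(pts, eps=0.3, min_pts=3)
```

The brute-force neighbor search is quadratic in the number of points; trajectory-sized data would call for a spatial index.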
An adaptive enhancement algorithm for infrared video based on modified k-means clustering
Zhang, Linze; Wang, Jingqi; Wu, Wen
2016-09-01
In this paper, we propose a video enhancement algorithm to improve the output video of an infrared camera. The video obtained by an infrared camera is sometimes very dark because there is no clear target. In this case, the infrared video is split into frame images by frame extraction so that image enhancement can be carried out. The first frame image is divided into k sub-images by K-means clustering according to the gray interval each occupies, and histogram equalization is then applied to the k sub-images according to the amount of information in each; we also use a method to handle cases in which the final cluster centers lie close to each other. For the other frame images, the initial cluster centers are taken from the final cluster centers of the previous frame, and histogram equalization of each sub-image is carried out after image segmentation based on K-means clustering. Histogram equalization spreads the gray values of the image over the whole gray-level range, and the gray-level range of each sub-image is determined by the ratio of its pixels to those of the frame image. Experimental results show that this algorithm can improve the contrast of infrared video in which an indistinct night target leads to a dim scene, and can adaptively reduce, within a certain range, the negative effect of overexposed pixels.
A Clustering Algorithm for Planning the Integration Process of a Large Number of Conceptual Schemas
Institute of Scientific and Technical Information of China (English)
Carlo Batini; Paola Bonizzoni; Marco Comerio; Riccardo Dondi; Yuri Pirola; Francesco Salandra
2015-01-01
When tens and even hundreds of schemas are involved in the integration process, criteria are needed for choosing clusters of schemas to be integrated, so as to deal with the integration problem through an efficient iterative process. Schemas in clusters should be chosen according to cohesion and coupling criteria that are based on similarities and dissimilarities among schemas. In this paper, we propose an algorithm for a novel variant of the correlation clustering approach that addresses the problem of assisting a designer in integrating a large number of conceptual schemas. The novel variant introduces upper and lower bounds to the number of schemas in each cluster, in order to avoid too complex and too simple integration contexts respectively. We give a heuristic for solving the problem, being an NP hard combinatorial problem. An experimental activity demonstrates an appreciable increment in the effectiveness of the schema integration process when clusters are computed by means of the proposed algorithm w.r.t. the ones manually defined by an expert.
Non-dominated Sorting Genetic Algorithm II Based on Clustering
Institute of Scientific and Technical Information of China (English)
李志强; 蔺想红
2013-01-01
To address the uneven distribution of the converged population and the weak global search ability of the elitist Non-dominated Sorting Genetic Algorithm II (NSGA-II), a multi-objective evolutionary algorithm based on a clustering learning mechanism, called K-means clustering non-dominated sorting genetic algorithm II (KMCNSGA-II), is proposed on the basis of existing algorithms. KMCNSGA-II uses K-means clustering to cluster the objective functions and the individuals separately, and then applies local learning to the clustered individuals to improve their fitness. The algorithm is applied to several classical constrained and unconstrained multi-objective test functions, and its performance is evaluated with the generational distance convergence indicator and the diversity indicator ∆. Experimental results demonstrate that, compared with NSGA-II, KMCNSGA-II significantly improves both convergence and the preservation of population diversity.
Lelu, Alain; Cuxac, Pascal
2008-01-01
We address here two major challenges presented by dynamic data mining: 1) the stability challenge: we have implemented a rigorous incremental density-based clustering algorithm, independent of any initial conditions and of the ordering of the data-vector stream; 2) the cognitive challenge: we have implemented a stringent selection process of association rules between clusters at time t-1 and time t for directly generating the main conclusions about the dynamics of a data stream. We illustrate these points with an application to a scientific information database of 2600 documents covering two years.
K-Means Re-Clustering-Algorithmic Options with Quantifiable Performance Comparisons
Energy Technology Data Exchange (ETDEWEB)
Meyer, A W; Paglieroni, D; Asteneh, C
2002-12-17
This paper presents various architectural options for implementing a K-Means Re-Clustering algorithm suitable for unsupervised segmentation of hyperspectral images. Performance metrics are developed based upon quantitative comparisons of convergence rates and segmentation quality. A methodology for making these comparisons is developed and used to establish K values that produce the best segmentations with minimal processing requirements. Convergence rates depend on the initial choice of cluster centers. Consequently, this same methodology may be used to evaluate the effectiveness of different initialization techniques.
An Airborne Radar Clutter Tracking Algorithm Based on Multifractal and Fuzzy C-Mean Cluster
Institute of Scientific and Technical Information of China (English)
Wei Zhang; Sheng-Lin Yu; Gong Zhang
2007-01-01
For an airborne lookdown radar, clutter power often varies dynamically by about 80 dB, with wide distributions, as the platform moves. Therefore, clutter tracking techniques are required to guide the selection of constant false alarm rate (CFAR) schemes. In this work, clutter tracking is done in the image domain, and an algorithm combining multifractal analysis and fuzzy C-mean (FCM) clustering is proposed. The clutter, whose power density has a large dynamic distribution, is converted to steady distributions of multifractal exponents by the multifractal transformation with the optimum moment. The main lobe and side lobe are then tracked from the multifractal exponents by the FCM clustering method.
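The FCM step named above can be sketched with the standard fuzzy C-means update rules. The sketch below is a generic one-dimensional version, not the authors' implementation; the function name `fuzzy_c_means`, the fuzzifier `m = 2`, and the scalar sample values (standing in for multifractal exponents) are assumptions.

```python
def fuzzy_c_means(values, centers, m=2.0, iterations=50, eps=1e-12):
    """Standard fuzzy C-means on scalar data (illustrative sketch).

    values  : list of floats to cluster
    centers : initial cluster centers
    m       : fuzzifier (> 1); larger values give softer memberships
    Returns (centers, memberships); memberships[k][i] is the degree to
    which values[k] belongs to cluster i, computed in the last iteration.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1)).
        memberships = []
        for x in values:
            dists = [max(abs(x - c), eps) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                 for d_i in dists]
            )
        # Center update: mean of the data weighted by u^m.
        centers = [
            sum((row[i] ** m) * x for row, x in zip(memberships, values))
            / sum(row[i] ** m for row in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships

# Hypothetical scalar features forming two groups.
values = [0.8, 1.0, 1.2, 8.8, 9.0, 9.2]
fcm_centers, fcm_memberships = fuzzy_c_means(values, centers=[0.0, 10.0])
print(fcm_centers)
```

Unlike K-means, each sample keeps a graded membership in every cluster, and the memberships of a sample always sum to one.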
Pluchino, Alessandro; Latora, Vito
2008-01-01
We have recently introduced an efficient method for the detection and identification of modules in complex networks, based on the de-synchronization properties (dynamical clustering) of phase oscillators. In this paper we apply the dynamical clustering technique to the identification of communities of marine organisms living in the Chesapeake Bay food web. We show that our algorithm is able to perform a very reliable classification of the real communities existing in this ecosystem by using different kinds of dynamical oscillators. We also compare our results with those of other methods for the detection of community structures in complex networks.
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
Singh, Sudhakar; Garg, Rakhi; Mishra, P. K.
2015-10-01
Mining frequent itemsets from massive datasets has always been one of the most important problems in data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed that address the design of efficient data structures, the minimization of database scans, and parallel and distributed processing. MapReduce is an emerging parallel and distributed technology for processing big datasets on a Hadoop cluster. To mine big datasets, it is essential to redesign data mining algorithms for this new paradigm. In this paper, we implement three variations of the Apriori algorithm on the MapReduce paradigm, using the data structures hash tree, trie, and hash table trie (i.e., a trie with a hashing technique). We emphasize and investigate the significance of these three data structures for the Apriori algorithm on a Hadoop cluster, which has not yet received attention. Experiments are carried out on both real-life and synthetic datasets and show that the hash table trie performs far better than the trie and the hash tree in terms of execution time. Moreover, the hash tree gives the worst performance.
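To make the role of the trie concrete, here is a minimal single-machine sketch of Apriori in which the candidates of each level are stored in a nested-dictionary trie and support is counted by walking each transaction through it. It does not reproduce the paper's MapReduce implementation; the names `build_trie`, `count_supports`, `collect`, and `apriori`, and the sample transactions, are assumptions for illustration.

```python
from itertools import combinations

COUNT = object()  # sentinel key that holds a candidate's support count

def build_trie(candidates):
    """Store each sorted candidate itemset as a path in a nested-dict trie."""
    root = {}
    for itemset in candidates:
        node = root
        for item in itemset:
            node = node.setdefault(item, {})
        node[COUNT] = 0
    return root

def count_supports(node, transaction, start=0):
    """Increment every candidate in the trie contained in the sorted transaction."""
    if COUNT in node:
        node[COUNT] += 1
    for i in range(start, len(transaction)):
        child = node.get(transaction[i])
        if child is not None:
            count_supports(child, transaction, i + 1)

def collect(node, prefix=()):
    """Yield (itemset, support) pairs stored in the trie."""
    if COUNT in node:
        yield prefix, node[COUNT]
    for item, child in node.items():
        if item is not COUNT:
            yield from collect(child, prefix + (item,))

def apriori(transactions, min_support):
    """Return {itemset tuple: support} for all frequent itemsets."""
    transactions = [sorted(set(t)) for t in transactions]
    candidates = [(i,) for i in sorted({i for t in transactions for i in t})]
    frequent, k = {}, 1
    while candidates:
        trie = build_trie(candidates)
        for t in transactions:
            count_supports(trie, t)
        level = {c: n for c, n in collect(trie) if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets sharing a (k-1)-prefix, then prune any
        # candidate that has an infrequent k-subset.
        candidates = []
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                if all(s in level for s in combinations(cand, k)):
                    candidates.append(cand)
        k += 1
    return frequent

# Hypothetical market-basket transactions.
baskets = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
frequent_itemsets = apriori(baskets, min_support=3)
print(sorted(frequent_itemsets.items()))
```

In a hash table trie, the child lookup at each node would go through a hash table, which the Python dictionaries used here already do; a hash tree would instead bucket candidates at the leaves and scan each bucket.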
Hopfield-K-Means clustering algorithm: A proposal for the segmentation of electricity customers
Energy Technology Data Exchange (ETDEWEB)
Lopez, Jose J.; Aguado, Jose A.; Martin, F.; Munoz, F.; Rodriguez, A.; Ruiz, Jose E. [Department of Electrical Engineering, University of Malaga, C/ Dr. Ortiz Ramos, sn., Escuela de Ingenierias, 29071 Malaga (Spain)
2011-02-15
Customer classification aims at providing electric utilities with a volume of information to enable them to establish different types of tariffs. Several methods have been used to segment electricity customers, including, among others, the hierarchical clustering, Modified Follow the Leader and K-Means methods. These, however, entail problems with the pre-allocation of the number of clusters (Follow the Leader), randomness of the solution (K-Means) and improvement of the solution obtained (hierarchical algorithm). Another segmentation method used is Hopfield's autonomous recurrent neural network, although the solution obtained is only guaranteed to be a local minimum. In this paper, we present the Hopfield-K-Means algorithm in order to overcome these limitations. This approach eliminates the randomness of the initial solution provided by K-Means based algorithms and moves closer to the global optimum. The proposed algorithm is also compared against other customer segmentation and characterization techniques, on the basis of relative validation indexes. Finally, the results obtained by this algorithm with a set of 230 electricity customers (residential, industrial and administrative) are presented. (author)
Star Cluster Luminosity Functions and Cluster Formation Efficiencies in LEGUS Dwarf Galaxies
Cook, David O.; Lee, Janice C.; Adamo, Angela; Kim, Hwiyun; Ryon, Jenna E.; LEGUS Team
2017-01-01
We present preliminary results of star cluster luminosity functions (LFs) and cluster formation efficiencies (Γ) in the LEGUS dwarf galaxy sub-sample. We have used a combination of automated and visual identification techniques to allow us to construct a more complete sample of clusters in these low-mass, low-SFR environments compared to previous studies of dwarf galaxies. Cluster properties are derived from fitting UV and optical (NUV-I) HST photometry to both deterministic and stochastic single-age stellar population models. We compare the cluster formation efficiencies and LF slopes to those of previous studies in both dwarf and massive spiral galaxy environments. Recent studies have found that both the LF slope and Γ form trends with galaxy environment. Our LF slope and Γ measurements in the LEGUS dwarfs will allow us to test these trends in the extreme, low-SFR regime and provide a better understanding of the star formation process.
Study of cluster reconstruction and track fitting algorithms for CGEM-IT at BESIII
Guo, Yue; Ju, Xu-Dong; Wu, Ling-Hui; Xiu, Qing-Lei; Wang, Hai-Xia; Dong, Ming-Yi; Hu, Jing-Ran; Li, Wei-Dong; Li, Wei-Guo; Liu, Huai-Min; Ou-Yang, Qun; Shen, Xiao-Yan; Yuan, Ye; Zhang, Yao
2015-01-01
Considering the aging effects of the existing Inner Drift Chamber (IDC) of BESIII, a GEM-based inner tracker is proposed to be designed and constructed as an upgrade candidate for the IDC. This paper introduces a full simulation package for CGEM-IT with a simplified digitization model, and describes the development of the software for cluster reconstruction and for a track fitting algorithm based on the Kalman filter method for CGEM-IT. Preliminary results from the reconstruction algorithms are obtained using a Monte Carlo sample of single muon events in CGEM-IT.
A Novel and Robust Evolution Algorithm for Optimizing Complicated Functions
Gao, Yifeng; Zhao, Ge
2011-01-01
In this paper, a novel mutation operator for the differential evolution algorithm is proposed. A new algorithm, called the divergence differential evolution algorithm (DDEA), is developed by combining the new mutation operator with a divergence operator and an assimilation operator (the divergence operator divides the population, and the assimilation operator combines it). DDEA can detect multiple solutions and is robust in noisy environments. The new algorithm is applied to optimize the Michalewicz function and to track changes in the rain-induced attenuation process. The results obtained with DDEA are compared with those obtained with the Differential Evolution Algorithm (DEA), and show that DDEA gets better results than DEA under the same conditions. The new algorithm is significant for optimizing and tracking the characteristics of MIMO (Multiple Input Multiple Output) channels at millimeter waves.
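For orientation, the baseline that DDEA is compared against can be sketched as classic differential evolution (the DE/rand/1/bin scheme) minimizing the two-dimensional Michalewicz function. This sketch does not include the paper's new mutation, divergence, or assimilation operators, whose details are not given in the abstract; the function names, population size, `F`, `CR`, and seed are all assumptions.

```python
import math
import random

def michalewicz(x, m=10):
    """Michalewicz test function on [0, pi]^d (to be minimized)."""
    return -sum(
        math.sin(xi) * math.sin((i + 1) * xi * xi / math.pi) ** (2 * m)
        for i, xi in enumerate(x)
    )

def differential_evolution(f, dim, bounds, pop_size=40, generations=300,
                           F=0.6, CR=0.9, seed=0):
    """Classic DE/rand/1/bin (illustrative baseline, not DDEA)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fitness = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < CR:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            # Greedy selection: keep the trial if it is no worse.
            trial_fitness = f(trial)
            if trial_fitness <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]

best_x, best_value = differential_evolution(michalewicz, dim=2,
                                            bounds=(0.0, math.pi))
print(best_x, best_value)
```

The two-dimensional Michalewicz function has its global minimum of about -1.8013 near (2.20, 1.57), surrounded by large flat regions, which is why it is a common stress test for population-based optimizers.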