The ATLAS Trigger Algorithms for General Purpose Graphics Processor Units
Tavares Delgado, Ademar; The ATLAS collaboration
2016-01-01
Type: Talk. Abstract: We present the ATLAS Trigger algorithms developed to exploit General Purpose Graphics Processor Units. ATLAS is a particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system has two levels: a hardware-based Level 1 and the High Level Trigger, implemented in software running on a farm of commodity CPUs. Performing the trigger event selection within the available farm resources presents a significant challenge that will increase with future LHC upgrades. General Purpose Graphics Processor Units (GPGPU) are being evaluated as a potential solution for accelerating trigger algorithms. Key factors determining the potential benefit of this new technology are the relative execution speedup, the number of GPUs required and the relative financial cost of the selected GPU. We have developed a trigger demonstrator which includes algorithms for reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Cal...
Using general-purpose compression algorithms for music analysis
DEFF Research Database (Denmark)
Louboutin, Corentin; Meredith, David
2016-01-01
General-purpose compression algorithms encode files as dictionaries of substrings with the positions of these strings’ occurrences. We hypothesized that such algorithms could be used for pattern discovery in music. We compared LZ77, LZ78, Burrows–Wheeler and COSIATEC on classifying folk song melodies. A novel method was used, combining multiple viewpoints, the k-nearest-neighbour algorithm and a novel distance metric, corpus compression distance. Using single viewpoints, COSIATEC outperformed the general-purpose compressors, with a classification success rate of 85% on this task. However...... in the input data, COSIATEC outperformed LZ77 with a mean F1 score of 0.123, compared with 0.053 for LZ77. However, when the music was processed a voice at a time, the F1 score for LZ77 more than doubled to 0.124. We also discovered a significant correlation between compression factor and F1 score for all...
GENETIC ALGORITHM ON GENERAL PURPOSE GRAPHICS PROCESSING UNIT: PARALLELISM REVIEW
Directory of Open Access Journals (Sweden)
A.J. Umbarkar
2013-01-01
Genetic Algorithm (GA) is an effective and robust method for solving many optimization problems. However, it may take many runs (iterations) and much time to reach an optimal solution. The execution time needed to find the optimal solution also depends on the niching technique applied to the evolving population. This paper reviews how various authors, researchers and scientists have implemented GAs on GPGPU (General Purpose Graphics Processing Units), with and without parallelism. Many problems have been solved on GPGPU using GAs. A GA is easy to parallelize because of its SIMD nature and can therefore be implemented well on GPGPU. Speedup can thus be achieved if the bottlenecks in GAs are identified and implemented effectively on GPGPU. The paper reviews various applications solved using GAs on GPGPU, along with the future scope in the area of optimization.
High-Speed General Purpose Genetic Algorithm Processor.
Hoseini Alinodehi, Seyed Pourya; Moshfe, Sajjad; Saber Zaeimian, Masoumeh; Khoei, Abdollah; Hadidi, Khairollah
2016-07-01
In this paper, an ultrafast steady-state genetic algorithm processor (GAP) is presented. Due to the heavy computational load of genetic algorithms (GAs), they usually take a long time to find optimum solutions. Hardware implementation is a significant approach to overcoming this problem by speeding up the GA procedure. Hence, we designed a digital CMOS implementation of GA in [Formula: see text] process. The proposed processor is not bound to a specific application. Indeed, it is a general-purpose processor, which is capable of performing optimization in any possible application. Utilizing speed-boosting techniques, such as a pipeline scheme, parallel coarse-grained processing, parallel fitness computation, parallel selection of parents, a dual-population scheme, and support for pipelined fitness computation, the proposed processor significantly reduces the processing time. Furthermore, by relying on a built-in discard operator, the proposed hardware may be used in constrained problems that are very common in control applications. In the proposed design, a large search space is achievable through the bit string length extension of individuals in the genetic population by connecting the 32-bit GAPs. In addition, the proposed processor supports parallel processing, in which the GA procedure can be run on several connected processors simultaneously.
Building a General Purpose Beowulf Cluster for Astrophysics Research
Phelps, M. W. L.
2005-12-01
The challenges of designing and deploying a high performance, Linux based, Beowulf cluster for use by many departments and projects are covered. Considerations include hardware, infrastructure (space, cooling, networking, etc.), and software; particularly scheduling systems.
A general-purpose contact detection algorithm for nonlinear structural analysis codes
Energy Technology Data Exchange (ETDEWEB)
Heinstein, M.W.; Attaway, S.W.; Swegle, J.W.; Mello, F.J.
1993-05-01
A new contact detection algorithm has been developed to address difficulties associated with the numerical simulation of contact in nonlinear finite element structural analysis codes. Problems including accurate and efficient detection of contact for self-contacting surfaces, tearing and eroding surfaces, and multi-body impact are addressed. The proposed algorithm is portable between dynamic and quasi-static codes and can efficiently model contact between a variety of finite element types including shells, bricks, beams and particles. The algorithm is composed of (1) a location strategy that uses a global search to decide which slave nodes are in proximity to a master surface and (2) an accurate detailed contact check that uses the projected motions of both master surface and slave node. In this report, currently used contact detection algorithms and their associated difficulties are discussed. Then the proposed algorithm and how it addresses these problems is described. Finally, the capability of the new algorithm is illustrated with several example problems.
General Purpose Convolution Algorithm in S4 Classes by Means of FFT
Directory of Open Access Journals (Sweden)
Peter Ruckdeschel
2014-08-01
By means of object orientation this default algorithm is overloaded by more specific algorithms where possible, in particular where explicit convolution formulae are available. Our focus is on R package distr which implements this approach, overloading operator + for convolution; based on this convolution, we define a whole arithmetics of mathematical operations acting on distribution objects, comprising operators +, -, *, /, and ^.
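The operator-overloading idea in this abstract can be illustrated with a minimal Python sketch. It is only an analogue of the approach and not the API of the R package distr: the class name `DiscreteDist` and its methods are hypothetical, and it uses direct convolution of finite discrete distributions rather than the FFT-based default algorithm named in the title.

```python
from collections import defaultdict


class DiscreteDist:
    """Finite discrete distribution stored as {value: probability} (hypothetical class)."""

    def __init__(self, pmf):
        total = sum(pmf.values())
        # Normalize so that the probabilities sum to one.
        self.pmf = {x: p / total for x, p in pmf.items()}

    def __add__(self, other):
        # Distribution of X + Y for independent X and Y: direct convolution.
        out = defaultdict(float)
        for x, px in self.pmf.items():
            for y, py in other.pmf.items():
                out[x + y] += px * py
        return DiscreteDist(dict(out))

    def mean(self):
        return sum(x * p for x, p in self.pmf.items())


# Usage: the sum of two fair dice, written with the overloaded "+".
die = DiscreteDist({k: 1 for k in range(1, 7)})
two_dice = die + die
```

Overloading `+` lets the convolution be written as ordinary arithmetic on distribution objects, which is the design choice the abstract describes for distr.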
General-purpose molecular dynamics simulations on GPU-based clusters
Trott, Christian R.; Winterfeld, Lars; Crozier, Paul S.
2010-01-01
We present a GPU implementation of LAMMPS, a widely-used parallel molecular dynamics (MD) software package, and show 5x to 13x single node speedups versus the CPU-only version of LAMMPS. This new CUDA package for LAMMPS also enables multi-GPU simulation on hybrid heterogeneous clusters, using MPI for inter-node communication, CUDA kernels on the GPU for all methods working with particle data, and standard LAMMPS C++ code for CPU execution. Cell and neighbor list approaches are compared for be...
Conde Muiño, Patricia; The ATLAS collaboration
2016-01-01
General purpose Graphics Processor Units (GPGPU) are being evaluated for possible future inclusion in an upgraded ATLAS High Level Trigger farm. We have developed a demonstrator including GPGPU implementations of Inner Detector and Muon tracking and Calorimeter clustering within the ATLAS software framework. ATLAS is a general purpose particle physics experiment located on the LHC collider at CERN. The ATLAS Trigger system consists of two levels, with level 1 implemented in hardware and the High Level Trigger implemented in software running on a farm of commodity CPU. The High Level Trigger reduces the trigger rate from the 100 kHz level 1 acceptance rate to 1 kHz for recording, requiring an average per-event processing time of ~250 ms for this task. The selection in the high level trigger is based on reconstructing tracks in the Inner Detector and Muon Spectrometer and clusters of energy deposited in the Calorimeter. Performing this reconstruction within the available farm resources presents a significant ...
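The rates quoted in this abstract imply a rough farm occupancy, which can be worked out with Little's law. This is a back-of-the-envelope sketch only: it assumes each event occupies exactly one processing slot for the average time, and the function name `busy_slots` is hypothetical. It is not a statement about the actual size of the ATLAS farm.

```python
def busy_slots(input_rate_hz, mean_time_s):
    # Little's law: average number of events in flight = arrival rate x mean time.
    return input_rate_hz * mean_time_s


# Figures quoted in the abstract: 100 kHz input, ~250 ms per event, 1 kHz output.
slots = busy_slots(100_000, 0.250)   # events being processed at any instant
rejection = 100_000 / 1_000          # input rate divided by recording rate
```

Under these assumptions about 25,000 events are in flight at any time, and the High Level Trigger keeps roughly one event in a hundred.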
Non-convex polygons clustering algorithm
Directory of Open Access Journals (Sweden)
Kruglikov Alexey
2016-01-01
A clustering algorithm is proposed, to be used as a preliminary step in motion planning. It is tightly coupled to the applied problem statement, i.e. uses parameters meaningful only with respect to it. Use of geometrical properties for polygons clustering allows for a better calculation time as opposed to general-purpose algorithms. A special form of map optimized for quick motion planning is constructed as a result.
Partitional clustering algorithms
2015-01-01
This book summarizes the state-of-the-art in partitional clustering. Clustering, the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis. Primary goals of clustering include gaining insight into, classifying, and compressing data. Clustering has a long and rich history that spans a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science. As a result, numerous clustering algorithms have been proposed since the early 1950s. Among these algorithms, partitional (nonhierarchical) ones have found many applications, especially in engineering and computer science. This book provides coverage of consensus clustering, constrained clustering, large scale and/or high dimensional clustering, cluster validity, cluster visualization, and applications of clustering. Examines clustering as it applies to large and/or high-dimensional data sets commonly encountered in reali...
Parallel Wolff Cluster Algorithms
Bae, S.; Ko, S. H.; Coddington, P. D.
The Wolff single-cluster algorithm is the most efficient method known for Monte Carlo simulation of many spin models. Due to the irregular size, shape and position of the Wolff clusters, this method does not easily lend itself to efficient parallel implementation, so that simulations using this method have thus far been confined to workstations and vector machines. Here we present two parallel implementations of this algorithm, and show that one gives fairly good performance on a MIMD parallel computer.
Directory of Open Access Journals (Sweden)
Juan Manuel Cueva Lovelle
2008-12-01
The MDE paradigm promises to release developers from writing code. The basis of this paradigm is working at a level of abstraction that makes it easier for analysts to detail the project to be undertaken. Using the model described by analysts, software tools will do the rest of the task, generating software that complies with the customer's defined requirements. The purpose of this study is to compare the general-purpose tools currently available that make it possible to put the principles of this paradigm into practice and that are aimed at generating a wide variety of applications composed of interactive multimedia and artificial intelligence components.
Recovery Rate of Clustering Algorithms
Li, Fajie; Klette, Reinhard; Wada, T; Huang, F; Lin, S
2009-01-01
This article provides a simple and general way for defining the recovery rate of clustering algorithms using a given family of old clusters for evaluating the performance of the algorithm when calculating a family of new clusters. Under the assumption of dealing with simulated data (i.e., known old
General purpose operator interface
Energy Technology Data Exchange (ETDEWEB)
Bennion, S. I.
1979-07-01
The Hanford Engineering Development Laboratory in Richland, Washington is developing a general-purpose operator interface for controlling set-point driven processes. The interface concept is being developed around graphics display devices with touch-sensitive screens for direct interaction with the displays. Additional devices such as trackballs and keyboards are incorporated for the operator's convenience, but are not necessary for operation. The hardware and software are modular; only those capabilities needed for a particular application need to be used. The software is written in FORTRAN IV with minimal use of operating system calls to increase portability. Several ASCII files generated by the user define displays and correlate the display variables with the process parameters. It is also necessary for the user to build an interface routine which translates the internal graphics commands into device-specific commands. The interface is suited for both continuous flow processes and unit operations. An especially useful feature for controlling unit operations is the ability to generate and execute complex command sequences from ASCII files. This feature relieves operators of many repetitive tasks. 2 figures.
Data clustering algorithms and applications
Aggarwal, Charu C
2013-01-01
Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains. The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as fea
Kernel Generalized Noise Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
WU Xiao-hong; ZHOU Jian-jiang
2007-01-01
To deal with the nonlinearly separable problem, the generalized noise clustering (GNC) algorithm is extended to a kernel generalized noise clustering (KGNC) model. Unlike the fuzzy c-means (FCM) model and the GNC model, which are based on Euclidean distance, the presented model is based on a kernel-induced distance obtained with the kernel method. With the kernel method, the input data are nonlinearly and implicitly mapped into a high-dimensional feature space, where the nonlinear pattern appears linear and the GNC algorithm is performed. It is unnecessary to calculate in the high-dimensional feature space because the kernel function can do it just in the input space. The effectiveness of the proposed algorithm is verified by experiments on three data sets. It is concluded that the KGNC algorithm has better clustering accuracy than FCM and GNC in clustering data sets containing noisy data.
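The kernel-induced distance mentioned in this abstract can be sketched as follows. The abstract does not say which kernel is used, so the Gaussian kernel here is an assumption, and the function names are hypothetical. The identity in the comment is the standard expansion of the squared feature-space distance in terms of kernel evaluations.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    # Assumed kernel: K(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance_sq(x, y, kernel=gaussian_kernel):
    # ||phi(x) - phi(y)||^2 = K(x, x) - 2 K(x, y) + K(y, y),
    # evaluated entirely in the input space (no explicit feature map).
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


d_near = kernel_distance_sq((0.0, 0.0), (0.1, 0.0))
d_far = kernel_distance_sq((0.0, 0.0), (3.0, 4.0))
```

For the Gaussian kernel the expression reduces to 2(1 - K(x, y)), so the distance is bounded by 2. That bound limits the influence of far-away points, which is one reason kernel variants are often considered for noisy data.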
Cluster Synchronization Algorithms
Xia, Weiguo; Cao, Ming
2010-01-01
This paper presents two approaches to achieving cluster synchronization in dynamical multi-agent systems. In contrast to the widely studied synchronization behavior, where all the coupled agents converge to the same value asymptotically, in the cluster synchronization problem studied in this paper,
Intuitionistic fuzzy hierarchical clustering algorithms
Institute of Scientific and Technical Information of China (English)
Xu Zeshui
2009-01-01
Intuitionistic fuzzy set (IFS) is a set of 2-tuple arguments, each of which is characterized by a membership degree and a nonmembership degree. The generalized form of IFS is interval-valued intuitionistic fuzzy set (IVIFS), whose components are intervals rather than exact numbers. IFSs and IVIFSs have been found to be very useful to describe vagueness and uncertainty. However, it seems that little attention has been focused on the clustering analysis of IFSs and IVIFSs. An intuitionistic fuzzy hierarchical algorithm is introduced for clustering IFSs, which is based on the traditional hierarchical clustering procedure, the intuitionistic fuzzy aggregation operator, and the basic distance measures between IFSs: the Hamming distance, normalized Hamming, weighted Hamming, the Euclidean distance, the normalized Euclidean distance, and the weighted Euclidean distance. Subsequently, the algorithm is extended for clustering IVIFSs. Finally the algorithm and its extended form are applied to the classifications of building materials and enterprises respectively.
Extended Fuzzy Clustering Algorithms
U. Kaymak (Uzay); M. Setnes
2000-01-01
Fuzzy clustering is a widely applied method for obtaining fuzzy models from data. It has been applied successfully in various fields including finance and marketing. Despite the successful applications, there are a number of issues that must be dealt with in practical applications of fuz
General Purpose (office) Network reorganisation
IT Department
2016-01-01
On Saturday 27 August, the IT Department’s Communication Systems group will perform a major reorganisation of CERN’s General Purpose Network. This reorganisation will cause network interruptions on Saturday 27 August (and possibly Sunday 28 August) and will be followed by a change to the IP addresses of connected systems that will come into effect on Monday 3 October. For further details and information about the actions you may need to take, please see: https://information-technology.web.cern.ch/news/general-purpose-office-network-reorganisation.
Parallel algorithms and cluster computing
Hoffmann, Karl Heinz
2007-01-01
This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.
Determination of atomic cluster structure with cluster fusion algorithm
DEFF Research Database (Denmark)
Obolensky, Oleg I.; Solov'yov, Ilia; Solov'yov, Andrey V.
2005-01-01
We report an efficient scheme of global optimization, called cluster fusion algorithm, which has proved its reliability and high efficiency in determination of the structure of various atomic clusters.
Particle identification using clustering algorithms
Wirth, R; Löher, B; Savran, D; Silva, J; Pol, H Álvarez; Gil, D Cortina; Pietras, B; Bloch, T; Kröll, T; Nácher, E; Perea, Á; Tengblad, O; Bendel, M; Dierigl, M; Gernhäuser, R; Bleis, T Le; Winkel, M
2013-01-01
A method that uses fuzzy clustering algorithms to achieve particle identification based on pulse shape analysis is presented. The fuzzy c-means clustering algorithm is used to compute mean (principal) pulse shapes induced by different particle species in an automatic and unsupervised fashion from a mixed set of data. A discrimination amplitude is proposed using these principal pulse shapes to identify the originating particle species of a detector pulse. Since this method does not make any assumptions about the specific features of the pulse shapes, it is very generic and suitable for multiple types of detectors. The method is applied to discriminate between photon- and proton-induced signals in CsI(Tl) scintillator detectors and the results are compared to the well-known integration method.
An Improved Weighted Clustering Algorithm in MANET
Institute of Scientific and Technical Information of China (English)
WANG Jin; XU Li; ZHENG Bao-yu
2004-01-01
The original clustering algorithms in Mobile Ad hoc Network (MANET) are first analyzed in this paper. On this basis, an Improved Weighted Clustering Algorithm (IWCA) is proposed. Then, the principle and steps of our algorithm are explained in detail, and a comparison is made between the original algorithms and our improved method in terms of average cluster number, topology stability, clusterhead load balance and network lifetime. The experimental results show that our improved algorithm has the best performance on average.
Kernel method-based fuzzy clustering algorithm
Institute of Scientific and Technical Information of China (English)
Wu Zhongdong; Gao Xinbo; Xie Weixin; Yu Jianping
2005-01-01
The fuzzy C-means clustering algorithm (FCM) is extended to the fuzzy kernel C-means clustering algorithm (FKCM) to effectively perform cluster analysis on diverse structures, such as non-hyperspherical data, data with noise, data with a mixture of heterogeneous cluster prototypes, asymmetric data, etc. Based on the Mercer kernel, the FKCM clustering algorithm is derived from the FCM algorithm combined with the kernel method. The results of experiments with synthetic and real data show that the FKCM clustering algorithm is universal and, in contrast to the FCM algorithm, can effectively analyze datasets with varied structures in an unsupervised way. Kernel-based clustering can be expected to be an important research direction in fuzzy clustering analysis.
A new cluster algorithm for graphs
Dongen, S. van
1998-01-01
A new cluster algorithm for graphs called the Markov Cluster algorithm (MCL algorithm) is introduced. The graphs may be both weighted (with nonnegative weight) and directed. Let G be such a graph. The MCL algorithm simulates flow in G by first identifying G in a canonical way with
General purpose steam table library :
Energy Technology Data Exchange (ETDEWEB)
Carpenter, John H.; Belcourt, Kenneth Noel; Nourgaliev, Robert
2013-08-01
Completion of the CASL L3 milestone THM.CFD.P7.04 provides a general purpose tabular interpolation library for material properties to support, in particular, standardized models for steam properties. The software consists of three parts: implementations of analytic steam models, a code to generate tables from those models, and an interpolation package to interface the tables to CFD codes such as Hydra-TH. Verification of the standard model is maintained through the entire train of routines. The performance of the interpolation package exceeds that of a freely available analytic implementation of the steam properties by over an order of magnitude.
Frequent Pattern Mining Algorithms for Data Clustering
DEFF Research Database (Denmark)
Zimek, Arthur; Assent, Ira; Vreeken, Jilles
2014-01-01
Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many frequent pattern mining related influences. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say that frequent pattern mining was at the cradle of subspace clustering, yet it quickly developed into an independent research field. In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data. In particular, we discuss several example algorithms for subspace clustering or projected clustering as well as point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining...
Introduction to Cluster Monte Carlo Algorithms
Luijten, E.
This chapter provides an introduction to cluster Monte Carlo algorithms for classical statistical-mechanical systems. A brief review of the conventional Metropolis algorithm is given, followed by a detailed discussion of the lattice cluster algorithm developed by Swendsen and Wang and the single-cluster variant introduced by Wolff. For continuum systems, the geometric cluster algorithm of Dress and Krauth is described. It is shown how their geometric approach can be generalized to incorporate particle interactions beyond hardcore repulsions, thus forging a connection between the lattice and continuum approaches. Several illustrative examples are discussed.
A Novel Research on Rough Clustering Algorithm
Directory of Open Access Journals (Sweden)
Tao Qu
2014-01-01
This study addresses the problem that traditional clustering algorithms are subject to the influence of the data space distribution. A novel clustering algorithm that combines rough set theory with conventional clustering is proposed. The proposed rough clustering algorithm takes the condition attributes and decision attributes displayed in the information table as its consistency principle, and uses the data supercube and information entropy to reduce and discretize the data attributes. On this basis, by applying the assembled feature vector addition principle, a single scan of the information table is enough to cluster the data objects. Experiments reveal that the proposed algorithm is efficient and feasible.
Hesitant fuzzy agglomerative hierarchical clustering algorithms
Zhang, Xiaolu; Xu, Zeshui
2015-02-01
Recently, hesitant fuzzy sets (HFSs) have been studied by many researchers as a powerful tool to describe and deal with uncertain data, but relatively few studies focus on the clustering analysis of HFSs. In this paper, we propose a novel hesitant fuzzy agglomerative hierarchical clustering algorithm for HFSs. The algorithm considers each of the given HFSs as a unique cluster in the first stage, and then compares each pair of the HFSs by utilising the weighted Hamming distance or the weighted Euclidean distance. The two clusters with the smallest distance are joined. The procedure is then repeated until the desired number of clusters is achieved. Moreover, we extend the algorithm to cluster the interval-valued hesitant fuzzy sets, and finally illustrate the effectiveness of our clustering algorithms by experimental results.
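The agglomerative procedure outlined in this abstract can be sketched in a few lines. Several details are assumptions and not taken from the paper: each HFS is stored as a list of hesitant elements, every element is a sorted list of membership values of the same length, and a merged cluster is represented by the element-wise mean of its members (a stand-in for whatever aggregation operator the authors use). All function names are hypothetical.

```python
from itertools import combinations


def weighted_hamming(a, b, weights):
    # a, b: HFSs given as lists of hesitant elements (sorted value lists,
    # assumed to have equal length in a and b).
    total = 0.0
    for w, ha, hb in zip(weights, a, b):
        total += w * sum(abs(x - y) for x, y in zip(ha, hb)) / len(ha)
    return total


def mean_hfs(members):
    # Stand-in aggregation (assumption): element-wise mean of the members.
    n = len(members)
    return [[sum(vals) / n for vals in zip(*elems)] for elems in zip(*members)]


def agglomerate(hfss, weights, k):
    clusters = [[i] for i in range(len(hfss))]  # start: one cluster per HFS
    while len(clusters) > k:
        reps = [mean_hfs([hfss[i] for i in c]) for c in clusters]
        # Find the pair of clusters with the smallest distance and join them.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: weighted_hamming(reps[pq[0]], reps[pq[1]], weights),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


data = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.1, 0.3], [0.3, 0.5]],
    [[0.8, 0.9], [0.7, 0.9]],
    [[0.7, 0.9], [0.8, 0.9]],
]
result = agglomerate(data, weights=[0.5, 0.5], k=2)
```

On this toy input the two low-membership sets end up in one cluster and the two high-membership sets in the other.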
Intuitionistic Fuzzy Possibilistic C Means Clustering Algorithms
Directory of Open Access Journals (Sweden)
Arindam Chaudhuri
2015-01-01
Intuitionistic fuzzy sets (IFSs) provide a mathematical framework based on fuzzy sets to describe vagueness in data. They find interesting and promising applications in different domains. Here, we develop an intuitionistic fuzzy possibilistic C means (IFPCM) algorithm to cluster IFSs by hybridizing concepts of FPCM, IFSs, and distance measures. IFPCM resolves inherent problems encountered with information regarding membership values of objects to each cluster by generalizing membership and nonmembership with hesitancy degree. The algorithm is extended for clustering interval valued intuitionistic fuzzy sets (IVIFSs), leading to interval valued intuitionistic fuzzy possibilistic C means (IVIFPCM). The clustering algorithm has membership and nonmembership degrees as intervals. Information regarding membership and typicality degrees of samples to all clusters is given by the algorithm. The experiments are performed on both real and simulated datasets. The algorithm generates valuable information and produces overlapped clusters with different membership degrees. It takes into account inherent uncertainty in information captured by IFSs. Some advantages of the algorithm are simplicity, flexibility, and low computational complexity. The algorithm is evaluated through cluster validity measures. The clustering accuracy of the algorithm is investigated using classification datasets with labeled patterns. The algorithm maintains appreciable performance compared to other methods in terms of pureness ratio.
Algorithm for Spatial Clustering with Obstacles
El-Sharkawi, Mohamed E
2009-01-01
In this paper, we propose an efficient clustering technique to solve the problem of clustering in the presence of obstacles. The proposed algorithm divides the spatial area into rectangular cells. Each cell is associated with statistical information that enables us to label the cell as dense or non-dense. We also label each cell as obstructed (i.e. intersects any obstacle) or non-obstructed. Then the algorithm finds the regions (clusters) of connected, dense, non-obstructed cells. Finally, the algorithm finds a center for each such region and returns those centers as centers of the relatively dense regions (clusters) in the spatial area.
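The grid-based steps described in this abstract can be sketched as follows. The details are assumptions made for illustration: obstacles are axis-aligned rectangles `(x0, y0, x1, y1)`, a cell is dense when it holds at least `min_pts` points, cells are connected through their four edge neighbours, and the center of a region is the mean of its points. The function and parameter names are hypothetical.

```python
from collections import defaultdict, deque


def cluster_with_obstacles(points, obstacles, cell=1.0, min_pts=2):
    # Bin the points into square cells of side `cell`.
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell), int(y // cell))].append((x, y))

    def obstructed(cx, cy):
        # Does the cell rectangle overlap any obstacle rectangle?
        x0, y0, x1, y1 = cx * cell, cy * cell, (cx + 1) * cell, (cy + 1) * cell
        return any(x0 < ox1 and ox0 < x1 and y0 < oy1 and oy0 < y1
                   for ox0, oy0, ox1, oy1 in obstacles)

    # Keep only cells that are dense and non-obstructed.
    good = {c for c, pts in cells.items()
            if len(pts) >= min_pts and not obstructed(*c)}

    centers, seen = [], set()
    for start in sorted(good):
        if start in seen:
            continue
        seen.add(start)
        queue, region = deque([start]), []
        while queue:  # breadth-first search over 4-connected cells
            cx, cy = queue.popleft()
            region.extend(cells[(cx, cy)])
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in good and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        n = len(region)
        centers.append((sum(p[0] for p in region) / n,
                        sum(p[1] for p in region) / n))
    return centers


points = [(0.2, 0.2), (0.4, 0.6), (1.2, 0.5), (1.6, 0.5),
          (2.1, 0.5), (2.9, 0.5), (3.2, 0.5), (3.6, 0.5)]
obstacles = [(2.2, 0.0, 2.8, 1.0)]  # a wall crossing cell (2, 0)
centers = cluster_with_obstacles(points, obstacles)
```

In this example the obstacle removes the dense cell (2, 0), so the row of points splits into two regions instead of one.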
A new fusion algorithm for fuzzy clustering
Directory of Open Access Journals (Sweden)
Ivan Vidović
2014-12-01
In this paper, we have considered the merging problem of two ellipsoidal clusters in order to construct a new fusion algorithm for fuzzy clustering. We have proposed a criterion for merging two ellipsoidal clusters Π1, Π2 with associated main Mahalanobis circles Ej(cj, σj), where cj is the centroid and σj^2 is the Mahalanobis variance of cluster Πj. Based on the well-known Davies-Bouldin index, we have constructed a new fusion algorithm. The criterion has been tested on several data sets, and the performance of the fusion algorithm has been demonstrated on an illustrative example.
Novel Cluster Validity Index for FCM Algorithm
Institute of Scientific and Technical Information of China (English)
Jian Yu; Cui-Xia Li
2006-01-01
How to determine an appropriate number of clusters is very important when implementing a specific clustering algorithm, like c-means, fuzzy c-means (FCM). In the literature, most cluster validity indices are originated from partition or geometrical property of the data set. In this paper, the authors developed a novel cluster validity index for FCM, based on the optimality test of FCM. Unlike the previous cluster validity indices, this novel cluster validity index is inherent in FCM itself. Comparison experiments show that the stability index can be used as cluster validity index for the fuzzy c-means.
An object-oriented cluster search algorithm
Energy Technology Data Exchange (ETDEWEB)
Silin, Dmitry; Patzek, Tad
2003-01-24
In this work we describe two object-oriented cluster search algorithms, which can be applied to a network of arbitrary structure. The first algorithm calculates all connected clusters, whereas the second one finds a path with the minimal number of connections. We estimate the complexity of the algorithms and infer that the number of operations grows linearly with the size of the network.
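Both tasks named in this abstract can be illustrated with breadth-first search, which visits every node and connection once and is therefore linear in the size of the network. The abstract does not say which traversal the authors use, so breadth-first search is an assumption here, and the `Network` class and its method names are hypothetical.

```python
from collections import deque


class Network:
    """Undirected network stored as an adjacency map (hypothetical class)."""

    def __init__(self):
        self.adj = {}

    def connect(self, a, b):
        self.adj.setdefault(a, set()).add(b)
        self.adj.setdefault(b, set()).add(a)

    def clusters(self):
        # Return all connected clusters, each as a sorted list of nodes.
        seen, result = set(), []
        for start in self.adj:
            if start in seen:
                continue
            seen.add(start)
            queue, cluster = deque([start]), []
            while queue:
                node = queue.popleft()
                cluster.append(node)
                for nb in self.adj[node]:
                    if nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
            result.append(sorted(cluster))
        return result

    def shortest_path(self, src, dst):
        # Path with the minimal number of connections, or None if unreachable.
        parent = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nb in self.adj.get(node, ()):
                if nb not in parent:
                    parent[nb] = node
                    queue.append(nb)
        return None


net = Network()
for a, b in [(1, 2), (2, 3), (3, 4), (1, 4), (5, 6)]:
    net.connect(a, b)
found = net.clusters()
path = net.shortest_path(1, 3)
```

Node labels only need to be hashable and sortable; `None` is reserved internally as the end-of-path marker and should not be used as a label.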
An extended EM algorithm for subspace clustering
Institute of Scientific and Technical Information of China (English)
Lifei CHEN; Qingshan JIANG
2008-01-01
Clustering high dimensional data has become a challenge in data mining due to the curse of dimensionality. To solve this problem, subspace clustering has been defined as an extension of traditional clustering that seeks to find clusters in subspaces spanned by different combinations of dimensions within a dataset. This paper presents a new subspace clustering algorithm that calculates the local feature weights automatically in an EM-based clustering process. In the algorithm, the features are locally weighted by using a new unsupervised weighting method, as a means to minimize a proposed clustering criterion that takes into account both the average intra-clusters compactness and the average inter-clusters separation for subspace clustering. For the purposes of capturing accurate subspace information, an additional outlier detection process is presented to identify the possible local outliers of subspace clusters, and is embedded between the E-step and M-step of the algorithm. The method has been evaluated in clustering real-world gene expression data and high dimensional artificial data with outliers, and the experimental results have shown its effectiveness.
Data clustering theory, algorithms, and applications
Gan, Guojun; Wu, Jianhong
2007-01-01
Cluster analysis is an unsupervised process that divides a set of objects into homogeneous groups. This book starts with basic information on cluster analysis, including the classification of data and the corresponding similarity measures, followed by the presentation of over 50 clustering algorithms in groups according to some specific baseline methodologies such as hierarchical, center-based, and search-based methods. As a result, readers and users can easily identify an appropriate algorithm for their applications and compare novel ideas with existing results. The book also provides examples of clustering applications to illustrate the advantages and shortcomings of different clustering architectures and algorithms. Application areas include pattern recognition, artificial intelligence, information technology, image processing, biology, psychology, and marketing. Readers also learn how to perform cluster analysis with the C/C++ and MATLAB® programming languages.
Load Balancing Algorithm for Cache Cluster
Institute of Scientific and Technical Information of China (English)
刘美华; 古志民; 曹元大
2003-01-01
Based on a definition of cluster load, the request is taken as the unit of granularity for computing load and implementing load balancing in a cache cluster. First, the processing power of a cache node is studied from four aspects: network bandwidth, memory capacity, disk access rate, and CPU usage. Then, the weighted load of a cache node is defined. Based on this, a load-balancing algorithm that can be applied to the cache cluster is proposed. Finally, Polygraph is used as a benchmarking tool to test the cache cluster with the load-balancing algorithm and the cache cluster with the cache array routing protocol, respectively. The results show that the load-balancing algorithm can improve the performance of the cache cluster.
Semantic Based Cluster Content Discovery in Description First Clustering Algorithm
Directory of Open Access Journals (Sweden)
MUHAMMAD WASEEM KHAN
2017-01-01
In the field of data analytics, grouping similar documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have been proposed. One category of algorithms first groups the documents on the basis of similarity and then assigns meaningful labels to those groups. The description-first clustering algorithm belongs to the category in which the meaningful description is deduced first and the relevant documents are then assigned to that description. LINGO (Label Induction Grouping Algorithm) is a description-first clustering algorithm used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing), an IR (Information Retrieval) technique, for the induction of meaningful cluster labels, and VSM (Vector Space Model) for cluster content discovery. In this paper we present LINGO using LSI during both the cluster label induction and the cluster content discovery phases. Finally, we compare the results obtained from the algorithm when it uses VSM and when it uses latent semantic analysis during the cluster content discovery phase.
Self-organization and clustering algorithms
Bezdek, James C.
1991-01-01
Kohonen's feature maps approach to clustering is often likened to the k or c-means clustering algorithms. Here, the author identifies some similarities and differences between the hard and fuzzy c-Means (HCM/FCM) or ISODATA algorithms and Kohonen's self-organizing approach. The author concludes that some differences are significant, but at the same time there may be some important unknown relationships between the two methodologies. Several avenues of research are proposed.
The Georgi Algorithms of Jet Clustering
Ge, Shao-Feng
2014-01-01
We reveal the direct link between the jet clustering algorithms recently proposed by Howard Georgi and parton shower kinematics, providing firm foundation from the theoretical side. The kinematics of this class of elegant algorithms is explored systematically for partons with arbitrary masses and the jet function is generalized to $J^{(n)}_\beta$ with a jet function index $n$ in order to achieve more degrees of freedom. Based on three basic requirements that, the result of jet clustering is p...
Institute of Scientific and Technical Information of China (English)
朱宇兰
2016-01-01
GPGPU (General Purpose Computing on Graphics Processing Units) is a field of computing that has developed rapidly in recent years. With its powerful parallel processing capability it offers an excellent solution for data-intensive, single-instruction computation, but its computing power is constrained by the chip manufacturing process and has run into a bottleneck. Starting from the foundation of GPGPU, the graphics API, this paper analyzes the characteristics of GPU parallel algorithms and the process and features of their computation, and abstracts a parallel computing framework from them. A compute-intensive case demonstrates how the framework is used, and a comparison with the traditional way of implementing GPGPU shows that the framework yields concise code and is independent of graphics.
Optimal Hops-Based Adaptive Clustering Algorithm
Xuan, Xin; Chen, Jian; Zhen, Shanshan; Kuo, Yonghong
This paper proposes an optimal hops-based adaptive clustering algorithm (OHACA). The algorithm sets an energy selection threshold before the cluster forms so that the nodes with less energy are more likely to go to sleep immediately. In the setup phase, OHACA introduces an adaptive mechanism to adjust the cluster head and balance the load. The optimal distance theory is then applied to discover the practical optimal routing path that minimizes the total energy for transmission. Simulation results show that, because of the energy balance, OHACA prolongs the life of the network, improves the utilization rate, and transmits more data.
Issues Challenges and Tools of Clustering Algorithms
Directory of Open Access Journals (Sweden)
Parul Agarwal
2011-05-01
Clustering is an unsupervised technique of data mining. It means grouping similar objects together and separating the dissimilar ones. Each object in the data set is assigned a class label in the clustering process using a distance measure. This paper captures the problems that are faced in practice when clustering algorithms are implemented. It also considers the most extensively used tools, which are readily available and provide functions that ease the programming. Once algorithms have been implemented, they also need to be tested for validity. Several validation indexes for testing performance and accuracy are also discussed here.
Blockspin Cluster Algorithms for Quantum Spin Systems
Wiese, U J
1992-01-01
Cluster algorithms are developed for simulating quantum spin systems like the one- and two-dimensional Heisenberg ferro- and anti-ferromagnets. The corresponding two- and three-dimensional classical spin models with four-spin couplings are mapped to blockspin models with two-blockspin interactions. Clusters of blockspins are updated collectively. The efficiency of the method is investigated in detail for one-dimensional spin chains. In most cases the new algorithms solve the problems of slowing down from which standard algorithms suffer.
A New Clustering Algorithm for Face Classification
Directory of Open Access Journals (Sweden)
Shaker K. Ali
2016-06-01
In this paper, we propose a new clustering algorithm that draws on ideas from other clustering algorithms. The algorithm computes a distance matrix, then excludes the matrix points that have been clustered by saving the location (row, column) of these points, determines the minimum distance of the points that belong to the group (class), and keeps the other points, which are not yet clustered. The proposed algorithm is applied to an image database of human faces under different conditions (direction, angles, etc.). These data are collected from different sources (the ORL site and real images collected from a random sample of the Thi_Qar city population in Iraq). The algorithm has been implemented with three types of distance for calculating the minimum distance between points (Euclidean, correlation, and Minkowski distance). The efficiency of the proposed algorithm varies with the database and the threshold, and exceeds 96%. Matlab (2014) was used in this work.
A Survey of Grid Based Clustering Algorithms
Directory of Open Access Journals (Sweden)
MR ILANGO
2010-08-01
Cluster analysis, an automatic process to find similar objects in a database, is a fundamental operation in data mining. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering techniques have been discussed extensively in Similarity Search, Segmentation, Statistics, Machine Learning, Trend Analysis, Pattern Recognition and Classification [1]. Clustering methods can be classified into (i) partitioning methods, (ii) hierarchical methods, (iii) density-based methods, (iv) grid-based methods, and (v) model-based methods. Grid-based methods quantize the object space into a finite number of cells (hyper-rectangles) and then perform the required operations on the quantized space. The main advantage of a grid-based method is its fast processing time, which depends on the number of cells in each dimension of the quantized space. In this research paper, we survey some of the grid-based methods, such as CLIQUE (CLustering In QUEst) [2], STING (STatistical INformation Grid) [3], MAFIA (Merging of Adaptive Intervals Approach to Spatial Data Mining) [4], Wave Cluster [5], and O-CLUSTER (Orthogonal partitioning CLUSTERing) [6], and compare their effectiveness in clustering data objects. We also present some of the latest developments in grid-based methods, such as the Axis Shifted Grid Clustering Algorithm [7] and Adaptive Mesh Refinement (Wei-Keng Liao et al.) [8], which improve the processing time of objects.
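The quantization step that grid-based methods share can be sketched as follows. This is a generic illustration under stated assumptions, not CLIQUE, STING, MAFIA, or any other surveyed method: the function name `grid_clusters`, the square 2-D cells, the density threshold `min_points`, and the rule of merging dense cells that share an edge are all choices made for the example. After the single pass that bins the points, the remaining work depends only on the number of occupied cells.

```python
from collections import Counter, deque


def grid_clusters(points, cell_size, min_points):
    """Cluster 2-D points by quantizing them into square cells."""
    # Quantize: each point falls into exactly one cell.
    cell_of = [(int(x // cell_size), int(y // cell_size)) for x, y in points]
    counts = Counter(cell_of)
    # A cell is "dense" if it holds at least min_points points.
    dense = {cell for cell, n in counts.items() if n >= min_points}

    # Merge dense cells that share an edge into one cluster label.
    label = {}
    next_label = 0
    for cell in sorted(dense):
        if cell in label:
            continue
        label[cell] = next_label
        queue = deque([cell])
        while queue:
            cx, cy = queue.popleft()
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in label:
                    label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Points in sparse cells get label -1 (treated as noise in this sketch).
    return [label.get(cell, -1) for cell in cell_of]


# Hypothetical data: two dense regions and one isolated point.
points = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.1), (1.4, 0.6),
          (5.1, 5.2), (5.3, 5.4), (9.0, 0.5)]
labels = grid_clusters(points, cell_size=1.0, min_points=2)
```

With these inputs the first four points share one label because their cells are adjacent, the two points near (5, 5) form a second cluster, and the isolated point is marked as noise.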
Cluster hybrid Monte Carlo simulation algorithms
Plascak, J. A.; Ferrenberg, Alan M.; Landau, D. P.
2002-06-01
We show that addition of Metropolis single spin flips to the Wolff cluster-flipping Monte Carlo procedure leads to a dramatic increase in performance for the spin-1/2 Ising model. We also show that adding Wolff cluster flipping to the Metropolis or heat bath algorithms in systems where just cluster flipping is not immediately obvious (such as the spin-3/2 Ising model) can substantially reduce the statistical errors of the simulations. A further advantage of these methods is that systematic errors introduced by the use of imperfect random-number generation may be largely healed by hybridizing single spin flips with cluster flipping.
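The hybrid scheme for the spin-1/2 Ising model can be sketched as one Wolff cluster flip followed by one sweep of Metropolis single spin flips. The sketch below is not the authors' code: the square lattice with periodic boundaries, coupling J = 1, zero field, the one-to-one mixing ratio, and the names `wolff_update`, `metropolis_sweep`, and `hybrid_step` are all assumptions made for illustration.

```python
import math
import random


def metropolis_sweep(spins, L, beta, rng):
    """Attempt L*L single spin flips with the Metropolis acceptance rule."""
    for _ in range(L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        nb_sum = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                  + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        delta_e = 2 * spins[i][j] * nb_sum  # energy change of the flip, J = 1
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            spins[i][j] = -spins[i][j]


def wolff_update(spins, L, beta, rng):
    """Grow one Wolff cluster from a random site and flip it; return its size."""
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond-activation probability
    i, j = rng.randrange(L), rng.randrange(L)
    seed_spin = spins[i][j]
    spins[i][j] = -seed_spin  # flip sites as they join the cluster
    stack = [(i, j)]
    size = 1
    while stack:
        ci, cj = stack.pop()
        for ni, nj in (((ci + 1) % L, cj), ((ci - 1) % L, cj),
                       (ci, (cj + 1) % L), (ci, (cj - 1) % L)):
            if spins[ni][nj] == seed_spin and rng.random() < p_add:
                spins[ni][nj] = -seed_spin
                stack.append((ni, nj))
                size += 1
    return size


def hybrid_step(spins, L, beta, rng):
    """One hybrid update: a Wolff cluster flip, then a Metropolis sweep."""
    size = wolff_update(spins, L, beta, rng)
    metropolis_sweep(spins, L, beta, rng)
    return size
```

Passing an explicit `random.Random` instance keeps runs reproducible, and swapping in a different generator is one way to probe the sensitivity to random-number quality that the abstract mentions.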
Polyclonal clustering algorithm and its convergence
Institute of Scientific and Technical Information of China (English)
MA Li; JIAO Li-cheng; BAI Lin; CHEN Chang-guo
2008-01-01
Characterized by unsupervised learning, self-organization, memory, and noise resistance, the artificial immune system is a research focus in the field of intelligent information processing. Based on the basic principles of organism immunity and clonal selection, this article presents a self-adaptive polyclonal clustering algorithm. According to the core idea of the algorithm, various immune operators of the artificial immune system are employed in the clustering process; moreover, the number of clusters is adjusted in accordance with the affinity function. Introducing the recombination operator effectively enhances the diversity of the individual antibodies in a generation's population, so that the search scope for solutions is enlarged and premature convergence of the algorithm is avoided. In addition, introducing the inconsistent mutation operator enhances adaptability and improves the performance of the local search, while also accelerating the convergence of the algorithm. The article also proves the convergence of the algorithm by means of a Markov chain. Results of the data simulation experiment show that the algorithm is capable of obtaining reasonable and effective clusters.
Maximum-entropy clustering algorithm and its global convergence analysis
Institute of Scientific and Technical Information of China (English)
None listed
2001-01-01
Constructing a batch of differentiable entropy functions to uniformly approximate an objective function by means of the maximum-entropy principle, a new clustering algorithm, called maximum-entropy clustering algorithm, is proposed based on optimization theory. This algorithm is a soft generalization of the hard C-means algorithm and possesses global convergence. Its relations with other clustering algorithms are discussed.
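One way to see why such an algorithm is a soft generalization of hard C-means is the Gibbs-style membership that commonly appears in maximum-entropy clustering. The abstract gives no formulas, so the sketch below assumes that common form: memberships proportional to exp(-beta * squared distance) and centers updated as membership-weighted means. The names `soft_assign`, `update_centers`, and the parameter `beta` are assumptions for illustration. As `beta` grows the memberships approach the hard nearest-center assignment; at `beta = 0` every center receives equal weight.

```python
import math


def soft_assign(points, centers, beta):
    """Return, for each point, a list of soft memberships over the centers."""
    memberships = []
    for x in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers]
        d_min = min(d2)
        # Subtracting d_min keeps the exponentials numerically stable
        # without changing the normalized memberships.
        weights = [math.exp(-beta * (d - d_min)) for d in d2]
        total = sum(weights)
        memberships.append([w / total for w in weights])
    return memberships


def update_centers(points, memberships, k):
    """Return the membership-weighted mean of the points for each center."""
    dim = len(points[0])
    centers = []
    for j in range(k):
        weight = sum(u[j] for u in memberships)
        centers.append([
            sum(u[j] * x[d] for u, x in zip(memberships, points)) / weight
            for d in range(dim)
        ])
    return centers


# Hypothetical data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(0.0, 0.0), (10.0, 10.0)]
hard_like = soft_assign(points, centers, beta=5.0)
new_centers = update_centers(points, hard_like, k=2)
```

Alternating the two functions until the centers stop moving gives a simple fixed-point iteration; whether it matches the convergence analysis of the paper is not something this sketch can claim.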
An Adaptive Clustering Algorithm for Intrusion Detection
Institute of Scientific and Technical Information of China (English)
QIU Juli
2007-01-01
In this paper, we introduce an adaptive clustering algorithm for intrusion detection based on WaveCluster, which was introduced by Gholamhosein in 1999 and used with success in image processing. Because of the non-stationary character of network traffic, we extend WaveCluster and develop an adaptive version for intrusion detection. Using the multiresolution property of wavelet transforms, we can effectively identify arbitrarily shaped clusters at different scales and degrees of detail. Moreover, applying the wavelet transform removes noise from the original feature space and makes the clusters found more accurate. Experimental results on the KDD-99 intrusion detection dataset show the efficiency and accuracy of this algorithm. A detection rate above 96% and a false alarm rate below 3% are achieved.
Efficient Cluster Head Selection Algorithm for MANET
Directory of Open Access Journals (Sweden)
Khalid Hussain
2013-01-01
In mobile ad hoc networks (MANETs), cluster head selection is considered a major challenge. In wireless sensor networks, the LEACH protocol can be used to select a cluster head on the basis of energy, but selection remains an open issue in mobile ad hoc networks, especially when nodes are moving. In this paper we propose an efficient cluster head selection algorithm (ECHSA) for mobile ad hoc networks. We evaluate the proposed algorithm through simulation in OMNet++ as well as on a test bed, and the results agree with our assumptions. For further evaluation we also compare the proposed protocol with several other protocols such as LEACH-C, and the results are favorable.
Performance Analysis of Hierarchical Clustering Algorithm
Directory of Open Access Journals (Sweden)
K.Ranjini
2011-07-01
Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure. Data clustering is a common technique for statistical data analysis, which is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. This paper explains the implementation of agglomerative and divisive clustering algorithms applied to various types of data. The details of the victims of the 2004 tsunami in Thailand were taken as the test data. Visual programming is used for the implementation, and the running times of the algorithms using different linkages (agglomerative) on different types of data are taken for analysis.
Parallel Clustering Algorithms for Structured AMR
Energy Technology Data Exchange (ETDEWEB)
Gunney, B T; Wissink, A M; Hysom, D A
2005-10-26
We compare several different parallel implementation approaches for the clustering operations performed during adaptive gridding operations in patch-based structured adaptive mesh refinement (SAMR) applications. Specifically, we target the clustering algorithm of Berger and Rigoutsos (BR91), which is commonly used in many SAMR applications. The baseline for comparison is a simplistic parallel extension of the original algorithm that works well for up to O(10^2) processors. Our goal is a clustering algorithm for machines of up to O(10^5) processors, such as the 64K-processor IBM BlueGene/Light system. We first present an algorithm that avoids the unneeded communications of the simplistic approach to improve the clustering speed by up to an order of magnitude. We then present a new task-parallel implementation to further reduce communication wait time, adding another order of magnitude of improvement. The new algorithms also exhibit more favorable scaling behavior for our test problems. Performance is evaluated on a number of large scale parallel computer systems, including a 16K-processor BlueGene/Light system.
Analysis of Stemming Algorithm for Text Clustering
Directory of Open Access Journals (Sweden)
N.Sandhya
2011-09-01
Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In the bag-of-words representation of documents, the words that appear in documents often have many morphological variants, and in most cases morphological variants of a word have similar semantic interpretations and can be considered equivalent for the purpose of clustering applications. For this reason, a number of stemming algorithms, or stemmers, have been developed, which attempt to reduce a word to its stem or root form. Thus, the key terms of a document are represented by stems rather than by the original words. In this work we have studied the impact of a stemming algorithm along with four popular similarity measures (Euclidean, cosine, Pearson correlation and extended Jaccard) in conjunction with different types of vector representation (boolean, term frequency, and term frequency with inverse document frequency) on cluster quality. For clustering documents we have used the partitional clustering technique K-means. Performance is measured against a human-imposed classification of the Classic data set. We conducted a number of experiments and used an entropy measure to assure statistical significance of results. Cosine, Pearson correlation and extended Jaccard similarities emerge as the best measures to capture human categorization behavior, while the Euclidean measure performs poorly. After applying the stemming algorithm, the Euclidean measure shows little improvement.
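The four measures compared above can be written down for plain term-frequency vectors as follows. This is a sketch using the usual textbook definitions, which the abstract itself does not spell out; in particular, extended Jaccard is taken here as x·y / (|x|² + |y|² - x·y). The function names and the two example vectors are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def pearson_correlation(x, y):
    # Pearson correlation is the cosine of the mean-centred vectors.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    return cosine_similarity(cx, cy)


def extended_jaccard(x, y):
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


# Hypothetical term-frequency vectors: doc_b repeats doc_a's text twice.
doc_a = [1, 2, 0, 1]
doc_b = [2, 4, 0, 2]
```

For these two vectors the cosine and Pearson measures treat the documents as identical in direction, while the Euclidean distance and extended Jaccard both register the difference in length, which hints at why the measures can rank documents differently.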
High-Performance Broadcasting Algorithms on Cluster
Institute of Scientific and Technical Information of China (English)
舒继武; 魏英霞; 王鼎兴
2004-01-01
In many clusters connected by high-speed communication networks, the exact structure of the underlying communication network and the latency difference between different sending and receiving pairs may be ignored when they broadcast, as in the approach adopted by the broadcasting method in MPICH, a widely used MPI implementation. However, the underlying network cluster topologies are becoming more and more complicated and the performance of traditional broadcasting algorithms, such as MPICH's MPI_Bcast, is far from good. This paper analyzes the impact of communication latencies and the underlying topologies on the performance of broadcasting algorithms for multilevel clusters. A multilevel model is developed for broadcasting in clusters with complicated topologies, which divides the cluster topology into many levels based on the underlying topology. The multilevel model is used to develop a new broadcast algorithm, MLM broadcast-2 (MLMB-2), that adapts to a wide range of clusters. A performance comparison of MLMB-2 with its MPI counterpart MPI_Bcast shows that MLMB-2 outperforms MPI_Bcast, decreasing the broadcast running time by 60%-90%.
Cluster Algorithm Special Purpose Processor
Talapov, A. L.; Shchur, L. N.; Andreichenko, V. B.; Dotsenko, Vl. S.
We describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.
Cluster algorithm special purpose processor
Energy Technology Data Exchange (ETDEWEB)
Talapov, A.L.; Shchur, L.N.; Andreichenko, V.B.; Dotsenko, V.S. (Landau Inst. for Theoretical Physics, GSP-1 117940 Moscow V-334 (USSR))
1992-08-10
In this paper, the authors describe a Special Purpose Processor, realizing the Wolff algorithm in hardware, which is fast enough to study the critical behaviour of 2D Ising-like systems containing more than one million spins. The processor has been checked to produce correct results for a pure Ising model and for Ising model with random bonds. Its data also agree with the Nishimori exact results for spin glass. Only minor changes of the SPP design are necessary to increase the dimensionality and to take into account more complex systems such as Potts models.
An Improved Heuristic Ant-Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
Yunfei Chen; Yushu Liu; Jihai Zhao
2004-01-01
An improved heuristic ant-clustering algorithm (HAC) is presented in this paper. A 'memory bank' device is proposed, which provides heuristic knowledge that guides the ants as they move in the two-dimensional grid space. Experiments are run on real and synthetic data sets. The results demonstrate that HAC is superior to the classical algorithm in misclassification error rate and runtime.
High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm
Directory of Open Access Journals (Sweden)
Dieter Hendricks
2016-02-01
We implement a master-slave parallel genetic algorithm with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs) to implement a parallel genetic algorithm and visualise the results using disjoint minimal spanning trees. We demonstrate that our GPU parallel genetic algorithm, implemented on a commercially available general purpose GPU, is able to recover stock clusters in sub-second speed, based on a subset of stocks in the South African market. This approach represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable because of compiler differences. Combined with fast online intraday correlation matrix estimation from high frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.
A Fast Algorithm for Support Vector Clustering
Institute of Scientific and Technical Information of China (English)
吕常魁; 姜澄宇; 王宁生
2004-01-01
Support Vector Clustering (SVC) is a kernel-based unsupervised learning clustering method. The main drawback of SVC is its high computational complexity in obtaining the adjacency matrix that describes the connectivity of each pair of points. Based on the proximity graph model [3], the Euclidean distance in Hilbert space is calculated using a Gaussian kernel, which is the right criterion for generating a minimum spanning tree (MST) with Kruskal's algorithm. The cost of connectivity estimation is then lowered by checking only the linkages between the edges that form the main stem of the MST, for which a non-compatibility degree is defined to support edge selection during linkage estimation. This new approach is analyzed experimentally. The results show that the revised algorithm performs better than the proximity graph model, with faster speed, improved clustering quality, and strong noise suppression, which makes SVC scalable to large data sets.
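The first half of that pipeline, feature-space distances feeding Kruskal's algorithm, can be sketched as follows. For a Gaussian kernel K(x, y) = exp(-q ||x - y||²), the squared distance between the images of x and y in Hilbert space is K(x, x) + K(y, y) - 2 K(x, y) = 2 - 2 K(x, y). The kernel width `q`, the function names, and the example points are assumptions made for this illustration; the main-stem selection and the non-compatibility degree of the paper are not reproduced here.

```python
import math


def kernel_distance(x, y, q):
    """Distance between the feature-space images of x and y (Gaussian kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(2.0 - 2.0 * math.exp(-q * sq))


def kruskal_mst(points, q):
    """Return MST edges (i, j, weight) over the complete kernel-distance graph."""
    n = len(points)
    edges = sorted(
        (kernel_distance(points[i], points[j], q), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for weight, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two different components
            parent[root_i] = root_j
            tree.append((i, j, weight))
    return tree


# Hypothetical data: three nearby points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (5.0, 5.0)]
tree = kruskal_mst(points, q=0.5)
```

Because the kernel distance saturates at sqrt(2) for far-apart points, the long edge to the distant point stands out clearly against the short edges inside the tight group.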
Fuzzy Rules for Ant Based Clustering Algorithm
Directory of Open Access Journals (Sweden)
Amira Hamdi
2016-01-01
This paper provides a new intelligent technique for the semisupervised data clustering problem that combines the Ant System (AS) algorithm with the fuzzy c-means (FCM) clustering algorithm. Our proposed approach, called the F-ASClass algorithm, is a distributed algorithm inspired by the foraging behavior observed in ant colonies. The ability of ants to find the shortest path forms the basis of our proposed approach. In the first step, several colonies of cooperating entities, called artificial ants, are used to find shortest paths in a complete graph that we call graph-data. The number of colonies used in F-ASClass is equal to the number of clusters in the dataset. In the second step, the partition matrix of the dataset found by the artificial ants is given to the fuzzy c-means technique in order to assign the unclassified objects left by the first step. The proposed approach is tested on artificial and real datasets, and its performance is compared with that of the K-means, K-medoid, and FCM algorithms. The experimental section shows that F-ASClass performs better in terms of classification error rate, accuracy, and separation index.
Application of a New Fuzzy Clustering Algorithm in Intrusion Detection
Institute of Scientific and Technical Information of China (English)
None listed
2008-01-01
This paper presents a new Section Set Adaptive FCM algorithm. The algorithm addresses the shortcomings of local optimality, uncertain classification, and the need to fix the number of clusters in advance. It improves on the architecture of the FCM algorithm and enhances the analysis for effective clustering. During the clustering process, it can adjust the number of clusters dynamically. Finally, it uses the section set method to reduce classification time. Experiments show that the algorithm can improve the dependability of clustering and the correctness of classification.
Parallel FFT Algorithm on Computer Clusters
Institute of Scientific and Technical Information of China (English)
None listed
2005-01-01
The DFT is widely applied in signal processing and other fields. Most existing fast computation methods are either based on parallel computers connected by particular networks such as the butterfly network or the hypercube, or based on the assumptions of instantaneous transmission, conflict-free communication, complete connection of the parallel processors, and an unlimited number of usable processors. However, the communication delay in the transmission system cannot be ignored. This paper addresses the following aspects: instant transmission, task dispatching, and the path of information through the communication links in computer cluster systems; and the layout of the dynamic FFT algorithm under different computer cluster structures.
12 CFR 1703.31 - General purposes.
2010-01-01
... 12 Banks and Banking 7 2010-01-01 2010-01-01 false General purposes. 1703.31 Section 1703.31 Banks... DEVELOPMENT OFHEO ORGANIZATION AND FUNCTIONS RELEASE OF INFORMATION Testimony and Production of Documents in... time of employees for their official duties, maintain the impartial position of OFHEO in litigation...
Comparative study of several Clustering Algorithms
Directory of Open Access Journals (Sweden)
Prof. Neha Soni, Dr. Amit Ganatra
2012-12-01
Cluster analysis is a process of grouping objects, where objects can be physical, like a student, or abstract, such as the behaviour of a customer or the handwriting of a person. Cluster analysis is as old as human life and has its roots in many fields such as statistics, machine learning, biology, and artificial intelligence. It is an unsupervised learning task and faces many challenges, such as high dimensionality of the dataset, arbitrary shapes of clusters, scalability, input parameters, domain knowledge, and noisy data. A large number of clustering algorithms have been proposed to date to address these challenges. No single algorithm can adequately handle all sorts of requirements. This makes it a great challenge for the user to select among the available algorithms for a specific task. The purpose of this paper is to provide a detailed analytical comparison of some of the very well known clustering algorithms, which provides guidance for the selection of a clustering algorithm for a specific application.
An incremental clustering algorithm based on Mahalanobis distance
Aik, Lim Eng; Choon, Tan Wee
2014-12-01
The classical fuzzy c-means clustering algorithm is insufficient for clustering non-spherical or elliptically distributed datasets. The paper replaces the Euclidean distance of classical fuzzy c-means clustering with the Mahalanobis distance, and applies the Mahalanobis distance to incremental learning for its merits. A fuzzy incremental clustering learning algorithm based on the Mahalanobis distance is proposed. Experimental results show that the algorithm not only is an effective remedy for this defect of the fuzzy c-means algorithm but also increases training accuracy.
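Why the Mahalanobis distance suits elongated clusters can be seen from a small two-dimensional example. The sketch below only computes the distance d(x) = sqrt((x - m)^T S^{-1} (x - m)) from a sample mean m and sample covariance S; it is not the incremental algorithm of the paper, and the function names and sample points are assumptions for illustration. The 2x2 covariance matrix is inverted in closed form, so no external library is needed.

```python
import math


def mean_and_covariance(points):
    """Return the sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2-D point from a cluster (mean, covariance)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form with the closed-form inverse of [[sxx, sxy], [sxy, syy]].
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical cluster stretched along the x axis.
cluster = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
mean, cov = mean_and_covariance(cluster)
far_along_x = mahalanobis((2.0, 0.0), mean, cov)
near_along_y = mahalanobis((0.0, 1.0), mean, cov)
```

Here the point at Euclidean distance 2 along the long axis and the point at Euclidean distance 1 along the short axis get the same Mahalanobis distance, because each is scaled by the spread of the cluster in its own direction.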
CABOSFV algorithm for high dimensional sparse data clustering
Institute of Scientific and Technical Information of China (English)
Sen Wu; Xuedong Gao
2004-01-01
An algorithm, Clustering Algorithm Based On Sparse Feature Vector (CABOSFV), was proposed for the high dimensional clustering of binary sparse data. This algorithm compresses the data effectively by using a tool 'Sparse Feature Vector', thus reduces the data scale enormously, and can get the clustering result with only one data scan. Both theoretical analysis and empirical tests showed that CABOSFV is of low computational complexity. The algorithm finds clusters in high dimensional large datasets efficiently and handles noise effectively.
First Cluster Algorithm Special Purpose Processor
Talapov, A. L.; Andreichenko, V. B.; Dotsenko, Vl. S.; Shchur, L. N.
We describe the architecture of a special purpose processor built to realize in hardware the Wolff cluster algorithm, which is not hampered by critical slowing down. The processor simulates two-dimensional Ising-like spin systems. With minor changes the same very effective architecture, which can be defined as a Memory Machine, can be used to study phase transitions in a wide range of models in two or three dimensions.
Dynamic exponents for potts model cluster algorithms
Coddington, Paul D.; Baillie, Clive F.
We have studied the Swendsen-Wang and Wolff cluster update algorithms for the Ising model in 2, 3 and 4 dimensions. The data indicate simple relations between the specific heat and the Wolff autocorrelations, and between the magnetization and the Swendsen-Wang autocorrelations. This implies that the dynamic critical exponents are related to the static exponents of the Ising model. We also investigate the possibility of similar relationships for the Q-state Potts model.
Enhanced Unequal Clustering Algorithm for Wireless Sensor Networks
Talbi, Said; Zaouche, Lotfi
2015-01-01
Clustering is considered a solution for conserving more energy during communications in wireless sensor networks. Recently, a new clustering algorithm named Unequal Clustering Algorithm (UCA) was proposed to relieve the overburdened cluster-heads located around the sink, which carry the traffic coming from others that are far from the base station. This paper presents an Enhanced Unequal Clustering Algorithm called EUCA. This solution reduces the control traffic during a clusteri...
ITS Cluster Finding Algorithm on GPU
Changaival, Boonyarit
2014-01-01
The ITS cluster finding algorithm is one of the data reduction algorithms at ALICE. It needs to run fast because of the large amount of data read out from the detector. A variety of platforms were studied for the system design. My work is to design, implement and benchmark this algorithm on a GPU platform. The GPU is one of many platforms that promote parallel computing. A high-end GPU can contain over 2000 processing cores, compared to commodity CPUs, which have only four cores. The program is written in C with the CUDA library. The throughput (number of events per second) is used as the performance metric. With the latest implementation, the throughput was increased by a factor of 5.
Hearing the clusters in a graph: A distributed algorithm
Sahai, Tuhin; Banaszuk, Andrzej
2009-01-01
We propose a novel distributed algorithm to decompose graphs or cluster data. The algorithm recovers the solution obtained from spectral clustering without the need for expensive eigenvalue/eigenvector computations. We demonstrate that by solving the wave equation on the graph, every node can assign itself to a cluster by performing a local fast Fourier transform. We prove the equivalence of our algorithm to spectral clustering, derive convergence rates and demonstrate it on examples.
A High-Order CFS Algorithm for Clustering Big Data
Fanyu Bu; Zhikui Chen; Peng Li; Tong Tang; Ying Zhang
2016-01-01
With the development of the Internet of Everything, including the Internet of Things, Internet of People, and Industrial Internet, big data is being generated. Clustering is a widely used technique for big data analytics and mining. However, most current algorithms cannot effectively cluster heterogeneous data, which is prevalent in big data. In this paper, we propose a high-order CFS algorithm (HOCFS) to cluster heterogeneous data by combining the CFS clustering algorithm and the dropout deep learn...
Improvement and Parallelism of k-Means Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
TIAN Jinlan; ZHU Lin; ZHANG Suqin; LIU Lu
2005-01-01
The k-means clustering algorithm is one of the most commonly used algorithms for clustering analysis. The traditional k-means algorithm is, however, inefficient when working on large data sets, and improving its efficiency remains a problem. This paper focuses on the efficiency issues of cluster algorithms. A refined initial cluster centers method is designed to reduce the number of iterations in the algorithm. A parallel k-means algorithm is also studied to address the limits of a single-processor machine when given huge data sets. The analytical results demonstrate that these improvements can greatly enhance the efficiency of the k-means algorithm, i.e., allow the grouping of large data sets more accurately and more quickly. The analysis has theoretical and practical importance for work on the improvement and parallelism of cluster algorithms.
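As a point of reference for the abstract above, the baseline k-means (Lloyd) iteration can be sketched as follows. The abstract does not describe its refined initialization or its parallel scheme, so this sketch assumes plain random sampling of the initial centers and runs sequentially; the name `kmeans` and its parameters are illustrative.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=100, seed=0):
    """Baseline k-means on a list of equal-length tuples.

    Initial centers are drawn at random from the data (an assumption,
    not the refined initialization studied in the paper).
    Returns (centers, labels).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels
```

The assignment step is independent per point, which is the part a parallel variant would typically distribute across processors.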
Parallelization of Edge Detection Algorithm using MPI on Beowulf Cluster
Haron, Nazleeni; Amir, Ruzaini; Aziz, Izzatdin A.; Jung, Low Tan; Shukri, Siti Rohkmah
In this paper, we present the design of a parallel Sobel edge detection algorithm using Foster's methodology. The parallel algorithm is implemented using the MPI message passing library and a master/slave scheme. Every processor performs the same sequential algorithm but on a different part of the image. Experimental results obtained on a Beowulf cluster are presented to demonstrate the performance of the parallel algorithm.
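The decomposition described above, where every processor runs the same sequential Sobel code on a different part of the image, can be illustrated without MPI. The sketch below is an assumption-laden stand-in: it splits the image into row strips with one overlapping "halo" row on each side, processes the strips one after another, and stitches the results. The function names and the |gx| + |gy| magnitude are choices made here, not details from the paper.

```python
def sobel(image):
    """Sobel gradient magnitude (|gx| + |gy|) for interior pixels; border stays 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def sobel_by_strips(image, workers):
    """Apply sobel() strip by strip, as each worker in a master/worker setup would."""
    h = len(image)
    bounds = [h * i // workers for i in range(workers + 1)]
    result = []
    for lo, hi in zip(bounds, bounds[1:]):
        top = max(lo - 1, 0)       # one halo row above, if it exists
        bottom = min(hi + 1, h)    # one halo row below, if it exists
        strip = sobel(image[top:bottom])
        # Keep only the rows this worker owns, dropping the halo rows.
        result.extend(strip[lo - top: lo - top + (hi - lo)])
    return result
```

In an MPI version the master would send each strip (with its halo rows) to a worker and gather the owned rows back; the stitched output matches the sequential result.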
A CLUSTERING ALGORITHM FOR MIXED NUMERIC AND CATEGORICAL DATA
Institute of Scientific and Technical Information of China (English)
Ohn Mar San; Van-Nam Huynh; Yoshiteru Nakamori
2003-01-01
Most of the earlier work on clustering mainly focused on numeric data whose inherent geometric properties can be exploited to naturally define distance functions between data points. However, data mining applications frequently involve many datasets that also consist of mixed numeric and categorical attributes. In this paper we present a clustering algorithm which is based on the k-means algorithm. The algorithm clusters objects with numeric and categorical attributes in a way similar to k-means. The object similarity measure is derived from both numeric and categorical attributes. When applied to numeric data, the algorithm is identical to k-means. The main result of this paper is a method to update the "cluster centers" of clustering objects described by mixed numeric and categorical attributes in the clustering process to minimize the clustering cost function. The clustering performance of the algorithm is demonstrated with two well known data sets, namely the credit approval and abalone databases.
EFFICIENT ALGORITHM FOR MINING FREQUENT ITEMSETS USING CLUSTERING TECHNIQUES
Directory of Open Access Journals (Sweden)
D.Kerana Hanirex
2011-03-01
Nowadays, association rules play an important role. The purchase of one product when another product is purchased represents an association rule. The Apriori algorithm is the basic algorithm for mining association rules. This paper presents an efficient Partition Algorithm for Mining Frequent Itemsets (PAFI) using clustering. The algorithm finds the frequent itemsets by partitioning the database transactions into clusters. Clusters are formed based on similarity measures between the transactions. It then finds the frequent itemsets directly from the transactions in the clusters using an improved Apriori algorithm, which further reduces the number of database scans and hence improves efficiency.
A hybrid monkey search algorithm for clustering analysis.
Chen, Xin; Zhou, Yongquan; Luo, Qifang
2014-01-01
Clustering is a popular data analysis and data mining technique. The k-means clustering algorithm is one of the most commonly used methods. However, it depends heavily on the initial solution and easily falls into a local optimum. In view of these disadvantages of the k-means method, this paper proposes a hybrid monkey algorithm for clustering analysis, based on the search operator of the artificial bee colony algorithm. Experiments on synthetic and real-life datasets show that the algorithm performs better than the basic monkey algorithm for clustering analysis.
The Georgi algorithms of jet clustering
Ge, Shao-Feng
2015-05-01
We reveal the direct link between the jet clustering algorithms recently proposed by Howard Georgi and parton shower kinematics, providing a firm foundation from the theoretical side. The kinematics of this class of elegant algorithms is explored systematically for partons with arbitrary masses, and the jet function is generalized to $J_\beta^{(n)}$ with a jet function index n in order to achieve more degrees of freedom. We impose three basic requirements: the result of jet clustering is process-independent and hence logically consistent; for softer subjets the inclusion cone is larger, to conform with the fact that the parton shower tends to emit softer partons at an earlier stage with a larger opening angle; and the cone size cannot be too large, in order to avoid mixing up neighboring jets. From these we derive constraints on the jet function parameter β and index n, which are closely related to the cone size cutoff. Finally, we discuss how jet function values can be made invariant under Lorentz boost.
PROPOSED A HETEROGENEOUS CLUSTERING ALGORITHM TO IMPROVE QOS IN WSN
Directory of Open Access Journals (Sweden)
Mehran Mokhtari
2016-07-01
This article presents a LEACH-extended hierarchical 3-level clustered heterogeneous and dynamic algorithm. In the suggested protocol (LEH3LA), cluster heads and alternative cluster head nodes are selected by auction, which addresses processing delay, member selection, cost and energy consumption, the number of messages sent and received inside clusters, and the selection of cluster heads in large sensor networks. The algorithm uses a hierarchical heterogeneous network (3 levels), collective intelligence, and intra-cluster interaction for communication. It also addresses data transmission in multi-BS mobile networks, expansion of inter-cluster networks, overlapping clusters, the creation of orphan nodes, dynamically changing cluster boundaries, and the use of backbone networks and cloud sensors. With a sleep/wake scheduling algorithm or a TDMA schedule, the alternative cluster head node provides redundancy and fault tolerance. Local processing in cluster head and alternative cluster head nodes, together with intra-cluster and inter-cluster communication such as multi-hop, increases processing speed and the speed of intra-cluster and inter-cluster data transmission. Network overhead decreases and load balancing among cluster heads improves. Data encapsulation by cluster head nodes reduces energy consumption during data transmission. Improving quality of service (QoS) in CBRP, LEACH, and 802.15.4 and reducing energy consumption in sensors, cluster heads, and alternative cluster head nodes increases the lifetime of sensor networks.
General purpose fast decoupled power flow
Energy Technology Data Exchange (ETDEWEB)
Nanda, J.; Bijwe, P.R.; Henry, J.; Bapi Raju, V. (Indian Inst. of Tech., New Delhi (IN). Dept. of Electrical Engineering)
1992-03-01
A general purpose fast decoupled power flow model (GFDPF) is presented that exhibits more or less the best convergence properties for both well-behaved and ill-conditioned systems. In the proposed model, all network shunts, such as line charging, external shunts at buses, and shunts formed due to the π representation of off-nominal in-phase transformers, are treated as constant impedance loads. The effect of line resistances is considered while forming the (B') matrix and is ignored in forming the (B'') matrix. This model is tested on several systems for both well-behaved and ill-conditioned situations. A simple, efficient compensation technique is proposed to deal with Q-limit enforcements associated with bus-type switchings at voltage-controlled buses. The results demonstrate that the proposed GFDPF model exhibits more or less stable convergence behaviour for both well-behaved and ill-conditioned situations. (author).
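For orientation, the (B') and (B'') matrices named in this abstract are the constant coefficient matrices of the two half-iterations of a fast decoupled power flow. In the commonly used textbook form, given here as background and not as the GFDPF formulation itself (whose treatment of resistances and shunts differs as the abstract states), the active and reactive power mismatches are solved separately:

```latex
% Textbook fast decoupled power flow half-iterations (background sketch).
% \Delta P, \Delta Q: active/reactive power mismatch vectors
% V: bus voltage magnitudes, \theta: bus voltage angles
\left[ \frac{\Delta P}{V} \right] = [B'] \, [\Delta \theta],
\qquad
\left[ \frac{\Delta Q}{V} \right] = [B''] \, [\Delta V]
```

Because both matrices are constant, they are factorized once and reused in every iteration, which is where the method gets its speed.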
Genetic Algorithms for Auto-Clustering in KDD
Institute of Scientific and Technical Information of China (English)
(no author listed)
2000-01-01
In solving the clustering problem in the context of knowledge discovery in databases (KDD), traditional methods, for example the K-means algorithm and its variants, usually require the users to provide the number of clusters in advance based on prior information. Unfortunately, the number of clusters is in general unknown to users, who usually lack prior information. The clustering calculation therefore becomes tedious trial-and-error work, and the result is often not globally optimal, especially when the number of clusters is large. In this paper, a new dynamic clustering method based on genetic algorithms (GA) is proposed and applied to the auto-clustering of data entities in large databases. The algorithm can automatically cluster the data according to their similarities and find the exact number of clusters. Experimental results indicate that the method achieves global optimization through its dynamic clustering logic.
Energy Aware Clustering Algorithms for Wireless Sensor Networks
Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian
2011-09-01
The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire network is the main design consideration. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and compare them based on metrics such as clustering distribution, cluster load balancing, Cluster Head (CH) selection strategy, CH role rotation, node mobility, cluster overlapping, intra-cluster communications, reliability, security and location awareness.
A Novel Clustering Algorithm Inspired by Membrane Computing
Directory of Open Access Journals (Sweden)
Hong Peng
2015-01-01
P systems are a class of distributed parallel computing models. This paper presents a novel clustering algorithm, called the membrane clustering algorithm, which is inspired by the mechanism of a tissue-like P system with a loop structure of cells. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior to or competitive with the k-means algorithm and several evolutionary clustering algorithms recently reported in the literature.
An energy efficient clustering routing algorithm for wireless sensor networks
Institute of Scientific and Technical Information of China (English)
LI Li; DONG Shu-song; WEN Xiang-ming
2006-01-01
This article proposes an energy efficient clustering routing (EECR) algorithm for wireless sensor networks. The algorithm divides a sensor network into a few clusters and selects each cluster head based on a weight value, which leads to more uniform energy dissipation among all sensor nodes. Simulation results show that the algorithm can reduce overall energy consumption and extend the lifetime of the wireless sensor network.
Introduction to Clustering Algorithms and Applications
Yang, Sibei; Tao, Liangde; Gong, Bingchen
2014-01-01
Data clustering is the process of identifying natural groupings or clusters within multidimensional data based on some similarity measure. Clustering is a fundamental process in many different disciplines. Hence, researchers from different fields are actively working on the clustering problem. This paper provides an overview of the different representative clustering methods. In addition, applications of clustering in different fields are briefly introduced.
Institute of Scientific and Technical Information of China (English)
杨磊; 王玲; 龚学余
2013-01-01
Combined with a standard mathematical model for evaluating the quality of deployment results, a new high-performance parallel algorithm for source pencil deployment was obtained by fully parallelizing the plant growth simulation algorithm with the CUDA execution model; the corresponding code runs on a GPU. Several instances of various scales were used to test the new version of the algorithm. The results show that, while retaining the advantages of the earlier versions, the new version is more than 500 times faster than the CPU version and more than 30 times faster than the CPU plus GPU hybrid version. For irradiators with an activity below 111 PBq, the computation time of the new version is less than ten minutes. On a single GTX275 GPU, the upper limit of the new version's computing capacity is about 167 PBq with a computation time of no more than 25 minutes, and with multiple GPUs the capacity can be increased further. Overall, the new version of the algorithm running on a GPU can satisfy the source pencil deployment requirements of any domestic irradiator and is highly competitive.
PHC: A Fast Partition and Hierarchy-Based Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
ZHOU HaoFeng(周皓峰); YUAN QingQing(袁晴晴); CHENG ZunPing(程尊平); SHI BaiLe(施伯乐)
2003-01-01
Cluster analysis is a process to classify data in a specified data set. In this field, much attention is paid to high-efficiency clustering algorithms. In this paper, the features of the current partition-based and hierarchy-based algorithms are reviewed, and a new hierarchy-based algorithm, PHC, is proposed by combining the advantages of both algorithms; it uses cohesion and closeness to amalgamate the clusters. Compared with similar algorithms, the performance of PHC is improved, and the quality of clustering is guaranteed. Both features are proved by the theoretical and experimental analyses in the paper.
Counterexamples to convergence theorem of maximum-entropy clustering algorithm
Institute of Scientific and Technical Information of China (English)
于剑; 石洪波; 黄厚宽; 孙喜晨; 程乾生
2003-01-01
In this paper, we surveyed the development of maximum-entropy clustering algorithm, pointed out that the maximum-entropy clustering algorithm is not new in essence, and constructed two examples to show that the iterative sequence given by the maximum-entropy clustering algorithm may not converge to a local minimum of its objective function, but a saddle point. Based on these results, our paper shows that the convergence theorem of maximum-entropy clustering algorithm put forward by Kenneth Rose et al. does not hold in general cases.
An Incremental Algorithm of Text Clustering Based on Semantic Sequences
Institute of Scientific and Technical Information of China (English)
FENG Zhonghui; SHEN Junyi; BAO Junpeng
2006-01-01
This paper proposes an incremental text clustering algorithm based on semantic sequences. Using the similarity relation of semantic sequences and calculating the cover of the set of similar semantic sequences, the algorithm selects the candidate cluster with the minimum entropy overlap value as a result cluster each time. The comparison of experimental results shows that the precision of the algorithm is higher than that of other algorithms under the same conditions, and this is especially obvious on sets of long documents.
A new efficient Cluster Algorithm for the Ising Model
Nyffeler, M; Wiese, U J; Nyfeler, Matthias; Pepe, Michele; Wiese, Uwe-Jens
2005-01-01
Using D-theory we construct a new efficient cluster algorithm for the Ising model. The construction is very different from the standard Swendsen-Wang algorithm and related to worm algorithms. With the new algorithm we have measured the correlation function with high precision over a surprisingly large number of orders of magnitude.
URL Mining Using Agglomerative Clustering Algorithm
Directory of Open Access Journals (Sweden)
Chinmay R. Deshmukh
2015-02-01
The tremendous growth of the web world calls for the application of data mining techniques to web logs. Data mining on the World Wide Web is an important and active area of research. Web log mining is the analysis of web log files containing sequences of web pages. Web mining is broadly classified into web content mining, web usage mining and web structure mining. Web usage mining is a technique to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. URL mining refers to a subclass of Web mining that helps us to investigate the details of a Uniform Resource Locator. URL mining can be advantageous in the fields of security and protection. The paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is clickthrough data: each record consists of a user's query to a search engine along with the URL which the user selected from among the candidates offered by the search engine. By viewing this dataset as a bipartite graph, with the vertices on one side corresponding to queries and on the other side to URLs, one can apply an agglomerative clustering algorithm to the graph's vertices to identify related queries and URLs.
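The bipartite view described in this abstract can be sketched in a few lines. The example below is a simplified assumption, not the paper's procedure: it clusters only the query side, measures similarity as the Jaccard overlap of clicked URL sets, and stops merging at a hypothetical `threshold`. The function names and sample data are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two URL sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(clicks, threshold=0.3):
    """Greedy agglomerative clustering of queries from (query, url) click records.

    Each cluster keeps the union of URLs clicked for its queries; the two
    clusters with the most similar URL sets are merged until the best
    similarity drops below `threshold` (a hypothetical stopping rule).
    """
    urls_by_query = {}
    for query, url in clicks:
        urls_by_query.setdefault(query, set()).add(url)
    clusters = [({q}, set(u)) for q, u in urls_by_query.items()]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: jaccard(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        if jaccard(clusters[i][1], clusters[j][1]) < threshold:
            break
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return [queries for queries, _ in clusters]
```

For example, two differently worded queries that lead users to the same pages end up in one cluster, while a query with no shared clicks stays on its own. Clustering the URL side works the same way with the roles of queries and URLs swapped.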
A fingerprint identification algorithm by clustering similarity
Institute of Scientific and Technical Information of China (English)
TIAN Jie; HE Yuliang; CHEN Hong; YANG Xin
2005-01-01
This paper introduces a fingerprint identification algorithm by clustering similarity, with a view to overcoming the dilemmas encountered in fingerprint identification. To decrease multi-spectrum noise in a fingerprint, we first use a dyadic scale space (DSS) method for image enhancement. The second step describes the relative features among minutiae by building a minutia-simplex which contains a pair of minutiae and their local associated ridge information, with its transformation-variant and invariant relative features applied for comprehensive similarity measurement and for parameter estimation respectively. The clustering method is employed to estimate the transformation space. Finally, a multi-resolution technique is used to find an optimal transformation model that maximizes the mutual information between the input and the template features. The experimental results, including the performance evaluation by the 2nd International Verification Competition in 2002 (FVC2002) over the four fingerprint databases of FVC2002, indicate that our method is promising in an automatic fingerprint identification system (AFIS).
SRAC95; general purpose neutronics code system
Energy Technology Data Exchange (ETDEWEB)
Okumura, Keisuke; Tsuchihashi, Keichiro [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Kaneko, Kunio
1996-03-01
SRAC is a general purpose neutronics code system applicable to core analyses of various types of reactors. Since the publication of JAERI-1302 for the revised SRAC in 1986, a number of additions and modifications have been made to the nuclear data libraries and programs. Thus, the new version SRAC95 has been completed. The system consists of six kinds of nuclear data libraries (ENDF/B-IV, -V, -VI, JENDL-2, -3.1, -3.2) and five modular codes integrated into SRAC95, namely a collision probability calculation module (PIJ) for 16 types of lattice geometries, Sn transport calculation modules (ANISN, TWOTRAN) and diffusion calculation modules (TUD, CITATION), as well as two optional codes for fuel assembly and core burn-up calculations (newly developed ASMBURN, revised COREBN). In this version, many new functions and data are implemented to support nuclear design studies of advanced reactors, especially for burn-up calculations. SRAC95 is available not only on conventional IBM-compatible computers but also on scalar or vector computers with the UNIX operating system. This report is the SRAC95 user's manual, which contains a general description, contents of revisions, input data requirements, detailed information on usage, sample input data and a list of available libraries. (author).
General purpose optimization software for engineering design
Vanderplaats, G. N.
1990-01-01
The author has developed several general purpose optimization programs over the past twenty years. The earlier programs were developed as research codes and served that purpose reasonably well. However, in taking the formal step from research to industrial application programs, several important lessons have been learned. Among these are the importance of clear documentation, immediate user support, and consistent maintenance. Most important has been the issue of providing software that gives a good, or at least acceptable, design at minimum computational cost. Here, the basic issues in developing optimization software for industrial applications are outlined, and issues of convergence rate, reliability, and relative minima are discussed. Considerable feedback has been received from users, and new software is being developed to respond to identified needs. The basic capabilities of this software are outlined. A major motivation for the development of commercial grade software is ease of use and flexibility, and these issues are discussed with reference to general multidisciplinary applications. It is concluded that design productivity can be significantly enhanced by the more widespread use of optimization as an everyday design tool.
Application of hybrid clustering using parallel k-means algorithm and DIANA algorithm
Umam, Khoirul; Bustamam, Alhadi; Lestari, Dian
2017-03-01
DNA is one of the carriers of genetic information in living organisms. Encoding, sequencing, and clustering DNA sequences have become key routine tasks in molecular biology, in particular in bioinformatics applications. There are two types of clustering: hierarchical clustering and partitioning clustering. In this paper, we combine the two types, K-Means (partitioning clustering) and DIANA (hierarchical clustering), and call the result hybrid clustering. Hybrid clustering using the parallel K-Means algorithm and the DIANA algorithm is applied to cluster DNA sequences of Human Papillomavirus (HPV). The clustering process starts with collecting HPV DNA sequences from NCBI (National Centre for Biotechnology Information), followed by feature extraction from the DNA sequences. The extracted features are stored in matrix form; this matrix is normalized using Min-Max normalization, and genetic distance is calculated using Euclidean distance. The hybrid clustering is then applied using implementations of the parallel K-Means algorithm and the DIANA algorithm. The aim of using hybrid clustering is to obtain better clusters. To validate the resulting clusters and to find the optimum number of clusters, we use the Davies-Bouldin Index (DBI). In this study, parallel K-Means clustering grouped the data into 5 clusters with a minimal DBI value of 0.8741, and hybrid clustering grouped the data into 13 sub-clusters with minimal DBI values of 0.8216, 0.6845, 0.3331, 0.1994 and 0.3952. The DBI values of hybrid clustering are lower than the DBI value of the parallel K-Means clustering performed only in the first stage. This means that hybrid clustering gives a better result for clustering HPV DNA sequences than parallel K-Means clustering alone.
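The preprocessing and validation steps named in this abstract (Min-Max normalization, Euclidean distance, Davies-Bouldin Index) can be sketched as below. This is a generic illustration on toy numeric rows, not the authors' pipeline; the function names are chosen here, and the DBI follows the usual definition with mean distance to the centroid as the within-cluster scatter.

```python
import math


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(rows, labels):
    """Davies-Bouldin Index; lower values indicate more compact, better separated clusters.

    Assumes at least two clusters with distinct centroids.
    """
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    centroids = {l: [sum(c) / len(g) for c in zip(*g)] for l, g in groups.items()}
    scatter = {
        l: sum(euclidean(r, centroids[l]) for r in g) / len(g)
        for l, g in groups.items()
    }
    total = 0.0
    for i in groups:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
            for j in groups
            if j != i
        )
    return total / len(groups)
```

Comparing the index across candidate partitions, as the abstract does for K-Means alone versus the hybrid scheme, then amounts to preferring the partition with the lower value.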
Local Community Detection Algorithm Based on Minimal Cluster
Directory of Open Access Journals (Sweden)
Yong Zhou
2016-01-01
In order to discover the structure of a local community more effectively, this paper puts forward a new local community detection algorithm based on the minimal cluster. Most local community detection algorithms begin from one node. The agglomeration ability of a single node is necessarily weaker than that of multiple nodes, so the community extension in this algorithm starts not from the initial node alone but from a node cluster that contains the initial node and whose nodes are relatively densely connected with each other. The algorithm mainly includes two phases. First it detects the minimal cluster, and then it finds the local community extended from the minimal cluster. Experimental results show that the quality of the local community detected by our algorithm is much better than that of other algorithms in both real and simulated networks.
A Load Balance Routing Algorithm Based on Uneven Clustering
Directory of Open Access Journals (Sweden)
Liang Yuan
2013-10-01
To address the problem of uneven load in clustered Wireless Sensor Networks (WSN), a load balance routing algorithm based on uneven clustering is proposed, which performs uneven clustering and calculates the optimal number of clusters. The algorithm prevents the number of common nodes under any one cluster head from becoming too large, which would overload the cluster head until it dies, by distributing nodes evenly among clusters. It constructs an evaluation function that better reflects the residual energy distribution of nodes, and a routing evaluation function between cluster heads. The performance of the algorithm is simulated in MATLAB. Simulation results show that the routing established by this algorithm effectively improves the network's energy balance and lengthens the network's life cycle.
Analyzing Job Aware Scheduling Algorithm in Hadoop for Heterogeneous Cluster
Directory of Open Access Journals (Sweden)
Mayuri A Mehta
2015-12-01
A scheduling algorithm is required to efficiently manage cluster resources in a Hadoop cluster, thereby increasing resource utilization and reducing response time. The job aware scheduling algorithm schedules non-local map tasks of jobs based on job execution time, earliest deadline first, or the workload of the job. In this paper, we present a performance evaluation of the job aware scheduling algorithm using the MapReduce WordCount benchmark. The experimental results are compared with the matchmaking scheduling algorithm. The results show that the job aware scheduling algorithm reduces average waiting time and memory wastage considerably compared to the matchmaking algorithm.
Study of the Artificial Fish Swarm Algorithm for Hybrid Clustering
Directory of Open Access Journals (Sweden)
Hongwei Zhao
2015-06-01
The basic Artificial Fish Swarm (AFS) algorithm is a new type of heuristic swarm intelligence algorithm, but it is difficult to optimize to high precision because of the randomness of the artificial fish behavior. This paper presents an extended AFS algorithm, namely the Cooperative Artificial Fish Swarm (CAFS), which significantly improves on the original AFS in solving complex optimization problems. The K-medoids clustering algorithm is used to classify data, but it is sensitive to the initial selection of the centers, which can lead to low-quality clusters. A novel hybrid clustering method based on CAFS and K-medoids can be used for solving clustering problems. In this work, first, the CAFS algorithm is used to optimize six widely used benchmark functions, giving comparative results for AFS and CAFS; then Particle Swarm Optimization (PSO) is studied. Second, the hybrid algorithm combining K-medoids and CAFS is used for data clustering on several benchmark data sets. The performance of the hybrid algorithm is compared with the AFS and CAFS algorithms on a clustering problem. The simulation results show that the proposed CAFS outperforms the other two algorithms in terms of accuracy and robustness.
Cluster fusion algorithm: application to Lennard-Jones clusters
DEFF Research Database (Denmark)
Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter
2008-01-01
We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...
Cluster fusion algorithm: application to Lennard-Jones clusters
DEFF Research Database (Denmark)
Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter
2006-01-01
We present a new general theoretical framework for modelling the cluster structure and apply it to the description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...
Simulated annealing spectral clustering algorithm for image segmentation
Institute of Scientific and Technical Information of China (English)
Yifang Yang; and Yuping Wang
2014-01-01
The similarity measure is crucial to the performance of spectral clustering. The Gaussian kernel function based on the Euclidean distance is usually adopted as the similarity measure. However, the Euclidean distance measure cannot fully reveal the complex distribution of the data, and the result of spectral clustering is very sensitive to the scaling parameter. To solve these problems, a new manifold distance measure and a novel simulated annealing spectral clustering (SASC) algorithm based on the manifold distance measure are proposed. The simulated annealing based on genetic algorithm (SAGA), characterized by its rapid convergence to the global optimum, is used to cluster the sample points in the spectral mapping space. The proposed algorithm can not only reflect local and global consistency better, but also reduce the sensitivity of spectral clustering to the kernel parameter, which improves the algorithm's clustering performance. To efficiently apply the algorithm to image segmentation, the Nyström method is used to reduce the computational complexity. Experimental results show that, compared with traditional clustering algorithms and popular spectral clustering algorithms, the proposed algorithm can achieve better clustering performance on several synthetic datasets, texture images and real images.
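The sensitivity to the scaling parameter mentioned in this abstract can be illustrated with the standard Gaussian kernel on Euclidean distance. The sketch below is not the SASC method or its manifold distance; the function name `gaussian_affinity`, the zero diagonal and the sample points are assumptions chosen for illustration.

```python
import math

def gaussian_affinity(points, sigma):
    """Return the symmetric matrix w[i][j] = exp(-d(i, j)^2 / (2 * sigma^2)).

    The diagonal is left at zero, a common convention (assumed here).
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w

# Hypothetical sample: two nearby points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
narrow = gaussian_affinity(points, sigma=0.5)
wide = gaussian_affinity(points, sigma=5.0)
```

With the small `sigma`, even the two nearby points are only weakly similar; with the large one, the distant point also looks fairly similar to the others. Either extreme can blur the cluster structure that spectral clustering relies on.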
A Flocking Based algorithm for Document Clustering Analysis
Energy Technology Data Exchange (ETDEWEB)
Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL
2006-01-01
Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithms such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data by embedding the high-dimensional data items on a two-dimensional grid for easy retrieval and visualization of the clustering result. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance than the K-means and the Ant clustering algorithms for real document clustering.
APPECT: An Approximate Backbone-Based Clustering Algorithm for Tags
DEFF Research Database (Denmark)
Zong, Yu; Xu, Guandong; Jin, Pin
2011-01-01
...resulting from the severe difficulty of ambiguity, redundancy and less semantic nature of tags. Clustering method is a useful tool to address the aforementioned difficulties. Most of the researches on tag clustering are directly using traditional clustering algorithms such as K-means or Hierarchical... algorithm for Tags (APPECT). The main steps of APPECT are: (1) we execute the K-means algorithm on a tag similarity matrix for M times and collect a set of tag clustering results Z={C1,C2,…,Cm}; (2) we form the approximate backbone of Z by executing a greedy search; (3) we fix the approximate backbone...
Mercer Kernel Based Fuzzy Clustering Self-Adaptive Algorithm
Institute of Scientific and Technical Information of China (English)
李侃; 刘玉树
2004-01-01
A novel mercer kernel based fuzzy clustering self-adaptive algorithm is presented. The mercer kernel method is introduced to the fuzzy c-means clustering. It may map implicitly the input data into the high-dimensional feature space through the nonlinear transformation. Among other fuzzy c-means and its variants, the number of clusters is first determined. A self-adaptive algorithm is proposed. The number of clusters, which is not given in advance, can be gotten automatically by a validity measure function. Finally, experiments are given to show better performance with the method of kernel based fuzzy c-means self-adaptive algorithm.
Android Malware Classification Using K-Means Clustering Algorithm
Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah
2017-08-01
Malware is designed to gain access to or damage a computer system without the user's knowledge, and attackers exploit malware to commit crime or fraud. This paper proposes an Android malware classification approach based on the K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets, Virus Total and Malgenome, were selected to demonstrate the K-Means clustering algorithm. We classify the Android malware into three clusters: ransomware, scareware and goodware. Nine features were considered for each dataset: Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistics software for data classification and WEKA tools to evaluate the built clusters. The proposed K-Means clustering approach shows promising results, with high accuracy when tested using the Random Forest algorithm.
Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis
Directory of Open Access Journals (Sweden)
S. Muthurajkumar
2014-05-01
In this paper, we propose a hybrid clustering-based classification algorithm, built on a mean approach, to effectively classify and mine the ordered sequences (paths) from weblog data in order to perform social network analysis. In the proposed system for social pattern analysis, the sequences of human activities are typically analyzed by switching behaviors, which are likely to produce overlapping clusters. A robust Modified Boosting algorithm is proposed for the hybrid clustering-based classification used to cluster the data. This work helps connect the aggregated features from the network data with the traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm, and that it provides better classification accuracy when tested with a weblog dataset. In addition, the algorithm improves predictive performance, especially for multiclass datasets, which can increase accuracy.
Functional Clustering Algorithm for High-Dimensional Proteomics Data
Directory of Open Access Journals (Sweden)
Halima Bensmail
2005-01-01
Clustering proteomics data is a challenging problem for any traditional clustering algorithm. Usually, the number of samples is much smaller than the number of protein peaks. A clustering algorithm that does not take into consideration the number of features of variables (here, the number of peaks) is needed. An innovative hierarchical clustering algorithm may be a good approach. We propose here a new dissimilarity measure for hierarchical clustering combined with functional data analysis. We present a specific application of functional data analysis (FDA) to a high-throughput proteomics study. The performance of the proposed algorithm is compared with two popular dissimilarity measures in the clustering of samples from normal and human T-cell leukemia virus type 1 (HTLV-1)-infected patients.
A new hybrid imperialist competitive algorithm on data clustering
Indian Academy of Sciences (India)
Taher Niknam; Elahe Taherian Fard; Shervin Ehrampoosh; Alireza Rousta
2011-06-01
Clustering is a process for partitioning datasets, and it is very useful for finding optimum solutions. K-means is one of the simplest and best-known methods and is based on the square error criterion. The algorithm depends on its initial states and converges to local optima. Recent research shows that the K-means algorithm has been successfully applied to combinatorial optimization problems for clustering. In this paper, we propose a novel algorithm that combines two clustering algorithms: K-means and the Modified Imperialist Competitive Algorithm. It is named hybrid K-MICA. In addition, we use a method called modified expectation maximization (EM) to determine the number of clusters. The experimental results show that the new method gives better results than ACO, PSO, Simulated Annealing (SA), Genetic Algorithm (GA), Tabu Search (TS), Honey Bee Mating Optimization (HBMO) and K-means.
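The dependence of K-means on its initial state can be seen on a tiny example. The following is a minimal sketch of plain Lloyd iterations, not the hybrid K-MICA method; the function name `kmeans`, the four sample points and both sets of starting centers are assumptions chosen so that one start is trapped in a local optimum.

```python
def kmeans(points, centers, iterations=20):
    """Plain Lloyd iterations; returns (centers, sum of squared errors)."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: d2(p, centers[k]))
            groups[nearest].append(p)
        # Keep the old center when a cluster receives no points.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
    sse = sum(min(d2(p, c) for c in centers) for p in points)
    return centers, sse

# Corners of a wide rectangle: the natural split is left pair vs right pair.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
_, sse_good = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])
_, sse_bad = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])
```

Both runs converge, but the second start splits the rectangle into a bottom pair and a top pair and stays there with a much larger squared error. This is the kind of local optimum that motivates combining K-means with a global search heuristic.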
Extension of K-Modes Algorithm for Generating Clusters Automatically
Directory of Open Access Journals (Sweden)
Anupama Chadha
2016-03-01
K-Modes is a well-known algorithm for clustering data sets with categorical attributes. It is known for its simplicity and speed. K-Modes is an extension of the K-Means algorithm for categorical data. Because K-Modes operates on categorical data, the ‘Simple Matching Dissimilarity’ measure is used instead of Euclidean distance and the ‘Modes’ of clusters are used instead of ‘Means’. However, one major limitation of this algorithm is its dependency on the number of clusters K being supplied in advance, and it is sometimes practically impossible to estimate the optimum number of clusters correctly beforehand. In this paper we propose an algorithm that overcomes this limitation while maintaining the simplicity of the K-Modes algorithm.
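The two ingredients that distinguish K-Modes from K-Means, the simple matching dissimilarity and the cluster mode, can be sketched in a few lines. The function names and the sample records are illustrative assumptions; the sketch covers only these two building blocks, not the extension proposed in the paper.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def cluster_mode(records):
    """Return the most frequent value of each attribute in a non-empty cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

# Hypothetical cluster of three categorical records.
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]
mode = cluster_mode(cluster)
distance = matching_dissimilarity(("blue", "small", "round"), mode)
```

Here `mode` plays the role that the mean plays in K-Means, and `distance` replaces the Euclidean distance when records are assigned to clusters.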
Resource Allocation in Public Cluster with Extended Optimization Algorithm
Akbar, Z.; Handoko, L. T.
2007-01-01
We introduce an optimization algorithm for resource allocation in the LIPI Public Cluster to optimize its usage according to incoming requests from users. The tool is an extended and modified genetic algorithm developed to match the specific nature of a public cluster. We present a detailed analysis of the optimization and compare the results with the exact calculation. We show that it would be very useful and could realize an automatic decision-making system for public clusters.
An ACO Algorithm for Effective Cluster Head Selection
Sampath, Amritha; Thampi, Sabu M; 10.4304/jait.2.1.50-56
2011-01-01
This paper presents an effective algorithm for selecting cluster heads in mobile ad hoc networks using ant colony optimization. A cluster in an ad hoc network consists of a cluster head and cluster members which are at one hop away from the cluster head. The cluster head allocates the resources to its cluster members. Clustering in MANET is done to reduce the communication overhead and thereby increase the network performance. A MANET can have many clusters in it. This paper presents an algorithm which is a combination of the four main clustering schemes- the ID based clustering, connectivity based, probability based and the weighted approach. An Ant colony optimization based approach is used to minimize the number of clusters in MANET. This can also be considered as a minimum dominating set problem in graph theory. The algorithm considers various parameters like the number of nodes, the transmission range etc. Experimental results show that the proposed algorithm is an effective methodology for finding out t...
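The link to the minimum dominating set problem can be made concrete: a set of cluster heads is valid when every node is either a head or one hop away from a head. The sketch below checks that property and builds a head set with a simple greedy baseline, which is explicitly not the ant colony optimization approach of the paper; the function names and the sample topology are assumptions.

```python
def uncovered_nodes(adjacency, heads):
    """Return the nodes that are neither heads nor one hop from a head."""
    covered = set(heads)
    for head in heads:
        covered.update(adjacency[head])
    return set(adjacency) - covered

def greedy_heads(adjacency):
    """Greedy baseline (not ACO): pick the node covering the most uncovered nodes."""
    uncovered = set(adjacency)
    heads = []
    while uncovered:
        best = max(
            sorted(adjacency),
            key=lambda n: len(({n} | set(adjacency[n])) & uncovered),
        )
        heads.append(best)
        uncovered -= {best} | set(adjacency[best])
    return heads

# Hypothetical 6-node ad hoc network given as neighbor lists.
adjacency = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
heads = greedy_heads(adjacency)
```

A metaheuristic such as ACO would search over candidate head sets and could use `uncovered_nodes` as a feasibility check while minimizing the number of heads.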
Squeezer: An Efficient Algorithm for Clustering Categorical Data
Institute of Scientific and Technical Information of China (English)
何增有; 徐晓飞; 邓胜春
2002-01-01
This paper presents a new efficient algorithm for clustering categorical data, Squeezer, which can produce high-quality clustering results and at the same time offer good scalability. The Squeezer algorithm reads each tuple t in sequence, either assigning t to an existing cluster (initially none) or creating t as a new cluster, as determined by the similarities between t and the clusters. Due to its characteristics, the proposed algorithm is extremely suitable for clustering data streams, where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence so far using a small amount of memory and time. Outliers can also be handled efficiently and directly in Squeezer. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.
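The one-pass scheme described in this abstract can be sketched as follows. The similarity used here (the sum, over attributes, of the share of cluster members carrying the same value as the tuple) and the threshold value are assumptions made for illustration; the abstract does not define them, so this is a Squeezer-like sketch rather than the published algorithm.

```python
from collections import Counter

def squeezer_like(tuples, threshold):
    """One pass: each tuple joins its most similar cluster or starts a new one."""
    clusters = []  # each cluster keeps its size, per-attribute counts and members
    for index, t in enumerate(tuples):
        best, best_sim = None, -1.0
        for c in clusters:
            # Assumed similarity: per-attribute share of members matching t.
            sim = sum(c["counts"][i][v] / c["size"] for i, v in enumerate(t))
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < threshold:
            best = {"size": 0, "counts": [Counter() for _ in t], "members": []}
            clusters.append(best)
        best["size"] += 1
        best["members"].append(index)
        for i, v in enumerate(t):
            best["counts"][i][v] += 1
    return [c["members"] for c in clusters]

# Hypothetical stream of two-attribute categorical tuples.
data = [("a", "x"), ("a", "x"), ("b", "y"), ("a", "y"), ("b", "y")]
groups = squeezer_like(data, threshold=1.5)
```

Each tuple is read once and only per-cluster value counts are stored, which is why this style of algorithm suits data streams. The tuple `("a", "y")` matches no existing cluster well enough and ends up alone, which is one way outliers surface.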
Using Hyper Clustering Algorithms in Mobile Network Planning
Directory of Open Access Journals (Sweden)
Lamiaa F. Ibrahim
2011-01-01
Problem statement: As large amounts of data are stored in spatial databases, people may want to find groups of data that share similar features, so cluster analysis has become an important area of research in data mining. Clustering analysis has been applied in many fields, for example to construct a cluster served by a base station in a mobile network. Deciding upon the optimum placement of the base stations to achieve the best service while reducing cost is a complex task requiring vast computational resources. Approach: This study addresses the antenna placement problem, or cell planning problem, which involves locating and configuring infrastructure for mobile networks, by modifying the original Density-Based Spatial Clustering of Applications with Noise algorithm. The original Cluster Partitioning Around Medoids algorithm was modified and a new algorithm was proposed by the authors in a recent work. In this study, the original Density-Based Spatial Clustering of Applications with Noise algorithm is modified and combined with the earlier algorithm to produce the hybrid Clustering Density Base and Clustering with Weighted Node-Partitioning Around Medoids algorithm for solving problems in mobile network planning. Results: The algorithm is applied to a real case study. Results demonstrate that the proposed algorithm has minimum run time, minimum cost and a high grade of service. Conclusion: The proposed hyper algorithm has the advantage of quickly dividing the area into clusters, since the density-based algorithm has a limited number of iterations, and the advantages of accuracy (no sampling method is used) and a high grade of service, due to moving the location of the base stations (medoids) toward the heavily loaded (weighted) nodes.
Co-clustering models, algorithms and applications
Govaert, Gérard
2013-01-01
Cluster or co-cluster analyses are important tools in a variety of scientific areas. The introduction of this book presents a state of the art of already well-established, as well as more recent methods of co-clustering. The authors mainly deal with the two-mode partitioning under different approaches, but pay particular attention to a probabilistic approach. Chapter 1 concerns clustering in general and the model-based clustering in particular. The authors briefly review the classical clustering methods and focus on the mixture model. They present and discuss the use of different mixture
General-purpose event generators for LHC physics
Buckley, Andy; Gieseke, Stefan; Grellscheid, David; Hoche, Stefan; Hoeth, Hendrik; Krauss, Frank; Lonnblad, Leif; Nurse, Emily; Richardson, Peter; Schumann, Steffen; Seymour, Michael H; Sjostrand, Torbjorn; Skands, Peter; Webber, Bryan
2011-01-01
We review the physics basis, main features and use of general-purpose Monte Carlo event generators for the simulation of proton-proton collisions at the Large Hadron Collider. Topics included are: the generation of hard-scattering matrix elements for processes of interest, at both leading and next-to-leading QCD perturbative order; their matching to approximate treatments of higher orders based on the showering approximation; the parton and dipole shower formulations; parton distribution functions for event generators; non-perturbative aspects such as soft QCD collisions, the underlying event and diffractive processes; the string and cluster models for hadron formation; the treatment of hadron and tau decays; the inclusion of QED radiation and beyond-Standard-Model processes. We describe the principal features of the ARIADNE, Herwig++, PYTHIA 8 and SHERPA generators, together with the Rivet and Professor validation and tuning tools, and discuss the physics philosophy behind the proper use of these generators ...
Constructing Product Ontologies with an Improved Conceptual Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
曹大军; 徐良贤
2002-01-01
In a distributed eMarketplace, recommended product ontologies are required for trading between buyers and sellers. Conceptual clustering can be employed to build dynamic recommended product ontologies. Traditional methods of conceptual clustering (e.g. COBWEB or Cluster/2) do not take heterogeneous attributes of a concept into account. Moreover, the result of these methods is clusters other than recommended concepts. A center recommendation clustering algorithm is provided. According to the values of heterogeneous attributes, recommended product names can be selected at the clusters, which are produced by this algorithm. This algorithm can also create the hierarchical relations between product names. The definitions of product names given by all participants are collected in a distributed eMarketplace. Recommended product ontologies are built. These ontologies include relations and definitions of product names, which come from different participants in the distributed eMarketplace. Finally a case is given to illustrate this method. The result shows that this method is feasible.
Directory of Open Access Journals (Sweden)
Jiang Ting
2010-01-01
We optimize the cluster structure to solve problems such as the uneven energy consumption of the radar sensor nodes and random cluster head selection in the traditional clustering routing algorithm. Based on a defined cost function for clusters, we present a clustering algorithm that is based on radio free-space path loss. In addition, we propose energy and distance pheromones based on the residual energy and aggregation of the radar sensor nodes. Following a bionic heuristic algorithm, a new ant colony-based clustering algorithm for radar sensor networks is also proposed. Simulation results show that this algorithm achieves a better balance of energy consumption and remarkably prolongs the lifetime of the radar sensor network.
SNAP: A General Purpose Network Analysis and Graph Mining Library
Leskovec, Jure
2016-01-01
Large networks are becoming a widely used abstraction for studying complex systems in a broad set of disciplines, ranging from social network analysis to molecular biology and neuroscience. Despite an increasing need to analyze and manipulate large networks, only a limited number of tools are available for this task. Here, we describe Stanford Network Analysis Platform (SNAP), a general-purpose, high-performance system that provides easy to use, high-level operations for analysis and manipulation of large networks. We present SNAP functionality, describe its implementational details, and give performance benchmarks. SNAP has been developed for single big-memory machines and it balances the trade-off between maximum performance, compact in-memory graph representation, and the ability to handle dynamic graphs where nodes and edges are being added or removed over time. SNAP can process massive networks with hundreds of millions of nodes and billions of edges. SNAP offers over 140 different graph algorithms that ...
47 CFR 32.6124 - General purpose computers expense.
2010-10-01
... 47 Telecommunication 2 2010-10-01 2010-10-01 false General purpose computers expense. 32.6124... General purpose computers expense. This account shall include the costs of personnel whose principal job is the physical operation of general purpose computers and the maintenance of operating systems. This...
Cosine-Based Clustering Algorithm Approach
Directory of Open Access Journals (Sweden)
Mohammed A. H. Lubbad
2012-02-01
Because many applications need to manage spatial data, clustering large spatial databases is an important problem; it aims to find the densely populated regions in the feature space for use in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shape. It must be insensitive to outliers (noise) and to the order of the input data. In this paper, Cosine Cluster is proposed, based on the cosine transform, which satisfies all the above requirements. Using the multi-resolution property of cosine transforms, arbitrarily shaped clusters can be effectively identified at different degrees of accuracy. Cosine Cluster is also shown to be highly efficient in terms of time complexity. Experimental results on very large data sets are presented, which show the efficiency and effectiveness of the proposed approach compared to other recent clustering methods.
A functional clustering algorithm for the analysis of neural relationships
Feldt, S; Hetrick, V L; Berke, J D; Zochowski, M
2008-01-01
We formulate a novel technique for the detection of functional clusters in neural data. In contrast to prior network clustering algorithms, our procedure progressively combines spike trains and derives the optimal clustering cutoff in a simple and intuitive manner. To demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. We observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Pixel Intensity Clustering Algorithm for Multilevel Image Segmentation
Directory of Open Access Journals (Sweden)
Oludayo O. Olugbara
2015-01-01
Image segmentation is an important problem that has received significant attention in the literature. Over the last few decades, many algorithms have been developed to solve the image segmentation problem; prominent amongst these are the thresholding algorithms. However, the computational time complexity of thresholding increases exponentially with the number of desired thresholds. A wealth of alternative algorithms, notably those based on particle swarm optimization and evolutionary metaheuristics, were proposed to tackle the intrinsic challenges of thresholding. In addition, clustering-based algorithms were developed as multidimensional extensions of thresholding. While these algorithms have demonstrated successful results for fewer thresholds, their computational costs for a large number of thresholds are still a limiting factor. We propose a new clustering algorithm based on linear partitioning of the pixel intensity set and a between-cluster variance criterion function for multilevel image segmentation. The results of testing the proposed algorithm on real images from the Berkeley Segmentation Dataset and Benchmark show that the algorithm is comparable with state-of-the-art multilevel segmentation algorithms and consistently produces high quality results. The attractive properties of the algorithm are its simplicity, generalization to a large number of clusters, and computational cost effectiveness.
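A between-cluster variance criterion of the kind named in this abstract can be written down directly: for a set of thresholds, weight each cluster's squared deviation from the global mean by the cluster's share of pixels. The sketch below pairs it with an exhaustive single-threshold search in the style of Otsu's method, which is a stand-in and not the linear partitioning algorithm of the paper; the function name and the pixel values are assumptions.

```python
def between_cluster_variance(pixels, thresholds):
    """Sum over clusters of weight * (cluster mean - global mean)^2."""
    bounds = [float("-inf")] + sorted(thresholds) + [float("inf")]
    mean = sum(pixels) / len(pixels)
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        group = [p for p in pixels if lo < p <= hi]
        if group:
            weight = len(group) / len(pixels)
            total += weight * (sum(group) / len(group) - mean) ** 2
    return total

# Hypothetical intensities forming dark, mid-gray and bright groups.
pixels = [10, 12, 11, 200, 205, 198, 100, 102]
best = max(range(256), key=lambda t: between_cluster_variance(pixels, [t]))
two_level = between_cluster_variance(pixels, [50, 150])
```

Searching every combination of thresholds this way grows exponentially with the number of thresholds, which is the cost the abstract refers to and the reason cheaper partitioning schemes are attractive.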
A High-Order CFS Algorithm for Clustering Big Data
Directory of Open Access Journals (Sweden)
Fanyu Bu
2016-01-01
With the development of the Internet of Everything, such as the Internet of Things, Internet of People, and Industrial Internet, big data is being generated. Clustering is a widely used technique for big data analytics and mining. However, most current algorithms are not effective at clustering heterogeneous data, which is prevalent in big data. In this paper, we propose a high-order CFS algorithm (HOCFS) to cluster heterogeneous data by combining the CFS clustering algorithm and the dropout deep learning model, whose functionality rests on three pillars: (i) an adaptive dropout deep learning model to learn features from each type of data, (ii) a feature tensor model to capture the correlations of heterogeneous data, and (iii) a tensor distance-based high-order CFS algorithm to cluster heterogeneous data. Furthermore, we verify our proposed algorithm on different datasets by comparison with two other clustering schemes, HOPCM and CFS. Results confirm the effectiveness of the proposed algorithm in clustering heterogeneous data.
Meaningful Clustered Forest: an Automatic and Robust Clustering Algorithm
Tepper, Mariano; Almansa, Andrés
2011-01-01
We propose a new clustering method that can be regarded as a numerical method to compute the proximity gestalt. The method analyzes edge length statistics in the MST of the dataset and provides an a contrario cluster detection criterion. The approach is fully parametric on the chosen distance and can detect arbitrarily shaped clusters. The method is also automatic, in the sense that only a single parameter is left to the user. This parameter has an intuitive interpretation as it controls the expected number of false detections. We show that the iterative application of our method can (1) provide robustness to noise and (2) solve a masking phenomenon in which a highly populated and salient cluster dominates the scene and inhibits the detection of less-populated, but still salient, clusters.
The Ordered Clustered Travelling Salesman Problem: A Hybrid Genetic Algorithm
Directory of Open Access Journals (Sweden)
Zakir Hussain Ahmed
2014-01-01
The ordered clustered travelling salesman problem is a variation of the usual travelling salesman problem in which a set of vertices (except the starting vertex) of the network is divided into some prespecified clusters. The objective is to find the least cost Hamiltonian tour in which vertices of any cluster are visited contiguously and the clusters are visited in the prespecified order. The problem is NP-hard, and it arises in practical transportation and sequencing problems. This paper develops a hybrid genetic algorithm using sequential constructive crossover, 2-opt search, and a local search for obtaining heuristic solution to the problem. The efficiency of the algorithm has been examined against two existing algorithms for some asymmetric and symmetric TSPLIB instances of various sizes. The computational results show that the proposed algorithm is very effective in terms of solution quality and computational time. Finally, we present solution to some more symmetric TSPLIB instances.
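The two constraints that define this problem, contiguous visits within each cluster and a fixed cluster order, can be stated as a small feasibility check alongside the tour cost. This is a sketch of the problem definition only, not of the hybrid genetic algorithm; the function names, the cluster layout and the cost matrix are assumptions.

```python
def is_feasible_tour(tour, clusters, start):
    """Check that clusters are visited contiguously and in the given order."""
    all_vertices = sorted(v for cluster in clusters for v in cluster)
    if tour[0] != start or sorted(tour[1:]) != all_vertices:
        return False
    position = 1
    for cluster in clusters:
        block = tour[position:position + len(cluster)]
        if set(block) != set(cluster):
            return False
        position += len(cluster)
    return True

def tour_cost(tour, cost):
    """Cost of the closed tour, including the edge back to the start."""
    return sum(cost[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

# Hypothetical instance: start vertex 0, clusters {1, 2} then {3, 4}.
clusters = [[1, 2], [3, 4]]
cost = [[abs(i - j) for j in range(5)] for i in range(5)]
feasible = is_feasible_tour([0, 2, 1, 3, 4], clusters, start=0)
total = tour_cost([0, 2, 1, 3, 4], cost)
```

A heuristic such as 2-opt or a crossover operator only needs to preserve these two constraints, for example by reordering vertices inside one cluster block at a time.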
The Refinement Algorithm Consideration in Text Clustering Scheme Based on Multilevel Graph
Institute of Scientific and Technical Information of China (English)
CHEN Jian-bin; DONG Xiang-jun; SONG Han-tao
2004-01-01
To construct a highly efficient text clustering algorithm, the multilevel graph model and the refinement algorithm used in the uncoarsening phase are discussed. The model is applied to text clustering. The performance of the clustering algorithm has to be improved by applying the refinement algorithm. The experimental results demonstrate that the multilevel graph text clustering algorithm is feasible.
A Scalable Clustering Algorithm in Dense Mobile Sensor Networks
Directory of Open Access Journals (Sweden)
Jianbo Li
2011-03-01
Clustering offers a kind of hierarchical organization to provide scalability and a basic performance guarantee by partitioning the network into disjoint groups of nodes. In this paper, a scalable and energy-efficient clustering algorithm is proposed for dense mobile sensor network scenarios. In the initial cluster formation phase, our proposed scheme features a simple execution process with polynomial time complexity, and eliminates the “frozen time” requirement by introducing some GPS-capable mobile nodes to act as cluster heads. In the following cluster maintenance stage, the maintenance of clusters is asynchronous and event driven, so as to thoroughly eliminate the “ripple effect” brought by node mobility. As a result, local changes in a cluster need not be seen and updated by the entire network, which greatly reduces communication overheads and suits high-mobility environments well. Extensive simulations have been conducted, and the results reveal that our proposed algorithm incurs much less clustering overhead and maintains a much more stable cluster structure than the HCC (High Connectivity Clustering) algorithm.
Color Image Segmentation Method Based on Improved Spectral Clustering Algorithm
Dong Qin
2014-01-01
Addressing the high sparsity of image data and the problem of determining the number of clusters, we put forward a color image segmentation algorithm that combines semi-supervised machine learning technology with spectral graph theory. By studying related theories and methods of spectral clustering algorithms, we introduce the concept of information entropy to design a method that can automatically optimize the scale parameter value. So it avoids the unstab...
The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering
Directory of Open Access Journals (Sweden)
Khalid Jaber
2009-01-01
Problem statement: Protein sequence clustering is a method used to discover relations between proteins. This method groups the proteins based on their common features. It is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced a maximal cliques algorithm for protein clustering. Approach: In this study we adapted the maximal cliques algorithm of Mohseni-Zadeh to find cliques in protein sequences, and we then parallelized the algorithm to improve computation times and allow large protein databases to be processed. We used the N-Gram Hirschberg approach proposed by Abdul Rashid to calculate the distance between protein sequences. The task farming parallel program model was used to parallelize the enhanced cliques algorithm. Results: Our parallel maximal cliques algorithm was implemented on the stealth cluster using the C programming language and a hybrid approach that includes both the Message Passing Interface (MPI) library and POSIX threads (PThread) to accelerate protein sequence clustering. Conclusion: Our results showed a good speedup over sequential algorithms for cliques in protein sequences.
A New Method for Medical Image Clustering Using Genetic Algorithm
Directory of Open Access Journals (Sweden)
Akbar Shahrzad Khashandarag
2013-01-01
Segmentation is applied to medical images when the brightness of the images becomes weak, which makes it difficult to recognize tissue borders. Exact segmentation of medical images is therefore an essential step in recognizing and curing an illness, and the purpose of clustering in medical images is the recognition of damaged areas in tissues. Different clustering techniques have been introduced in fields such as engineering, medicine, data mining and so on. However, there is no standard clustering technique that gives ideal results for all imaging applications. In this paper, a new method combining a genetic algorithm and the k-means algorithm is presented for clustering medical images. In this combined technique, a variable string length genetic algorithm (VGA) is used to determine the optimal cluster centers. The proposed algorithm has been compared with the k-means clustering algorithm. The advantage of the proposed method is its accuracy in selecting the optimal cluster centers compared with the k-means technique.
Centronit: Initial Centroid Designation Algorithm for K-Means Clustering
Directory of Open Access Journals (Sweden)
Ali Ridho Barakbah
2014-06-01
The clustering performance of K-means depends highly on the correctness of the initial centroids. Usually the initial centroids for K-means clustering are determined randomly, so the chosen initial centers may lead to the nearest local minimum rather than the global optimum. In this paper, we propose an algorithm, called Centronit, for designating initial centroids to optimize K-means clustering. The proposed algorithm is based on calculating the average distance of the nearest data inside the region of the minimum distance. The initial centroids can be designated by the lowest average distance of each data point. The minimum distance is set by calculating the average distance between the data. This method is also robust to outliers in the data. The experimental results show the effectiveness of the proposed method in improving the clustering results of K-means clustering. Keywords: K-means clustering, initial centroids, K-means optimization.
New clustering algorithm for interconnection of MANET and internet
Institute of Scientific and Technical Information of China (English)
万象; 姚尹雄; 王豪行
2004-01-01
This paper presents the core-agent based clustering (CBC) algorithm, a novel heuristic clustering scheme for interconnection of MANET and the Internet that uses power, movement probability and hop length as constraints. CBC includes two phases: cluster initialization and cluster maintenance. In phase one, the selection of clusterheads obeys the first two constraints, whereas the father node of each clustering node is chosen according to all three. Phase two concerns the case of node insertion or removal. Easy access and little alteration of conventional mobile IP are among the features of this algorithm. Simulation results demonstrate that CBC has advantages such as lower average hop length, good robustness and lower overheads, and that the clustered network architecture behaves stably when the topology changes.
The Effective Clustering Partition Algorithm Based on the Genetic Evolution
Institute of Scientific and Technical Information of China (English)
LIAO Qin; LI Xi-wen
2006-01-01
To address the difficulty of determining the number of clusters and the abnormal points with a clustering validity function, an effective clustering partition model based on the genetic algorithm is built in this paper. The solution to the problem is formed by combining the clustering partition with the encoded samples, and the fitness function is defined by the distances among and within clusters. The number of clusters and the samples in each cluster are determined, and the abnormal points are distinguished, by applying the triple random crossover operator and mutation. Based on known sample data, the results of the novel method and the clustering validity function are compared. Numerical experiments are given, and the results show that the novel method is more effective.
An Extended Clustering Algorithm for Statistical Language Models
Ueberla, J P
1994-01-01
Statistical language models frequently suffer from a lack of training data. This problem can be alleviated by clustering, because it reduces the number of free parameters that need to be trained. However, clustered models have the following drawback: if there is "enough" data to train an unclustered model, then the clustered variant may perform worse. On currently used language modeling corpora, e.g. the Wall Street Journal corpus, how do the performances of a clustered and an unclustered model compare? While trying to address this question, we develop the following two ideas. First, to get a clustering algorithm with potentially high performance, an existing algorithm is extended to deal with higher order N-grams. Second, to make it possible to cluster large amounts of training data more efficiently, a heuristic to speed up the algorithm is presented. The resulting clustering algorithm can be used to cluster trigrams on the Wall Street Journal corpus and the language models it produces can compete with exi...
Foam: A General Purpose Cellular Monte Carlo Event Generator
Jadach, Stanislaw
2003-01-01
A general purpose, self-adapting, Monte Carlo (MC) event generator (simulator) is described. The high efficiency of the MC, that is, a small maximum weight or variance of the MC weight, is achieved by dividing the integration domain into small cells. The cells can be $n$-dimensional simplices, hyperrectangles or Cartesian products of them. The grid of cells, called "foam", is produced in the process of the binary split of the cells. The choice of the next cell to be divided and the position/direction of the division hyper-plane is driven by the algorithm, which optimizes the ratio of the maximum weight to the average weight or (optionally) the total variance. The algorithm is able to deal, in principle, with an arbitrary pattern of the singularities in the distribution. As any MC generator, it can also be used for the MC integration. With the typical personal computer CPU, the program is able to perform adaptive integration/simulation at a relatively small number of dimensions ($\leq 16$). With the continu...
Critical dynamics of cluster algorithms in the dilute Ising model
Hennecke, M.; Heyken, U.
1993-08-01
Autocorrelation times for thermodynamic quantities at $T_C$ are calculated from Monte Carlo simulations of the site-diluted simple cubic Ising model, using the Swendsen-Wang and Wolff cluster algorithms. Our results show that for these algorithms the autocorrelation times decrease when reducing the concentration of magnetic sites from 100% down to 40%. This is of crucial importance when estimating static properties of the model, since the variances of these estimators increase with autocorrelation time. The dynamical critical exponents are calculated for both algorithms, observing pronounced finite-size effects in the energy autocorrelation data for the algorithm of Wolff. We conclude that, when applied to the dilute Ising model, cluster algorithms become even more effective than local algorithms, for which increasing autocorrelation times are expected.
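To make the Wolff cluster update mentioned in this abstract concrete, here is a minimal sketch for a site-diluted simple cubic Ising lattice. It is an illustration under stated assumptions, not the authors' code: the coupling is set to J = 1, boundaries are periodic, and the names make_lattice, neighbors and wolff_update are hypothetical helpers introduced only for this example.

```python
import math
import random

def make_lattice(L, concentration, rng):
    """Site-diluted simple cubic lattice: 0 marks a vacancy, +1/-1 a spin."""
    return {
        (x, y, z): (rng.choice((-1, 1)) if rng.random() < concentration else 0)
        for x in range(L) for y in range(L) for z in range(L)
    }

def neighbors(site, L):
    """Six nearest neighbours with periodic boundaries (use L >= 3)."""
    x, y, z = site
    return [((x + 1) % L, y, z), ((x - 1) % L, y, z),
            (x, (y + 1) % L, z), (x, (y - 1) % L, z),
            (x, y, (z + 1) % L), (x, y, (z - 1) % L)]

def wolff_update(spins, L, beta, rng):
    """Grow one Wolff cluster from a random occupied site, flip it, return its size."""
    occupied = [s for s, v in spins.items() if v != 0]
    if not occupied:
        return 0
    seed = rng.choice(occupied)
    seed_spin = spins[seed]
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond probability, assuming J = 1
    cluster = {seed}
    stack = [seed]
    while stack:
        site = stack.pop()
        for nb in neighbors(site, L):
            # Vacancies hold 0, so they never match seed_spin and never join.
            if nb not in cluster and spins[nb] == seed_spin and rng.random() < p_add:
                cluster.add(nb)
                stack.append(nb)
    for site in cluster:
        spins[site] = -seed_spin
    return len(cluster)
```

Calling wolff_update repeatedly and recording an observable such as the energy after each call gives the time series from which an autocorrelation time could be estimated; that analysis step is left out of the sketch.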
Segmentation of Medical Image using Clustering and Watershed Algorithms
M. C.J. Christ; R.M.S Parvathi
2011-01-01
Problem statement: Segmentation plays an important role in medical imaging. Segmentation of an image is the division or separation of the image into dissimilar regions of similar attribute. In this study we proposed a methodology that integrates clustering algorithm and marker controlled watershed segmentation algorithm for medical image segmentation. The use of the conservative watershed algorithm for medical image analysis is pervasive because of its advantages, such as always being able to...
Efficient Cluster Algorithm for CP(N-1) Models
Beard, B B; Riederer, S; Wiese, U J
2006-01-01
Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z = 0.
Efficient cluster algorithm for CP(N-1) models
Beard, B. B.; Pepe, M.; Riederer, S.; Wiese, U.-J.
2006-11-01
Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard Wilson formulation of lattice field theory. In fact, there is a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. In this paper, we construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a regularization for CP(N-1) models in the framework of D-theory. We present detailed studies of the autocorrelations and find a dynamical critical exponent that is consistent with z=0.
Measuring Constraint-Set Utility for Partitional Clustering Algorithms
Davidson, Ian; Wagstaff, Kiri L.; Basu, Sugato
2006-01-01
Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms.
A dynamic fuzzy clustering method based on genetic algorithm
Institute of Scientific and Technical Information of China (English)
ZHENG Yan; ZHOU Chunguang; LIANG Yanchun; GUO Dongwei
2003-01-01
A dynamic fuzzy clustering method is presented based on the genetic algorithm. By calculating the fuzzy dissimilarity between samples the essential associations among samples are modeled factually. The fuzzy dissimilarity between two samples is mapped into their Euclidean distance, that is, the high dimensional samples are mapped into the two-dimensional plane. The mapping is optimized globally by the genetic algorithm, which adjusts the coordinates of each sample, and thus the Euclidean distance, to approximate to the fuzzy dissimilarity between samples gradually. A key advantage of the proposed method is that the clustering is independent of the space distribution of input samples, which improves the flexibility and visualization. This method possesses characteristics of a faster convergence rate and more exact clustering than some typical clustering algorithms. Simulated experiments show the feasibility and availability of the proposed method.
SURVEY ON CLUSTERING ALGORITHM AND SIMILARITY MEASURE FOR CATEGORICAL DATA
Directory of Open Access Journals (Sweden)
S. Anitha Elavarasi
2014-01-01
Learning is the process of generating useful information from a huge volume of data. Learning can be either supervised learning (e.g. classification) or unsupervised learning (e.g. clustering). Clustering is the process of grouping a set of physical objects into classes of similar objects. Objects in the real world consist of both numerical and categorical data. Categorical data cannot be analyzed as numerical data because of the absence of inherent ordering. This paper describes ten different clustering algorithms, their methodology and the factors influencing their performance. Each algorithm is evaluated using real world datasets, and its pros and cons are specified. The various similarity / dissimilarity measures applied to categorical data and their performance are also discussed. The time complexity defines the amount of time taken by an algorithm to perform the elementary operation. The time complexity of various algorithms is discussed, and their performance on real world data such as mushroom, zoo, soya bean, cancer, vote, car and iris is measured. In this survey, cluster accuracy and error rate are evaluated for four different clustering algorithms (K-modes, fuzzy K-modes, ROCK and Squeezer), two different similarity measures (DISC and Overlap), and DILCA applied to hierarchical and partition algorithms.
A Geometric Clustering Algorithm with Applications to Structural Data
Xu, Shutan; Zou, Shuxue
2015-01-01
An important feature of structural data, especially those from structural determination and protein-ligand docking programs, is that their distribution could be mostly uniform. Traditional clustering algorithms developed specifically for nonuniformly distributed data may not be adequate for their classification. Here we present a geometric partitional algorithm that could be applied to both uniformly and nonuniformly distributed data. The algorithm is a top-down approach that recursively selects the outliers as the seeds to form new clusters until all the structures within a cluster satisfy a classification criterion. The algorithm has been evaluated on a diverse set of real structural data and six sets of test data. The results show that it is superior to the previous algorithms for the clustering of structural data and is similar to or better than them for the classification of the test data. The algorithm should be especially useful for the identification of the best but minor clusters and for speeding up an iterative process widely used in NMR structure determination. PMID:25517067
Research on retailer data clustering algorithm based on Spark
Huang, Qiuman; Zhou, Feng
2017-03-01
Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.
Big Data Clustering Using Genetic Algorithm On Hadoop Mapreduce
Directory of Open Access Journals (Sweden)
Nivranshu Hans
2015-04-01
Cluster analysis is used to classify similar objects under the same group. It is one of the most important data mining methods. However, it fails to perform well for big data because of its huge time complexity. For such scenarios parallelization is a better approach. MapReduce is a popular programming model which enables parallel processing in a distributed environment. But most clustering algorithms, for instance Genetic Algorithms, are not naturally parallelizable, owing to their sequential nature. This paper introduces a technique to parallelize GA-based clustering by extending Hadoop MapReduce. An analysis of the proposed approach to evaluate performance gains with respect to a sequential algorithm is presented. The analysis is based on a real-life large data set.
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.
He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej
2011-12-01
Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β-SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
An improved algorithm for clustering gene expression data.
Bandyopadhyay, Sanghamitra; Mukhopadhyay, Anirban; Maulik, Ujjwal
2007-11-01
Recent advancements in microarray technology allow simultaneous monitoring of the expression levels of a large number of genes over different time points. Clustering is an important tool for analyzing such microarray data, typical properties of which are its inherent uncertainty, noise and imprecision. In this article, a two-stage clustering algorithm, which employs a recently proposed variable string length genetic scheme and a multiobjective genetic clustering algorithm, is proposed. It is based on the novel concept of points having significant membership to multiple classes. An iterated version of the well-known Fuzzy C-Means is also utilized for clustering. The significant superiority of the proposed two-stage clustering algorithm as compared to the average linkage method, Self Organizing Map (SOM) and a recently developed weighted Chinese restaurant-based clustering method (CRC), widely used methods for clustering gene expression data, is established on a variety of artificial and publicly available real life data sets. The biological relevance of the clustering solutions is also analyzed.
Improved insensitive to input parameters trajectory clustering algorithm
Institute of Scientific and Technical Information of China (English)
Jiashun Chen; Dechang Pi
2013-01-01
The existing trajectory clustering (TRACLUS) algorithm is sensitive to the input parameters ε and MinLns: a small change in a parameter value can produce entirely different clustering results. To address this vulnerability, a shielding parameters sensitivity trajectory cluster (SPSTC) algorithm is proposed which is insensitive to the input parameters. Firstly, definitions of the core distance and reachable distance of a line segment are presented, and the algorithm generates a cluster sorting according to the core distance and reachable distance. Secondly, the reachable plots of line segment sets are constructed according to the cluster sorting and reachable distance. Thirdly, a parameterized sequence is extracted from the reachable plot, and the final trajectory clusters are obtained from the parameterized sequence. The parameterized sequence represents the inner cluster structure of the trajectory data. Experiments on real data sets and test data sets show that the SPSTC algorithm effectively reduces the sensitivity to the input parameters while obtaining better-quality trajectory clusters.
Multilayer Traffic Network Optimized by Multiobjective Genetic Clustering Algorithm
Wen, Feng; Gen, Mitsuo; Yu, Xinjie
This paper introduces a multilayer traffic network model and a traffic network clustering method for solving the route selection problem (RSP) in car navigation systems (CNS). The purpose of the proposed method is to reduce the computation time of route selection substantially, with an acceptable loss of accuracy, by preprocessing the large traffic network into a new network form. The proposed approach goes beyond the traditional hierarchical network method by further preprocessing the traffic network with a clustering method. The traffic network clustering considers two criteria. We specify a genetic clustering algorithm for traffic network clustering and use NSGA-II for calculating the multiple objective Pareto optimal set. The proposed method can overcome the size limitations when solving route selection in CNS. Solutions provided by the proposed algorithm are compared with the optimal solutions to analyze and quantify the loss of accuracy.
Morphology of Open Clusters NGC 1857 and Czernik 20 using Clustering Algorithms
Bhattacharya, Souradeep; Pandaokar, Samay; Singh, Parikshit Kishor
2016-01-01
The morphology and cluster membership of the Galactic open clusters - Czernik 20 and NGC 1857 were analyzed using two different clustering algorithms. We present the maiden use of density-based spatial clustering of applications with noise (DBSCAN) to determine open cluster morphology from spatial distribution. The region of analysis has also been spatially classified using a statistical membership determination algorithm. We utilized near infrared (NIR) data for a suitably large region around the clusters from the United Kingdom Infrared Deep Sky Survey Galactic Plane Survey star catalogue database, and also from the Two Micron All Sky Survey star catalogue database. The densest regions of the cluster morphologies (1 for Czernik 20 and 2 for NGC 1857) thus identified were analyzed with a K-band extinction map and color-magnitude diagrams (CMDs). To address significant discrepancy in known distance and reddening parameters, we carried out field decontamination of these CMDs and subsequent isochrone fitting of...
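As a rough illustration of the DBSCAN step named in this abstract, the sketch below labels points in a plane as cluster members or noise. It is a plain brute-force version written for readability, not the authors' pipeline; the function name dbscan and the parameters eps and min_pts are chosen here for the example, and a neighbourhood is assumed to include the point itself.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    def region(i):
        # Indices of all points within distance eps of point i (including i).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = region(i)
        if len(nbrs) < min_pts:
            labels[i] = -1           # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = region(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # only core points expand the cluster
    return labels
```

With two compact groups of four points and one isolated point, for example dbscan(pts, eps=1.5, min_pts=3), the isolated point is labelled -1 and each group receives its own id. For a star catalogue the points would be sky positions, and the choice of eps and min_pts would need to be tuned to the field density.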
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Emmons, Scott; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms -- Blondel, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 o...
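The three stand-alone quality metrics listed in this abstract can be written down compactly for an undirected, unweighted graph. The sketch below uses the standard textbook definitions and is not taken from the paper; the function names and the edge-list and membership-dictionary representation are assumptions made for the example.

```python
from collections import defaultdict

def coverage(edges, membership):
    """Fraction of edges whose two endpoints share a cluster."""
    intra = sum(1 for u, v in edges if membership[u] == membership[v])
    return intra / len(edges)

def modularity(edges, membership):
    """Newman modularity: sum over clusters of e_c / m - (d_c / 2m)^2."""
    m = len(edges)
    intra = defaultdict(int)        # edges inside each cluster
    degree_sum = defaultdict(int)   # total degree of each cluster
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            intra[membership[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

def conductance(edges, membership, cluster):
    """Cut edges of one cluster divided by the smaller of the two volumes."""
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for node in (u, v):
            if membership[node] == cluster:
                vol_in += 1
            else:
                vol_out += 1
        if (membership[u] == cluster) != (membership[v] == cluster):
            cut += 1
    return cut / min(vol_in, vol_out)

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
```

For this small graph, coverage is 6/7, modularity is 5/14, and the conductance of either triangle is 1/7, which shows how the same partition can score quite differently depending on the metric.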
Sampling Within k-Means Algorithm to Cluster Large Datasets
Energy Technology Data Exchange (ETDEWEB)
Bejarano, Jeremy [Brigham Young University; Bose, Koushiki [Brown University; Brannan, Tyler [North Carolina State University; Thomas, Anita [Illinois Institute of Technology; Adragni, Kofi [University of Maryland; Neerchal, Nagaraj [University of Maryland; Ostrouchov, George [ORNL
2011-08-01
Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets and on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Such a study should also analyze the performance of the algorithm on varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
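The general idea of this abstract, running k-means on a sample and then labelling the full dataset with the resulting centers, can be sketched as follows. This is not the authors' algorithm: the abstract derives the sample size from a width and confidence level that are not specified here, so the sketch simply takes a fixed sample_size, and the names lloyd and sampled_kmeans are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def lloyd(points, k, rng, iters=50):
    """Standard Lloyd iterations with random initial centers drawn from points."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def sampled_kmeans(points, k, sample_size, seed=0):
    """Fit centers on a random sample only, then label every point once."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = lloyd(sample, k, rng)
    labels = [nearest(p, centers) for p in points]
    return centers, labels
```

The saving comes from the iterative part touching only sample_size points per pass, while the full dataset is scanned a single time for the final assignment. How small the sample can be before accuracy degrades is exactly the open question the abstract raises.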
GDCluster: A General Decentralized Clustering Algorithm
Mashayekhi, Hoda; Habibi, Jafar; Khalafbeigi, Tania; Voulgaris, Spyros; van Steen, Martinus Richardus
In many popular applications like peer-to-peer systems, large amounts of data are distributed among multiple sources. Analysis of this data and identifying clusters is challenging due to processing, storage, and transmission costs. In this paper, we propose GDCluster, a general fully decentralized
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering
Chahine, Firas Safwan
2012-01-01
Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithms, but has a major…
Effective FCM noise clustering algorithms in medical images.
Kannan, S R; Devi, R; Ramathilagam, S; Takezawa, K
2013-02-01
The main motivation of this paper is to introduce a class of robust non-Euclidean distance measures for the original data space in order to derive new objective functions, and thus to cluster the non-Euclidean structures in data so that the original clustering algorithms become more robust to noise and outliers. The new objective functions of the proposed algorithms are realized by incorporating the noise clustering concept into the entropy-based fuzzy C-means algorithm, with a suitable noise distance that is employed to take information about noisy data into account in the clustering process. This paper initializes the cluster prototypes using a prototype initialization method, so that the final result can be obtained with fewer iterations. To evaluate the performance of the proposed methods in reducing the noise level, experimental work has been carried out with a synthetic image corrupted by Gaussian noise. The superiority of the proposed methods has been examined through an experimental study on medical images. The experimental results show that the proposed algorithms perform significantly better than the standard existing algorithms. The accurate classification percentage of the proposed fuzzy C-means segmentation method is obtained using the silhouette validity index.
General-purpose event generators for LHC physics
Energy Technology Data Exchange (ETDEWEB)
Buckley, Andy [PPE Group, School of Physics and Astronomy, University of Edinburgh, EH25 9PN (United Kingdom); Butterworth, Jonathan [Department of Physics and Astronomy, University College London, WC1E 6BT (United Kingdom); Gieseke, Stefan [Institute for Theoretical Physics, Karlsruhe Institute of Technology, D-76128 Karlsruhe (Germany); Grellscheid, David [Institute for Particle Physics Phenomenology, Durham University, DH1 3LE (United Kingdom); Hoeche, Stefan [SLAC National Accelerator Laboratory, Menlo Park, CA 94025 (United States); Hoeth, Hendrik; Krauss, Frank [Institute for Particle Physics Phenomenology, Durham University, DH1 3LE (United Kingdom); Loennblad, Leif [Department of Astronomy and Theoretical Physics, Lund University (Sweden); PH Department, TH Unit, CERN, CH-1211 Geneva 23 (Switzerland); Nurse, Emily [Department of Physics and Astronomy, University College London, WC1E 6BT (United Kingdom); Richardson, Peter [Institute for Particle Physics Phenomenology, Durham University, DH1 3LE (United Kingdom); Schumann, Steffen [Institute for Theoretical Physics, University of Heidelberg, 69120 Heidelberg (Germany); Seymour, Michael H. [School of Physics and Astronomy, University of Manchester, M13 9PL (United Kingdom); Sjoestrand, Torbjoern [Department of Astronomy and Theoretical Physics, Lund University (Sweden); Skands, Peter [PH Department, TH Unit, CERN, CH-1211 Geneva 23 (Switzerland); Webber, Bryan, E-mail: webber@hep.phy.cam.ac.uk [Cavendish Laboratory, J.J. Thomson Avenue, Cambridge CB3 0HE (United Kingdom)
2011-07-15
We review the physics basis, main features and use of general-purpose Monte Carlo event generators for the simulation of proton-proton collisions at the Large Hadron Collider. Topics included are: the generation of hard scattering matrix elements for processes of interest, at both leading and next-to-leading QCD perturbative order; their matching to approximate treatments of higher orders based on the showering approximation; the parton and dipole shower formulations; parton distribution functions for event generators; non-perturbative aspects such as soft QCD collisions, the underlying event and diffractive processes; the string and cluster models for hadron formation; the treatment of hadron and tau decays; the inclusion of QED radiation and beyond Standard Model processes. We describe the principal features of the ARIADNE, Herwig++, PYTHIA 8 and SHERPA generators, together with the Rivet and Professor validation and tuning tools, and discuss the physics philosophy behind the proper use of these generators and tools. This review is aimed at phenomenologists wishing to understand better how parton-level predictions are translated into hadron-level events as well as experimentalists seeking a deeper insight into the tools available for signal and background simulation at the LHC.
General-purpose event generators for LHC physics
Energy Technology Data Exchange (ETDEWEB)
Buckley, Andy; /Edinburgh U.; Butterworth, Jonathan; /University Coll. London; Gieseke, Stefan; /Karlsruhe U., ITP; Grellscheid, David; /Durham U., IPPP; Hoche, Stefan; /SLAC; Hoeth, Hendrik; Krauss, Frank; /Durham U., IPPP; Lonnblad, Leif; /Lund U., Dept. Theor. Phys. /CERN; Nurse, Emily; /University Coll. London; Richardson, Peter; /Durham U., IPPP; Schumann, Steffen; /Heidelberg U.; Seymour, Michael H.; /Manchester U.; Sjostrand, Torbjorn; /Lund U., Dept. Theor. Phys.; Skands, Peter; /CERN; Webber, Bryan; /Cambridge U.
2011-03-03
We review the physics basis, main features and use of general-purpose Monte Carlo event generators for the simulation of proton-proton collisions at the Large Hadron Collider. Topics included are: the generation of hard-scattering matrix elements for processes of interest, at both leading and next-to-leading QCD perturbative order; their matching to approximate treatments of higher orders based on the showering approximation; the parton and dipole shower formulations; parton distribution functions for event generators; non-perturbative aspects such as soft QCD collisions, the underlying event and diffractive processes; the string and cluster models for hadron formation; the treatment of hadron and tau decays; the inclusion of QED radiation and beyond-Standard-Model processes. We describe the principal features of the Ariadne, Herwig++, Pythia 8 and Sherpa generators, together with the Rivet and Professor validation and tuning tools, and discuss the physics philosophy behind the proper use of these generators and tools. This review is aimed at phenomenologists wishing to understand better how parton-level predictions are translated into hadron-level events as well as experimentalists wanting a deeper insight into the tools available for signal and background simulation at the LHC.
Robustness of the ATLAS pixel clustering neural network algorithm
AUTHOR|(INSPIRE)INSPIRE-00407780; The ATLAS collaboration
2016-01-01
Proton-proton collisions at the energy frontier put strong constraints on track reconstruction algorithms. In the ATLAS track reconstruction algorithm, an artificial neural network is utilised to identify and split clusters of neighbouring read-out elements in the ATLAS pixel detector created by multiple charged particles. The robustness of the neural network algorithm is presented, probing its sensitivity to uncertainties in the detector conditions. The robustness is studied by evaluating the stability of the algorithm's performance under a range of variations in the inputs to the neural networks. Within reasonable variation magnitudes, the neural networks prove to be robust to most variation types.
47 CFR 32.2124 - General purpose computers.
2010-10-01
... 47 Telecommunication 2 2010-10-01 2010-10-01 false General purpose computers. 32.2124 Section 32... General purpose computers. (a) This account shall include the original cost of computers and peripheral... cost of computers and their associated peripheral devices associated with switching, network signaling...
78 FR 7718 - Review of the General Purpose Costing System
2013-02-04
... Surface Transportation Board 49 CFR Parts 1247 and 1248 Review of the General Purpose Costing System... general purpose costing system, the Uniform Railroad Costing System (URCS). Specifically, the Board is..., 2013. ADDRESSES: Comments may be submitted either via the Board's e-filing format or in the traditional...
World Wide Web Metasearch Clustering Algorithm
Directory of Open Access Journals (Sweden)
Adina LIPAI
2008-01-01
As the storage capacity and processing speed of search engines grow to keep up with the constant expansion of the World Wide Web, the user faces an ever longer list of results for a given query. A simple query composed of common words sometimes has hundreds or even thousands of results, making it practically impossible for the user to verify all of them in order to identify a particular site. Even when the list of results is presented ordered by rank, this is often not sufficient to help the user identify the most relevant sites for the query. The concept of search result clustering was introduced as a solution to this situation. The process of clustering search results consists of building thematically homogeneous groups from the initial list of results provided by classic search tools, using characteristics present within the initial results, without any kind of predefined categories.
Efficient Clustering of Web Search Results Using Enhanced Lingo Algorithm
Directory of Open Access Journals (Sweden)
M. Manikantan
2015-02-01
Web query optimization is the focus of recent research and development efforts. To fetch the required information, users rely on search engines and sometimes on website interfaces. One approach is search engine optimization, which website developers use to popularize their websites through search engine results. Clustering is a main task of the explorative data mining process and a common technique for grouping web search results into different categories based on the specific web contents. A clustering search engine called Lingo uses only snippets to cluster the documents. Though this method takes less time to cluster the documents, it cannot produce clusters of good quality. This study focuses on clustering all documents by applying semantic similarity between words and then applying a modified Lingo algorithm, in order to produce good-quality clusters in less time.
A Novel Hybrid Data Clustering Algorithm Based on Artificial Bee Colony Algorithm and K-Means
Institute of Scientific and Technical Information of China (English)
TRAN Dang Cong; WU Zhijian; WANG Zelin; DENG Changshou
2015-01-01
To improve the performance of the K-means clustering algorithm, this paper presents a new hybrid approach of Enhanced artificial bee colony algorithm and K-means (EABCK). In EABCK, the original artificial bee colony algorithm (called ABC) is enhanced by a new mutation operation and guided by the global best solution (called EABC). Then, the best solution is updated by K-means in each iteration for data clustering. In the experiments, a set of benchmark functions was used to evaluate the performance of EABC with other comparative ABC variants. To evaluate the performance of EABCK on data clustering, eleven benchmark datasets were utilized. The experimental results show that EABC and EABCK outperform other comparative ABC variants and data clustering algorithms, respectively.
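The step in which the best solution is "updated by K-means" can be pictured with a minimal sketch. It assumes the candidate solution is simply a list of centroids and applies one standard Lloyd (K-means) iteration to it. The function names `kmeans_step` and `sse` and the toy data are illustrative and are not taken from the paper; the bee colony search itself is not shown.

```python
import math

def kmeans_step(points, centroids):
    """One Lloyd iteration: assign points to the nearest centroid, then recompute means."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    new_centroids = []
    for old, members in zip(centroids, clusters):
        if members:
            dim = len(old)
            new_centroids.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
        else:
            # An empty cluster keeps its previous centroid (one possible convention).
            new_centroids.append(tuple(old))
    return new_centroids

def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)

# Hypothetical candidate centroids, e.g. the best solution found by a global search.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
candidate = [(1.0, 1.0), (4.0, 4.0)]
refined = kmeans_step(points, candidate)
print(refined, sse(points, candidate), sse(points, refined))
```

A Lloyd iteration never increases the sum of squared errors, which is why it is a natural local refinement for a centroid set produced by a global optimizer.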
AN IMPROVED FUZZY CLUSTERING ALGORITHM FOR MICROARRAY IMAGE SPOTS SEGMENTATION
Directory of Open Access Journals (Sweden)
V.G. Biju
2015-11-01
An automatic cDNA microarray image processing method using an improved fuzzy clustering algorithm is presented in this paper. The proposed spot segmentation algorithm uses the gridding technique developed earlier by the authors to find the co-ordinates of each spot in an image. Automatic cropping of spots from the microarray image is done using these co-ordinates. The present paper proposes an improved fuzzy clustering algorithm, Possibility fuzzy local information c means (PFLICM), to segment the spot foreground (FG) from the background (BG). PFLICM improves the fuzzy local information c means (FLICM) algorithm by incorporating the typicality of a pixel along with gray level information and local spatial information. The performance of the algorithm is validated using a set of simulated cDNA microarray images with different levels of AWGN added. The strength of the algorithm is tested by computing parameters such as the Segmentation matching factor (SMF), Probability of error (pe), Discrepancy distance (D) and Normal mean square error (NMSE). The SMF value obtained for the PFLICM algorithm shows an improvement of 0.9 % and 0.7 % for high noise and low noise microarray images respectively, compared to the FLICM algorithm. The PFLICM algorithm is also applied to real microarray images, and gene expression values are computed.
Functional clustering algorithm for the analysis of dynamic network data
Feldt, S.; Waddell, J.; Hetrick, V. L.; Berke, J. D.; Żochowski, M.
2009-05-01
We formulate a technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines data traces and derives the optimal clustering cutoff in a simple and intuitive manner through the use of surrogate data sets. In order to demonstrate the power of this algorithm to detect changes in network dynamics and connectivity, we apply it to both simulated neural spike train data and real neural data obtained from the mouse hippocampus during exploration and slow-wave sleep. Using the simulated data, we show that our algorithm performs better than existing methods. In the experimental data, we observe state-dependent clustering patterns consistent with known neurophysiological processes involved in memory consolidation.
Application of genetic algorithms to hydrogenated silicon clusters
Indian Academy of Sciences (India)
N Chakraborti; R Prasad
2003-01-01
We discuss the application of biologically inspired genetic algorithms to determine the ground state structures of a number of Si–H clusters. The total energy of a given configuration of a cluster has been obtained by using a non-orthogonal tight-binding model and the energy minimization has been carried out by using genetic algorithms and their recent variant differential evolution. Our results for ground state structures and cohesive energies for Si–H clusters are in good agreement with the earlier work conducted using the simulated annealing technique. We find that the results obtained by genetic algorithms turn out to be comparable to, and often better than, the results obtained by the simulated annealing technique.
Spin chain simulations with a meron cluster algorithm
Energy Technology Data Exchange (ETDEWEB)
Boyer, T. [Humboldt-Universitaet, Berlin (Germany). Inst. fuer Physik]|[Ecole Normale Superieure de Cachan (France); Bietenholz, W. [Humboldt-Universitaet, Berlin (Germany). Inst. fuer Physik]|[Deutsches Elektronen-Synchrotron (DESY), Zeuthen (Germany). John von Neumann-Inst. fuer Computing NIC; Wuilloud, J. [Humboldt-Universitaet, Berlin (Germany). Inst. fuer Physik]|[Geneve Univ. (Switzerland). Dept. de Physique Theorique
2007-01-15
We apply a meron cluster algorithm to the XY spin chain, which describes a quantum rotor. This is a multi-cluster simulation supplemented by an improved estimator, which deals with objects of half-integer topological charge. This method is powerful enough to provide precise results for the model with a $\theta$-term - it is therefore one of the rare examples where a system with a complex action can be solved numerically. In particular we measure the correlation length, as well as the topological and magnetic susceptibility. We discuss the algorithmic efficiency in view of the critical slowing down. Due to the excellent performance that we observe, it is strongly motivated to work on new applications of meron cluster algorithms in higher dimensions. (orig.)
Adaptive Weighted Clustering Algorithm for Mobile Ad-hoc Networks
Directory of Open Access Journals (Sweden)
Adwan Yasin
2016-04-01
In this paper we present a new algorithm for clustering Mobile Ad-hoc Networks (MANETs) that considers several parameters and acts as an adaptive load-balancing technique. A MANET is a special kind of wireless network in which no central management exists and the nodes cooperatively manage themselves and maintain connectivity. The algorithm takes into account the local capabilities of each node, the remaining battery power, the degree of connectivity and, finally, the power consumption based on the average distance between nodes and the candidate cluster head. The proposed algorithm efficiently decreases the overhead in the network, which enhances overall MANET performance. Reducing the maintenance time of broken routes makes the network more stable and reliable. Saving the power of the nodes also guarantees a consistent and reliable network.
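A weighted cluster-head election of the kind described above might be sketched as follows. The abstract does not give the scoring formula, so the linear combination, the weights in `WEIGHTS`, the normalisation and all field names are assumptions made purely for illustration; only the four inputs (local capability, remaining battery, connectivity degree, average distance) come from the text.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    capability: float    # normalised local capability in [0, 1] (assumed scale)
    battery: float       # remaining battery fraction in [0, 1]
    degree: int          # number of one-hop neighbours
    avg_distance: float  # mean distance to neighbours, arbitrary length unit

# Hypothetical weights; the paper's actual values are not stated in the abstract.
WEIGHTS = {"capability": 0.2, "battery": 0.4, "degree": 0.2, "distance": 0.2}

def score(node, max_degree, max_distance):
    """Higher is better: capable, well-charged, well-connected, close to neighbours."""
    degree_term = node.degree / max_degree if max_degree else 0.0
    distance_term = 1.0 - node.avg_distance / max_distance if max_distance else 0.0
    return (WEIGHTS["capability"] * node.capability
            + WEIGHTS["battery"] * node.battery
            + WEIGHTS["degree"] * degree_term
            + WEIGHTS["distance"] * distance_term)

def elect_cluster_head(nodes):
    """Return the node with the highest weighted score among the candidates."""
    max_degree = max(n.degree for n in nodes)
    max_distance = max(n.avg_distance for n in nodes)
    return max(nodes, key=lambda n: score(n, max_degree, max_distance))

nodes = [
    Node("a", capability=0.5, battery=0.9, degree=4, avg_distance=20.0),
    Node("b", capability=0.9, battery=0.2, degree=5, avg_distance=10.0),
    Node("c", capability=0.4, battery=0.5, degree=2, avg_distance=40.0),
]
head = elect_cluster_head(nodes)
print(head.name)
```

With these made-up weights, node "a" wins despite having fewer neighbours than "b", because the battery term dominates; changing the weights shifts that trade-off.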
SNAP: A General Purpose Network Analysis and Graph Mining Library.
Leskovec, Jure; Sosič, Rok
2016-10-01
Large networks are becoming a widely used abstraction for studying complex systems in a broad set of disciplines, ranging from social network analysis to molecular biology and neuroscience. Despite an increasing need to analyze and manipulate large networks, only a limited number of tools are available for this task. Here, we describe Stanford Network Analysis Platform (SNAP), a general-purpose, high-performance system that provides easy to use, high-level operations for analysis and manipulation of large networks. We present SNAP functionality, describe its implementational details, and give performance benchmarks. SNAP has been developed for single big-memory machines and it balances the trade-off between maximum performance, compact in-memory graph representation, and the ability to handle dynamic graphs where nodes and edges are being added or removed over time. SNAP can process massive networks with hundreds of millions of nodes and billions of edges. SNAP offers over 140 different graph algorithms that can efficiently manipulate large graphs, calculate structural properties, generate regular and random graphs, and handle attributes and meta-data on nodes and edges. Besides being able to handle large graphs, an additional strength of SNAP is that networks and their attributes are fully dynamic: they can be modified during the computation at low cost. SNAP is provided as an open source library in C++ as well as a module in Python. We also describe the Stanford Large Network Dataset, a set of social and information real-world networks and datasets, which we make publicly available. The collection is a complementary resource to our SNAP software and is widely used for development and benchmarking of graph analytics algorithms.
A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis
Directory of Open Access Journals (Sweden)
Shaoning Li
2017-01-01
In the fields of geographic information systems (GIS) and remote sensing (RS), the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited by computational complexity, noise resistance and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC), is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters”) if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the termination condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.
Energy Efficient Homogenous Clustering and Cluster Head Selection Algorithm for WSN
Directory of Open Access Journals (Sweden)
Ganeshayya I. Shidaganti
2013-02-01
Wireless sensor networks (WSNs) are energy- and resource-constrained networks made up of small electronic devices called sensor nodes. Each sensor node is capable of sensing, computing and transmitting data from one node to another until the data reach the base station. Each node monitors physical or environmental conditions, depending on the application, and communicates with nearby nodes via radio broadcast. Radio transmission and reception consume a lot of energy in a WSN; thus, one of the important issues in wireless sensor networks is the inherently limited battery power of the sensor nodes. Battery power is therefore a crucial parameter in algorithm design for maximizing the lifespan of sensor nodes. Much research has been done in recent years in the area of low-power routing protocols, but many design options remain open for improvement, and further research targeted at specific applications is needed. In this paper, we propose a new approach of an energy-efficient homogeneous clustering and cluster head selection algorithm for wireless sensor networks in which the lifespan of the network is increased by ensuring a homogeneous distribution of nodes in the clusters. In this clustering algorithm, energy efficiency is distributed and network performance is improved by selecting cluster heads on the basis of the residual energy of existing cluster heads, holdback value, and nearest hop distance of the node. In the proposed clustering algorithm, the cluster members are uniformly distributed and the life of the network is further extended.
JETSET: Physics at LEAR with an Internal Gas Jet Target and an Advanced General Purpose Detector
2002-01-01
This experiment involves an internal gas cluster jet target surrounded by a compact general-purpose detector. The LEAR beam and internal jet target provide several important experimental features: high luminosity ($10^{30}$ cm$^{-2}$ sec$^{-1}$), excellent mass resolution ($\Delta
NCUBE - A clustering algorithm based on a discretized data space
Eigen, D. J.; Northouse, R. A.
1974-01-01
Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.
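The idea of imposing a gridwork on the data space can be illustrated with a small sketch. This is not the NCUBE implementation, whose details the abstract does not give; it is a generic grid-based grouping that assumes points falling in the same cell, or in occupied cells sharing a face, belong to one cluster. The function name `grid_cluster` and the parameter `cell_size` are hypothetical.

```python
from collections import deque

def grid_cluster(points, cell_size):
    """Label points by connected groups of occupied grid cells (face adjacency)."""
    # Map each point to the integer coordinates of the cell that contains it.
    cells = {}
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells.setdefault(key, []).append(idx)

    labels = [None] * len(points)
    seen = set()
    next_label = 0
    for start in cells:
        if start in seen:
            continue
        # Breadth-first search over occupied cells that share a face.
        queue = deque([start])
        seen.add(start)
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = next_label
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in cells and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        next_label += 1
    return labels

points = [(0.2, 0.3), (0.8, 0.1), (1.4, 0.6), (5.1, 5.2), (5.9, 5.5)]
labels = grid_cluster(points, cell_size=1.0)
print(labels)
```

Because every cluster is a union of axis-aligned cells, the boundary between clusters is made of flat cell faces, which is one way to read the abstract's remark about piecewise linear discrimination.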
A Rough Set based Gene Expression Clustering Algorithm
Directory of Open Access Journals (Sweden)
J. J. Emilyn
2011-01-01
Problem statement: Microarray technology helps in monitoring the expression levels of thousands of genes across collections of related samples. Approach: The main goal in the analysis of large and heterogeneous gene expression datasets was to identify groups of genes that get expressed in a set of experimental conditions. Results: Several clustering techniques have been proposed for identifying gene signatures and understanding their role, and many of them have been applied to gene expression data, but with partial success. The main aim of this work was to develop a clustering algorithm that would successfully identify gene patterns. The proposed novel clustering technique (RCGED) provides an efficient way of finding the hidden and unique gene expression patterns. It overcomes the restriction of one object being placed in only one cluster. Conclusion/Recommendations: The proposed algorithm is termed intelligent because it automatically determines the optimum number of clusters. The proposed algorithm was tested on a colon cancer dataset and the results were compared with the Rough Fuzzy K Means algorithm.
Core Business Selection Based on Ant Colony Clustering Algorithm
Directory of Open Access Journals (Sweden)
Yu Lan
2014-01-01
Core business is the most important business of an enterprise with diversified business. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current state of the company's development, the core business of the company can be identified by the ant colony clustering algorithm. The results indicate that the proposed method is an effective way to determine the core business of a company.
Research on Scheduling Algorithms in Web Cluster Servers
Institute of Scientific and Technical Information of China (English)
LEI YingChun (雷迎春); GONG YiLi (龚奕利); ZHANG Song (张松); LI GuoJie (李国杰)
2003-01-01
This paper analyzes quantitatively the impact of the load balance scheduling algorithms and the locality scheduling algorithms on the performance of Web cluster servers, and brings forward the Adaptive_LARD algorithm. Compared with the representative LARD algorithm, the advantages of the Adaptive_LARD are that: (1) it adjusts load distribution among the back-ends through the idea of load balancing to avoid learning steps in the LARD algorithm and reinforce its adaptability; (2) by distinguishing between TCP connections accessing disks and those accessing cache memory, it can estimate the impact of different connections on the back-ends' load more precisely. Performance evaluations suggest that the proposed method outperforms the LARD algorithm by up to 14.7%.
Identifying multiple influential spreaders by a heuristic clustering algorithm
Energy Technology Data Exchange (ETDEWEB)
Bao, Zhong-Kui [School of Mathematical Science, Anhui University, Hefei 230601 (China); Liu, Jian-Guo [Data Science and Cloud Service Research Center, Shanghai University of Finance and Economics, Shanghai, 200133 (China); Zhang, Hai-Feng, E-mail: haifengzhang1978@gmail.com [School of Mathematical Science, Anhui University, Hefei 230601 (China); Department of Communication Engineering, North University of China, Taiyuan, Shan' xi 030051 (China)
2017-03-18
The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suited to the case where a single spreader is chosen as the spreading source. Often, the spreading process is initiated by simultaneously choosing multiple nodes as the spreading sources. In this situation, choosing the top-ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in the network. An ideal set of multiple spreaders is therefore both influential and dispersively distributed in the network, but it is difficult to meet the two conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters, and the center nodes of the clusters are then chosen as the multiple spreaders. The HC algorithm not only ensures that the multiple spreaders are dispersively distributed in the network but also prevents the selected nodes from being very “negligible”. Compared with traditional methods, our experimental results on synthetic and real networks indicate that the HC method performs better on influence maximization. - Highlights: • A heuristic clustering algorithm is proposed to identify the multiple influential spreaders in complex networks. • The algorithm not only guarantees that the selected spreaders are sufficiently scattered but also prevents them from being “insignificant”. • The performance of our algorithm is generally better than that of other methods, on both real and synthetic networks.
Limited Random Walk Algorithm for Big Graph Data Clustering
Zhang, Honglei; Kiranyaz, Serkan; Gabbouj, Moncef
2016-01-01
Graph clustering is an important technique to understand the relationships between the vertices in a big graph. In this paper, we propose a novel random-walk-based graph clustering method. The proposed method restricts the reach of the walking agent using an inflation function and a normalization function. We analyze the behavior of the limited random walk procedure and propose a novel algorithm for both global and local graph clustering problems. Previous random-walk-based algorithms depend on the chosen fitness function to find the clusters around a seed vertex. The proposed algorithm tackles the problem in an entirely different manner. We use the limited random walk procedure to find attracting vertices in a graph and use them as features to cluster the vertices. According to the experimental results on the simulated graph data and the real-world big graph data, the proposed method is superior to the state-of-the-art methods in solving graph clustering problems. Since the proposed method uses the embarrass...
A Genetic Clustering Algorithm for Mean-Residual Vector Quantization
Institute of Scientific and Technical Information of China (English)
CHU Shuchuan; John F. Roddick; CHEN Tsongyi
2004-01-01
Vector quantization (VQ) is a useful tool for data compression and can be applied to compress the data vectors in a database. The quality of the recovered data vector depends on a good codebook. Mean-residual vector quantization (M/R VQ) has been shown to be efficient in encoding time and needs only a little storage. In this paper, genetic algorithms in combination with the Generalized Lloyd algorithm (GLA) are applied to the codebook design of M/R VQ. The mean codebook and residual codebook are trained separately using GLA, then genetic algorithms (GA) are used to evaluate and evolve the combined mean codebook and residual codebook. The parameters used in the proposed algorithm are designed based on experiments and they are robust to the proposed GA-based clustering algorithm for M/R VQ. Experimental results demonstrate that the proposed genetic clustering algorithm applied to M/R VQ may improve the peak signal to noise ratio of the recovered data vector compared with the GLA.
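To make the roles of the two codebooks concrete, here is a minimal sketch of mean-residual encoding and decoding together with the peak signal to noise ratio. It assumes one common formulation in which the vector mean and the mean-removed residual are quantized independently; the codebooks below are hand-written toy values, not trained by GLA or a genetic algorithm, and the `peak` value of 255 assumes 8-bit data.

```python
import math

def nearest(codebook, target, dist):
    """Index of the codebook entry closest to target under the given distance."""
    return min(range(len(codebook)), key=lambda i: dist(codebook[i], target))

def encode(vec, mean_codebook, residual_codebook):
    """Return (mean index, residual index) for one input vector."""
    mean = sum(vec) / len(vec)
    residual = [x - mean for x in vec]
    mi = nearest(mean_codebook, mean, lambda a, b: abs(a - b))
    ri = nearest(residual_codebook, residual, math.dist)
    return mi, ri

def decode(mi, ri, mean_codebook, residual_codebook):
    """Reconstruct a vector as quantized mean plus quantized residual."""
    return [mean_codebook[mi] + r for r in residual_codebook[ri]]

def psnr(original, recovered, peak=255.0):
    """Peak signal to noise ratio in dB; infinite for a perfect reconstruction."""
    mse = sum((a - b) ** 2 for a, b in zip(original, recovered)) / len(original)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# Toy codebooks (illustrative values only).
mean_codebook = [50.0, 100.0, 200.0]
residual_codebook = [
    [-10.0, 10.0, -10.0, 10.0],
    [0.0, 0.0, 0.0, 0.0],
    [20.0, -20.0, 20.0, -20.0],
]
vec = [92.0, 108.0, 91.0, 109.0]
indices = encode(vec, mean_codebook, residual_codebook)
recovered = decode(*indices, mean_codebook, residual_codebook)
print(indices, recovered, round(psnr(vec, recovered), 2))
```

Only the two indices need to be stored per vector, which is where the small storage requirement comes from; the quality of `recovered` depends entirely on how well the two codebooks were designed.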
A Task-parallel Clustering Algorithm for Structured AMR
Energy Technology Data Exchange (ETDEWEB)
Gunney, B N; Wissink, A M
2004-11-02
A new parallel algorithm, based on the Berger-Rigoutsos algorithm for clustering grid points into logically rectangular regions, is presented. The clustering operation is frequently performed in the dynamic gridding steps of structured adaptive mesh refinement (SAMR) calculations. A previous study revealed that although the cost of clustering is generally insignificant for smaller problems run on relatively few processors, the algorithm scaled inefficiently in parallel and its cost grows with problem size. Hence, it can become significant for large scale problems run on very large parallel machines, such as the new BlueGene system (which has $O(10^4)$ processors). We propose a new task-parallel algorithm designed to reduce communication wait times. Performance was assessed using dynamic SAMR re-gridding operations on up to 16K processors of currently available computers at Lawrence Livermore National Laboratory. The new algorithm was shown to be up to an order of magnitude faster than the baseline algorithm and had better scaling trends.
A Novel Cluster Head Selection Algorithm Based on Fuzzy Clustering and Particle Swarm Optimization.
Ni, Qingjian; Pan, Qianqian; Du, Huimin; Cao, Cen; Zhai, Yuqing
2017-01-01
An important objective of wireless sensor networks is to prolong the network life cycle, and topology control is of great significance for extending the network life cycle. Based on previous work, for cluster head selection in hierarchical topology control, we propose a solution based on fuzzy clustering preprocessing and particle swarm optimization. More specifically, a fuzzy clustering algorithm is first used for the initial clustering of sensor nodes according to geographical location, where a sensor node belongs to a cluster with a determined probability, and the number of initial clusters is analyzed and discussed. Furthermore, the fitness function is designed considering both the energy consumption and distance factors of the wireless sensor network. Finally, the cluster head nodes in the hierarchical topology are determined based on the improved particle swarm optimization. Experimental results show that, compared with traditional methods, the proposed method achieved the purpose of reducing the mortality rate of nodes and extending the network life cycle.
Dynamic Head Cluster Election Algorithm for Clustered Ad-Hoc Networks
Directory of Open Access Journals (Sweden)
Arwa Zabian
2008-01-01
In distributed systems, clustering consists of dividing the geographical area covered by a set of nodes into small zones. In mobile networks, the clustering mechanism varies because nodes can move at any time in any direction, which causes the network to partition or nodes to join. Several centralized or globalized algorithms have been proposed for clustering such that no node becomes isolated and no cluster becomes overloaded. A particular node, called the head cluster or leader, is elected and has the role of organizing the distribution of nodes in clusters. We propose a distributed clustering and leader election mechanism for Ad-Hoc mobile networks, in which the leader is a mobile node. Our results show that, in the case of leader mobility, the time needed to elect a new leader is smaller than the time needed for a significant topological change in the network to happen.
Clustered Self Organising Migrating Algorithm for the Quadratic Assignment Problem
Davendra, Donald; Zelinka, Ivan; Senkerik, Roman
2009-08-01
An approach of population dynamics and clustering for permutative problems is presented in this paper. Diversity indicators are created from solution ordering and its mapping is shown as an advantage for population control in metaheuristics. Self Organising Migrating Algorithm (SOMA) is modified using this approach and vetted with the Quadratic Assignment Problem (QAP). Extensive experimentation is conducted on benchmark problems in this area.
Blockspin Scheme and Cluster Algorithm for Quantum Spin Systems
Ying, H P; Ying, He-Ping; Wiese, Uwe-Jens
1992-01-01
We present a numerical study using a cluster algorithm for the 1-d $S=1/2$ quantum Heisenberg models. The dynamical critical exponent for anti-ferromagnetic chains is $z=0.0(1)$ such that critical slowing down is eliminated.
Clustering algorithms for Stokes space modulation format recognition
DEFF Research Database (Denmark)
Boada, Ricard; Borkowski, Robert; Tafur Monroy, Idelfonso
2015-01-01
Stokes space modulation format recognition (Stokes MFR) is a blind method enabling digital coherent receivers to infer modulation format information directly from a received polarization-division-multiplexed signal. A crucial part of the Stokes MFR is a clustering algorithm, which largely...
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.
Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy
2016-01-01
Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
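The three stand-alone quality metrics named above can be computed directly from an edge list and a node-to-community map. The sketch below uses the standard textbook definitions for an undirected, unweighted graph without self-loops, which may differ in detail from the variants used in the study (for instance in how per-cluster conductance is aggregated). The function name `partition_metrics` and the toy graph are illustrative.

```python
from collections import defaultdict

def partition_metrics(edges, community):
    """Return (modularity, coverage, per-community conductance) for a partition."""
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints inside the community
    cut = defaultdict(int)     # edges leaving the community
    volume = defaultdict(int)  # sum of node degrees in the community
    for u, v in edges:
        cu, cv = community[u], community[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            intra[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1

    # Coverage: fraction of all edges that fall inside communities.
    coverage = sum(intra.values()) / m
    # Modularity: sum over communities of (intra fraction - expected fraction).
    modularity = sum(intra[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)
    # Conductance: cut size divided by the smaller side's volume (lower is better).
    conductance = {}
    for c in volume:
        smaller = min(volume[c], 2 * m - volume[c])
        if smaller > 0:
            conductance[c] = cut[c] / smaller
    return modularity, coverage, conductance

# Two triangles joined by a single bridge edge (toy example).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
modularity, coverage, conductance = partition_metrics(edges, community)
print(modularity, coverage, conductance)
```

For this graph the natural two-triangle split gives a modularity of 5/14, a coverage of 6/7 and a conductance of 1/7 for each community, which shows how the three metrics weigh the single bridge edge differently.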
The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey
Energy Technology Data Exchange (ETDEWEB)
Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer,; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U.,
2005-03-01
We present the "C4 Cluster Catalog", a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers ≈2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L_r), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the ΛCDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is ≈90% complete and 95% pure above M_200 = 1 × 10^14 h^-1 M_⊙ and within 0.03 ≤ z ≤ 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 ≤ z ≤ 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L_r of a cluster is a more robust estimator of the halo mass (M_200) than the galaxy line-of-sight velocity dispersion or the richness of the cluster.
A Survey on Clustering Algorithms for Heterogeneous Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Vivek Katiyar
2011-01-01
In the last few years, potential uses of wireless sensor networks (WSNs) have emerged in various fields such as disaster management, battlefield surveillance and border security surveillance. In such applications, a large number of sensor nodes are deployed, which are often unattended and work autonomously. Clustering is a key technique used to extend the lifetime of a sensor network by reducing energy consumption. It can also increase network scalability. Sensor nodes have generally been considered homogeneous since research in the field of WSNs began, but some nodes may have different energy levels in order to prolong the lifetime of a WSN and improve its reliability. In this paper, we study the impact of node heterogeneity on the performance of WSNs. This paper surveys different clustering algorithms for heterogeneous WSNs by classifying the algorithms according to various clustering attributes.
A HYBRID HEURISTIC ALGORITHM FOR THE CLUSTERED TRAVELING SALESMAN PROBLEM
Directory of Open Access Journals (Sweden)
Mário Mestria
2016-04-01
This paper proposes a hybrid heuristic algorithm, based on the metaheuristics Greedy Randomized Adaptive Search Procedure, Iterated Local Search and Variable Neighborhood Descent, to solve the Clustered Traveling Salesman Problem (CTSP). The hybrid heuristic algorithm uses several variable neighborhood structures, combining intensification (using local search operators) and diversification (constructive heuristic and perturbation routine). In the CTSP, the vertices are partitioned into clusters and all vertices of each cluster have to be visited contiguously. The CTSP is NP-hard since it includes the well-known Traveling Salesman Problem (TSP) as a special case. Our hybrid heuristic is compared with three heuristics from the literature and an exact method. Computational experiments are reported for different classes of instances. Experimental results show that the proposed hybrid heuristic obtains competitive results within reasonable computational time.
An Efficient Cluster Algorithm for CP(N-1) Models
Beard, B B; Riederer, S; Wiese, U J
2005-01-01
We construct an efficient cluster algorithm for ferromagnetic SU(N)-symmetric quantum spin systems. Such systems provide a new regularization for CP(N-1) models in the framework of D-theory, which is an alternative non-perturbative approach to quantum field theory formulated in terms of discrete quantum variables instead of classical fields. Despite several attempts, no efficient cluster algorithm has been constructed for CP(N-1) models in the standard formulation of lattice field theory. In fact, there is even a no-go theorem that prevents the construction of an efficient Wolff-type embedding algorithm. We present various simulations for different correlation lengths, couplings and lattice sizes. We have simulated correlation lengths up to 250 lattice spacings on lattices as large as 640x640 and we detect no evidence for critical slowing down.
Morphology of open clusters NGC 1857 and Czernik 20 using clustering algorithms
Bhattacharya, S.; Mahulkar, V.; Pandaokar, S.; Singh, P. K.
2017-01-01
The morphology and cluster membership of the Galactic open clusters Czernik 20 and NGC 1857 were analyzed using two different clustering algorithms. We present the first use of density-based spatial clustering of applications with noise (DBSCAN) to determine open cluster morphology from spatial distribution. The region of analysis has also been spatially classified using a statistical membership determination algorithm. We utilized near infrared (NIR) data for a suitably large region around the clusters from the United Kingdom Infrared Deep Sky Survey Galactic Plane Survey star catalogue database, and also from the Two Micron All Sky Survey star catalogue database. The densest regions of the cluster morphologies (1 for Czernik 20 and 2 for NGC 1857) thus identified were analyzed with a K-band extinction map and color-magnitude diagrams (CMDs). To address a significant discrepancy in known distance and reddening parameters, we carried out field decontamination of these CMDs and subsequent isochrone fitting of the cleaned CMDs to obtain reliable distance and reddening parameters for the clusters (Czernik 20: D = 2900 pc, E(J-K) = 0.33; NGC 1857: D = 2400 pc, E(J-K) = 0.18-0.19). The isochrones were also used to convert the luminosity functions for the densest regions of Czernik 20 and NGC 1857 into mass functions, to derive their slopes. Additionally, a previously unknown over-density consistent with that of a star cluster is identified in the region of analysis.
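The DBSCAN technique named in this abstract can be sketched in a few lines. This is a minimal brute-force version on 2-D positions; the parameter names `eps` and `min_pts` follow common usage, and the values the authors used are not given in the abstract, so any values passed in are assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        # Brute-force radius query; the point itself counts as a neighbour.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # j is itself a core point: keep expanding
                queue.extend(nb)
    return labels
```

Dense groups of positions receive a shared label, while isolated points are marked -1, which is what makes the method suitable for separating a cluster's spatial morphology from field stars.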
Evaluation of clustering algorithms for protein-protein interaction networks
Directory of Open Access Journals (Sweden)
van Helden Jacques
2006-11-01
Background: Protein interactions are crucial components of all cellular processes. Recently, high-throughput methods have been developed to obtain a global description of the interactome (the whole network of protein interactions for a given organism). In 2002, the yeast interactome was estimated to contain up to 80,000 potential interactions. This estimate is based on the integration of data sets obtained by various methods (mass spectrometry, two-hybrid methods, genetic studies). High-throughput methods are known, however, to yield a non-negligible rate of false positives, and to miss a fraction of existing interactions. The interactome can be represented as a graph where nodes correspond to proteins and edges to pairwise interactions. In recent years clustering methods have been developed and applied in order to extract relevant modules from such graphs. These algorithms require the specification of parameters that may drastically affect the results. In this paper we present a comparative assessment of four algorithms: Markov Clustering (MCL), Restricted Neighborhood Search Clustering (RNSC), Super Paramagnetic Clustering (SPC), and Molecular Complex Detection (MCODE). Results: A test graph was built on the basis of 220 complexes annotated in the MIPS database. To evaluate the robustness to false positives and false negatives, we derived 41 altered graphs by randomly removing edges from or adding edges to the test graph in various proportions. Each clustering algorithm was applied to these graphs with various parameter settings, and the clusters were compared with the annotated complexes. We analyzed the sensitivity of the algorithms to the parameters and determined their optimal parameter values. We also evaluated their robustness to alterations of the test graph. We then applied the four algorithms to six graphs obtained from high-throughput experiments and compared the resulting clusters with the annotated complexes. Conclusion: This
A heuristic approach to possibilistic clustering algorithms and applications
Viattchenin, Dmitri A
2013-01-01
The present book outlines a new approach to possibilistic clustering in which the sought clustering structure of the set of objects is based directly on the formal definition of fuzzy cluster and the possibilistic memberships are determined directly from the values of the pairwise similarity of objects. The proposed approach can be used for solving different classification problems. Here, some techniques that might be useful for this purpose are outlined, including a methodology for constructing a set of labeled objects for a semi-supervised clustering algorithm, a methodology for reducing analyzed attribute space dimensionality and methods for asymmetric data processing. Moreover, a technique for constructing a subset of the most appropriate alternatives for a set of weak fuzzy preference relations, which are defined on a universe of alternatives, is described in detail, and a method for rapidly prototyping Mamdani fuzzy inference systems is introduced. This book addresses engineers, scientist...
A comparison of clustering algorithms in article recommendation system
Tantanasiriwong, Supaporn
2012-01-01
A recommendation system is a tool that can recommend to researchers resources suited to their research interests by using content-based filtering. In this paper, clustering is introduced as an unsupervised learning technique for grouping objects based on their selected features and similarities. Publication information from the Science Citation Index (SCI) is used as the dataset for clustering, with feature extraction performed as dimensionality reduction of the articles, and Latent Dirichlet Allocation (LDA), Principal Component Analysis (PCA), and K-means are compared to determine the best algorithm. In the experiment, the selected database consists of 2625 documents extracted from the SCI corpus from 2001 to 2009. Clusterings into 50, 100, 200, and 250 ranks are considered, and the F-measure is used to evaluate the three algorithms. The results show that LDA gives an accuracy of up to 95.5%, the highest of the clustering techniques compared.
A Clustering Genetic Algorithm for Cylinder Drag Optimization
Milano, Michele; Koumoutsakos, Petros
2002-01-01
A real coded genetic algorithm is implemented for the optimization of actuator parameters for cylinder drag minimization. We consider two types of idealized actuators that are allowed either to move steadily and tangentially to the cylinder surface (“belts”) or to steadily blow/suck with a zero net mass constraint. The genetic algorithm we implement has the property of identifying minima basins, rather than single optimum points. The knowledge of the shape of the minimum basin enables further insights into the system properties and provides a sensitivity analysis in a fully automated way. The drag minimization problem is formulated as an optimal regulation problem. By means of the clustering property of the present genetic algorithm, a set of solutions producing drag reduction of up to 50% is identified. A comparison between the two types of actuators, based on the clustering property of the algorithm, indicates that blowing/suction actuation parameters are associated with larger tolerances when compared to optimal parameters for the belt actuators. The possibility of using a few strategically placed actuators to obtain a significant drag reduction is explored using the clustering diagnostics of this method. The optimal belt-actuator parameters obtained by optimizing the two-dimensional case are employed in three-dimensional simulations, by extending the actuators across the span of the cylinder surface. The three-dimensional controlled flow exhibits a strong two-dimensional character near the cylinder surface, resulting in significant drag reduction.
Robustness of the ATLAS pixel clustering neural network algorithm
Sidebo, Per Edvin; The ATLAS collaboration
2016-01-01
Proton-proton collisions at the energy frontier put strong constraints on track reconstruction algorithms. The algorithms depend heavily on accurate estimation of the position of particles as they traverse the inner detector elements. An artificial neural network algorithm is utilised to identify and split clusters of neighbouring read-out elements in the ATLAS pixel detector created by multiple charged particles. The method recovers otherwise lost tracks in dense environments where particles are separated by distances comparable to the size of the detector read-out elements. Such environments are highly relevant for LHC run 2, e.g. in searches for heavy resonances. Within the scope of run 2 track reconstruction performance and upgrades, the robustness of the neural network algorithm will be presented. The robustness has been studied by evaluating the stability of the algorithm’s performance under a range of variations in the pixel detector conditions.
Comparative Study of Clustering Algorithms in Text Mining Context
Directory of Open Access Journals (Sweden)
Abdennour Mohamed Jalil
2016-06-01
The spectacular growth of data is due to the emergence of networks and smartphones. With about 42% of the world population using the internet [1], a problem has arisen in processing the exchanged data, whose volume is rising exponentially and which should be treated automatically. This paper presents a classical process of knowledge discovery in databases for treating textual data. This process is divided into three parts: preprocessing, processing and post-processing. In the processing step, we present a comparative study of several clustering algorithms such as KMeans, Global KMeans, Fast Global KMeans, Two Level KMeans and FWKmeans. The comparison between these algorithms is made on real textual data from the web using RSS feeds. Experimental results identified two problems: the first concerns the quality of the results, which remains an issue for algorithms that converge rapidly; the second is the execution time, which needs to be reduced for some algorithms.
DYNAMIC REQUEST DISPATCHING ALGORITHM FOR WEB SERVER CLUSTER
Institute of Scientific and Technical Information of China (English)
Yang Zhenjiang; Zhang Deyun; Sun Qindong; Sun Qing
2006-01-01
Distributed architectures support increased load on popular web sites by dispatching client requests transparently among multiple servers in a cluster. Packet Single-Rewriting technology and the client address hashing algorithm in ONE-IP technology, which can keep application sessions intact, are analyzed, and an improved request dispatching algorithm that is simple, effective and supports dynamic load balancing is proposed. In this algorithm, the dispatcher determines which server node will process a request by applying a hash function to the client IP address and comparing the result with the server's assigned identifier subset; it adjusts the size of the subset according to the performance and current load of each server, so as to utilize all servers' resources effectively. Simulation shows that the improved algorithm has better performance than the original one.
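The dispatching idea described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the identifier space size `NUM_BUCKETS`, the use of CRC32 as the hash, contiguous identifier ranges, and the weight-based resizing rule are all hypothetical choices.

```python
import zlib

NUM_BUCKETS = 64  # hypothetical size of the hash identifier space

def assign_subsets(weights):
    """Split identifiers 0..NUM_BUCKETS-1 into contiguous ranges.

    `weights` maps server name -> relative capacity (for example derived
    from measured performance and current load); larger weight means a
    larger identifier subset.
    """
    total = sum(weights.values())
    servers = list(weights)
    subsets, start, acc = {}, 0, 0.0
    for idx, server in enumerate(servers):
        acc += weights[server]
        # The last server always takes the remainder so no id is unassigned.
        end = NUM_BUCKETS if idx == len(servers) - 1 else round(NUM_BUCKETS * acc / total)
        subsets[server] = range(start, end)
        start = end
    return subsets

def dispatch(client_ip, subsets):
    """Hash the client address and return the server owning that identifier."""
    bucket = zlib.crc32(client_ip.encode()) % NUM_BUCKETS
    for server, ids in subsets.items():
        if bucket in ids:
            return server
    raise LookupError("no server owns identifier %d" % bucket)
```

Because the hash depends only on the client address, repeated requests from one client reach the same server as long as the subsets are unchanged, which preserves sessions. Calling `assign_subsets` again with updated weights shifts identifiers away from overloaded nodes.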
clusterMaker: a multi-algorithm clustering plugin for Cytoscape
Directory of Open Access Journals (Sweden)
Morris John H
2011-11-01
Background: In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL. Results: Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions: The Cytoscape plugin cluster
Exploring New Clustering Algorithms for the CMS Tracker FED
Gamboa Alvarado, Jose Leandro
2013-01-01
In the current Front End (FE) firmware, clusters of hits within the APV frames are found using a simple threshold comparison (made between the data and a 3 or 5 sigma strip noise cut) on reordered, pedestal- and Common Mode (CM) noise-subtracted data. In addition, the CM noise subtraction requires the baseline of each APV frame to be approximately uniform. Therefore, the current algorithm will fail if the APV baseline exhibits large-scale non-uniform behavior. Under very high luminosity conditions, the assumption of a uniform APV baseline breaks down and the FED is unable to maintain a high efficiency of cluster finding.
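The threshold-based cluster finding described here can be sketched as below. This is only an illustration under assumptions: the common-mode level is estimated as the median of the pedestal-subtracted frame, a single `n_sigma` cut is applied per strip, and the function and argument names are hypothetical rather than taken from the firmware.

```python
from statistics import median

def find_clusters(adc, pedestal, noise, n_sigma=3.0):
    """Return (first_strip, last_strip) pairs of contiguous strips above threshold.

    adc, pedestal and noise are per-strip lists of equal length.
    """
    # Step 1: pedestal subtraction, strip by strip.
    signal = [a - p for a, p in zip(adc, pedestal)]
    # Step 2: common-mode subtraction with one offset per frame. This is
    # the step that presumes an approximately uniform baseline.
    common_mode = median(signal)
    signal = [s - common_mode for s in signal]
    # Step 3: group neighbouring strips that exceed n_sigma times their noise.
    clusters, start = [], None
    for i, (s, n) in enumerate(zip(signal, noise)):
        if s > n_sigma * n:
            if start is None:
                start = i
        elif start is not None:
            clusters.append((start, i - 1))
            start = None
    if start is not None:
        clusters.append((start, len(signal) - 1))
    return clusters
```

A single offset per frame cannot follow a baseline that slopes or steps across the strips, which is consistent with the failure mode the abstract describes for very high luminosity.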
FCM Clustering Algorithms for Segmentation of Brain MR Images
Directory of Open Access Journals (Sweden)
Yogita K. Dubey
2016-01-01
The study of brain disorders requires accurate tissue segmentation of magnetic resonance (MR) brain images, which is very important for detecting tumors, edema, and necrotic tissues. Segmentation of brain images, especially into the three main tissue types, Cerebrospinal Fluid (CSF), Gray Matter (GM), and White Matter (WM), has an important role in computer aided neurosurgery and diagnosis. Brain images mostly contain noise, intensity inhomogeneity, and weak boundaries. Therefore, accurate segmentation of brain images is still a challenging area of research. This paper presents a review of fuzzy c-means (FCM) clustering algorithms for the segmentation of brain MR images. The review covers the detailed analysis of FCM based algorithms with intensity inhomogeneity correction and noise robustness. Different methods for the modification of the standard fuzzy objective function with updating of membership and cluster centroid are also discussed.
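For reference, the standard FCM objective function and its membership and centroid updates, which the reviewed methods modify, take the following textbook form. The notation is an assumption introduced here and is not taken from the paper: x_i are the N pixel intensities, c_j the C cluster centroids, u_ij the membership of pixel i in cluster j, and m > 1 the fuzzifier.

```latex
% Objective minimized by standard FCM, subject to \sum_{j=1}^{C} u_{ij} = 1 for every i
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2}

% Membership update
u_{ij} = \left[ \sum_{k=1}^{C}
  \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1}

% Centroid update
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
```

The two updates are alternated until the memberships stop changing appreciably. Variants aimed at noise robustness or inhomogeneity correction typically add terms to J_m, which in turn changes both update rules.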
Mapping cultivable land from satellite imagery with clustering algorithms
Arango, R. B.; Campos, A. M.; Combarro, E. F.; Canas, E. R.; Díaz, I.
2016-07-01
Open data satellite imagery provides valuable data for the planning and decision-making processes related to environmental domains. Specifically, agriculture uses remote sensing in a wide range of services, ranging from monitoring the health of the crops to forecasting the spread of crop diseases. In particular, this paper focuses on a methodology for the automatic delimitation of cultivable land by means of machine learning algorithms and satellite data. The method uses a partition clustering algorithm called Partitioning Around Medoids and considers the quality of the clusters obtained for each satellite band in order to evaluate which one better identifies cultivable land. The proposed method was tested with vineyards using as input the spectral and thermal bands of the Landsat 8 satellite. The experimental results show the great potential of this method for cultivable land monitoring from remote-sensed multispectral imagery.
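The core of a medoid-based partitioning method such as Partitioning Around Medoids can be illustrated with a simplified swap search. This sketch makes several assumptions: it starts from the first k points instead of PAM's BUILD step, it accepts any improving swap greedily, and the function names are illustrative.

```python
def total_cost(points, medoids, dist):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, dist):
    """Greedy medoid swapping; returns (sorted medoid indices, total cost)."""
    medoids = list(range(k))  # naive initialisation, not PAM's BUILD step
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing the medoid at `pos` with candidate point `cand`.
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return sorted(medoids), best
```

Unlike k-means, the cluster centres are always actual data points, so any dissimilarity function can be used. For per-band pixel values, a plain absolute difference would be one possible choice of `dist`.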
Advanced defect detection algorithm using clustering in ultrasonic NDE
Gongzhang, Rui; Gachagan, Anthony
2016-02-01
A range of materials used in industry exhibit scattering properties which limit ultrasonic NDE. Many algorithms have been proposed to enhance defect detection ability, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed and the remaining noise can be easily confused with real feature signals, hence becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing using a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single transducer or phased array configurations. The proposed algorithm uses the concept of unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and the ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Each class will then be labelled as 'legitimate reflector' or 'artefacts' based on this observation, and the expected probability of detection (PoD) and probability of false alarm (PFA) determined. To facilitate data collection and validate the proposed algorithm, a 5 MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP, and the subsequent application of the proposed clustering algorithm has provided an additional reduction in PFA while maintaining PoD for both samples compared with SSP results alone.
Core Business Selection Based on Ant Colony Clustering Algorithm
Yu Lan; Yan Bo; Yao Baozhen
2014-01-01
Core business is the most important business to the enterprise in diversified business. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current situation of the development of the company, the core business of the company can be acquired by ant c...
Comparison of cluster expansion fitting algorithms for interactions at surfaces
Herder, Laura M.; Bray, Jason M.; Schneider, William F.
2015-10-01
Cluster expansions (CEs) are Ising-type interaction models that are increasingly used to model interaction and ordering phenomena at surfaces, such as the adsorbate-adsorbate interactions that control coverage-dependent adsorption or surface-vacancy interactions that control surface reconstructions. CEs are typically fit to a limited set of data derived from density functional theory (DFT) calculations. The CE fitting process involves iterative selection of DFT data points to include in a fit set and selection of interaction clusters to include in the CE. Here we compare the performance of three CE fitting algorithms against synthetic data: the MIT Ab-initio Phase Stability code (MAPS, the default in ATAT software), a genetic algorithm (GA), and a steepest descent (SD) algorithm. The synthetic data is encoded in model Hamiltonians of varying complexity motivated by the observed behavior of atomic adsorbates on a face-centered-cubic transition metal close-packed (111) surface. We compare the performance of the leave-one-out cross-validation score against the true fitting error available from knowledge of the hidden CEs. For these systems, SD achieves the lowest overall fitting and prediction error independent of the underlying system complexity. SD also most accurately predicts cluster interaction energies without ignoring or introducing extra interactions into the CE. MAPS achieves good results in fewer iterations, while the GA performs least well for these particular problems.
Optimized algorithm for balancing clusters in wireless sensor networks
Institute of Scientific and Technical Information of China (English)
Mucheol KIM; Sun-hong KIM; Hyungjin BYUN; Sang-yong HAN
2009-01-01
Wireless sensor networks consist of hundreds or thousands of sensor nodes that involve numerous restrictions including computation capability and battery capacity. Topology control is an important issue for achieving a balanced placement of sensor nodes. The clustering scheme is a widely known and efficient means of topology control for transmitting information to the base station in two hops. The automatic routing scheme of the self-organizing technique is another critical element of wireless sensor networks. In this paper we propose an optimal algorithm with cluster balance taken into consideration, and compare it with three well known and widely used approaches, i.e., LEACH, MEER, and VAP-E, in performance evaluation. Experimental results show that the proposed approach increases the overall network lifetime, indicating that the amount of energy required for communication to the base station will be reduced for locating an optimal cluster.
7 CFR 226.1 - General purpose and scope.
2010-01-01
... Agriculture Regulations of the Department of Agriculture (Continued) FOOD AND NUTRITION SERVICE, DEPARTMENT OF AGRICULTURE CHILD NUTRITION PROGRAMS CHILD AND ADULT CARE FOOD PROGRAM General § 226.1 General purpose and... Child and Adult Care Food Program. Section 17 of the National School Lunch Act, as amended,...
General-purpose isiZulu speech synthesiser
CSIR Research Space (South Africa)
Louw, A
2005-07-01
A general-purpose isiZulu text-to-speech (TTS) system was developed, based on the “Multisyn” unit-selection approach supported by the Festival TTS toolkit. The development involved a number of challenges related to the interface between speech...
A cluster analysis on road traffic accidents using genetic algorithms
Saharan, Sabariah; Baragona, Roberto
2017-04-01
The analysis of road traffic accidents is increasingly important because of the cost of accidents and public road safety. The availability of large data sets makes the study of the factors that affect the frequency and severity of accidents viable. However, the data are often highly unbalanced and overlapping. We deal with the data set of road traffic accidents recorded in Christchurch, New Zealand, from 2000 to 2009, with a total of 26440 accidents. The data are binary, and there are 50 road traffic accident factors with four levels of severity. We used a genetic algorithm for the analysis because we are in the presence of a large unbalanced data set and standard clustering methods such as the k-means algorithm may not be suitable for the task. The genetic algorithm-based clustering for unknown K (GCUK) has been used to identify the factors associated with accidents of different levels of severity. The results provided us with an interesting insight into the relationship between factors and accident severity level and suggest that the two main factors that contribute to fatal accidents are "Speed greater than 60 km/h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and independent component analysis is performed to validate the results.
Community Clustering Algorithm in Complex Networks Based on Microcommunity Fusion
Directory of Open Access Journals (Sweden)
Jin Qi
2015-01-01
With further research in recent years on the physical meaning and digital features of community structure in complex networks, improving the effectiveness and efficiency of community mining algorithms has become an important subject in this area. This paper puts forward the concept of the microcommunity and obtains the final community mining results by fusing different microcommunities. This paper starts with the basic definition of the network community and applies Expansion to microcommunity clustering, which provides the prerequisites for microcommunity fusion. Analysis of test results on network data sets indicates that the proposed algorithm is more efficient and has higher solution quality than other similar algorithms.
Clustering Algorithms: Their Application to Gene Expression Data
Oyelade, Jelili; Isewon, Itunuoluwa; Oladipupo, Funke; Aromolaran, Olufemi; Uwoghiren, Efosa; Ameh, Faridah; Achas, Moses; Adebiyi, Ezekiel
2016-01-01
Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a tremendous opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes present increase the challenges of comprehending and interpreting the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges and is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure. PMID:27932867
Identifying multiple influential spreaders by a heuristic clustering algorithm
Bao, Zhong-Kui; Liu, Jian-Guo; Zhang, Hai-Feng
2017-03-01
The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suited to the case where a single spreader is chosen as the spreading source. Often, the spreading process is initiated by simultaneously choosing multiple nodes as the spreading sources. In this situation, choosing the top-ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in the network. Ideally, the multiple spreaders should be both influential and dispersively distributed in the network, but it is difficult to meet the two conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters, and the center nodes of the clusters are then chosen as the multiple spreaders. The HC algorithm not only ensures that the multiple spreaders are dispersively distributed in the network but also prevents the selected nodes from being very "negligible". Our experimental results on synthetic and real networks indicate that the HC method performs better on influence maximization than the traditional methods.
General-Purpose Serial Interface For Remote Control
Busquets, Anthony M.; Gupton, Lawrence E.
1990-01-01
Computer controls remote television camera. General-purpose controller developed to serve as interface between host computer and pan/tilt/zoom/focus functions on series of automated video cameras. Interface port based on 8251 programmable communications-interface circuit configured for tristated outputs, and connects controller system to any host computer with RS-232 input/output (I/O) port. Accepts byte-coded data from host, compares them with prestored codes in read-only memory (ROM), and closes or opens appropriate switches. Six output ports control opening and closing of as many as 48 switches. Operator controls remote television camera by speaking commands, in system including general-purpose controller.
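The decode step described in this abstract (compare each incoming byte with prestored codes, then open or close the matching switch) can be sketched as a table lookup. This is only an illustration: the command bytes, the `COMMAND_TABLE` layout, and the split of the 48 switches into six 8-bit ports are assumptions, not details taken from the original controller.

```python
# Hypothetical command table standing in for the codes prestored in ROM.
# Each entry maps a command byte to (port index, bit index, close switch?).
COMMAND_TABLE = {
    0x10: (0, 0, True),   # close switch 0 on port 0
    0x11: (0, 0, False),  # open the same switch
    0x2F: (5, 7, True),   # close the last of the 48 switches
}

NUM_PORTS = 6       # six output ports, as stated in the abstract
BITS_PER_PORT = 8   # assumption: 6 ports x 8 bits = 48 switches


def apply_command(ports, byte):
    """Return a new port state after decoding one command byte.

    Unknown bytes are ignored and the state is returned unchanged.
    """
    entry = COMMAND_TABLE.get(byte)
    if entry is None:
        return list(ports)
    port, bit, close = entry
    updated = list(ports)
    if close:
        updated[port] |= 1 << bit
    else:
        updated[port] &= ~(1 << bit) & 0xFF
    return updated


state = [0] * NUM_PORTS
state = apply_command(state, 0x10)
state = apply_command(state, 0x2F)
```

Keeping the codes in a single table mirrors the ROM comparison in the abstract: adding a camera function means adding a table entry rather than new control logic.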
Gravitation field algorithm and its application in gene cluster
Directory of Open Access Journals (Sweden)
Zheng Ming
2010-09-01
Background: Searching for optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results: This paper proposes a novel searching optimization algorithm called the Gravitation Field Algorithm (GFA), which is derived from the well-known astronomical theory of planetary formation, the Solar Nebular Disk Model (SNDM). GFA simulates the gravitation field and outperforms GA and SA in some multimodal function optimization problems. GFA can also be applied to unimodal functions. GFA clusters the dataset from the Gene Expression Omnibus well. Conclusions: The mathematical proof demonstrates that GFA converges to the global optimum with probability 1 under three conditions for mass functions of one independent variable. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA.
Local rewiring algorithms to increase clustering and grow a small world
Alstott, Jeff; Pizza, Pamela B; Radcliffe, Mary
2016-01-01
Many real-world networks have high clustering among vertices: vertices that share neighbors are often also directly connected to each other. A network's clustering can be a useful indicator of its connectedness and community structure. Algorithms for generating networks with high clustering have been developed, but typically rely on adding or removing edges and nodes, sometimes from a completely empty network. Here, we introduce algorithms that create a highly clustered network by starting with an existing network and rearranging edges, without adding or removing them; these algorithms can preserve other network properties even as the clustering increases. These algorithms rely on local rewiring rules, in which a single edge changes one of its vertices in a way that is guaranteed to increase clustering. This greedy algorithm can be applied iteratively to transform a random network into a form with much higher clustering. Additionally, these algorithms grow the network's clustering faster than they increase it...
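The idea of raising clustering by moving one endpoint of an existing edge, without adding or removing edges, can be sketched as follows. This is a simplified stand-in: it accepts a move only after recounting triangles globally, whereas the abstract describes local rules that are guaranteed to increase clustering. The function names and the dictionary-of-sets graph representation are assumptions made for the illustration.

```python
from itertools import combinations


def triangle_count(adj):
    """Count triangles in an undirected graph given as {node: set(neighbors)}."""
    total = 0
    for u in adj:
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                total += 1
    return total // 3  # each triangle is seen once from each of its corners


def edge_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def rewire_once(adj):
    """Try to move one endpoint of one edge so that the triangle count rises.

    Returns True if a rewiring was applied. The number of edges never changes.
    """
    base = triangle_count(adj)
    for u in sorted(adj):
        for v in sorted(adj[u]):
            for w in sorted(adj):
                if w == u or w in adj[u]:
                    continue
                # tentatively replace edge (u, v) with edge (u, w)
                adj[u].remove(v)
                adj[v].remove(u)
                adj[u].add(w)
                adj[w].add(u)
                if triangle_count(adj) > base:
                    return True
                # no gain: restore the original edge
                adj[u].remove(w)
                adj[w].remove(u)
                adj[u].add(v)
                adj[v].add(u)
    return False


# A path 0-1-2-3 has three edges and no triangles.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
changed = rewire_once(path)
```

Calling `rewire_once` repeatedly until it returns False gives a greedy loop in the spirit of the abstract, at the cost of a full triangle recount per candidate move.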
Sweeney, Timothy E; Chen, Albert C; Gevaert, Olivier
2015-11-19
In order to discover new subsets (clusters) of a data set, researchers often use algorithms that perform unsupervised clustering, namely, the algorithmic separation of a dataset into some number of distinct clusters. Deciding whether a particular separation (or number of clusters, K) is correct is a sort of 'dark art', with multiple techniques available for assessing the validity of unsupervised clustering algorithms. Here, we present a new technique for unsupervised clustering that uses multiple clustering algorithms, multiple validity metrics, and progressively bigger subsets of the data to produce an intuitive 3D map of cluster stability that can help determine the optimal number of clusters in a data set, a technique we call COmbined Mapping of Multiple clUsteriNg ALgorithms (COMMUNAL). COMMUNAL locally optimizes algorithms and validity measures for the data being used. We show its application to simulated data with a known K, and then apply this technique to several well-known cancer gene expression datasets, showing that COMMUNAL provides new insights into clustering behavior and stability in all tested cases. COMMUNAL is shown to be a useful tool for determining K in complex biological datasets, and is freely available as a package for R.
General Purpose Multimedia Dataset - GarageBand 2008
DEFF Research Database (Denmark)
Meng, Anders
This document describes a general-purpose multimedia data-set to be used in cross-media machine learning problems. In more detail we describe the genre taxonomy applied at http://www.garageband.com, from where the data-set was collected, and how the taxonomy has been fused into a more human...... understandable taxonomy. Finally, a description of various features extracted from both the audio and text is presented....
A Flow-Partitioned Unequal Clustering Routing Algorithm for Wireless Sensor Networks
Jian Peng; Xiaohai Chen; Tang Liu
2014-01-01
Energy efficiency and energy balance are two important issues for wireless sensor networks. In previous clustering routing algorithms, multihop transmission, sleep scheduling, and unequal clustering are always used to improve energy efficiency and energy balance. In these algorithms, only the cluster heads share the burden of data forwarding in each round. In this paper, we propose a flow-partitioned unequal clustering routing (FPUC) algorithm to achieve better energy efficiency and energy ba...
Development of Automatic Cluster Algorithm for Microcalcification in Digital Mammography
Energy Technology Data Exchange (ETDEWEB)
Choi, Seok Yoon [Dept. of Medical Engineering, Korea University, Seoul (Korea, Republic of); Kim, Chang Soo [Dept. of Radiological Science, College of Health Sciences, Catholic University of Pusan, Pusan (Korea, Republic of)
2009-03-15
Digital mammography is an efficient imaging technique for the detection and diagnosis of breast pathological disorders. Six mammographic criteria (number of clusters; number, size, extent, and morphologic shape of microcalcifications; and presence of a mass) were reviewed, and their correlation with pathologic diagnosis was evaluated. It is very important to find breast cancer early, when treatment can reduce deaths from breast cancer and the need for breast incision. In breast cancer screening, mammography is typically used to view the internal structure of the breast. Clustered microcalcifications on mammography represent an important feature of breast mass, especially that of intraductal carcinoma. Because microcalcification correlates highly with breast cancer, a cluster of microcalcifications can be very helpful for the clinician in predicting breast cancer. For this study, three steps of quantitative evaluation are proposed: DoG filter, adaptive thresholding, and expectation maximization. With the proposed algorithm, the number of calcifications and the length of each cluster in the distribution of microcalcifications could be measured, and these can be used as indicators in the primary diagnosis to automatically diagnose breast cancer.
Clustering of User Behaviour based on Web Log data using Improved K-Means Clustering Algorithm
Directory of Open Access Journals (Sweden)
S.Padmaja
2016-02-01
The proposed work applies an improved K-means clustering algorithm to identify internet user behaviour. Web data analysis includes the transformation and interpretation of web log data to find information and patterns and to support knowledge discovery. The efficiency of the algorithm is analyzed by considering certain parameters: date, time, S_id, CS_method, C_IP, User_agent and time taken. The research uses more than 2 years of real data collected from the web servers of two different groups of institutions. This dataset provides a better analysis of log data to identify internet user behaviour.
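For orientation, the baseline that such work builds on can be sketched as plain Lloyd's K-means. The abstract does not say what the improvement consists of, so nothing here represents the authors' variant; the two per-user features (requests per session, mean time taken) and the sample values are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on tuples of floats; the first k points seed the centers."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: nearest center by squared Euclidean distance
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical per-user features: (requests per session, mean time taken in seconds)
users = [(1.0, 0.2), (1.2, 0.1), (9.0, 3.0), (9.5, 3.2)]
labels, centers = kmeans(users, k=2)
```

Seeding with the first k points keeps the sketch deterministic; real log data would normally call for feature scaling and a better initialization.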
Clustering Algorithms for Heterogeneous Wireless Sensor Networks - A Brief Survey
Directory of Open Access Journals (Sweden)
A.MeenaKowshalya
2011-09-01
Wireless sensor networks (WSN) are emerging in various fields like disaster management, battlefield surveillance and border security surveillance. A large number of sensors in these applications are unattended and work autonomously. Clustering is a key technique to improve the network lifetime, reduce the energy consumption and increase the scalability of the sensor network. In this paper, we study the impact of heterogeneity of the nodes on the performance of WSN. This paper surveys the different clustering algorithms for heterogeneous WSN.
Classification of posture maintenance data with fuzzy clustering algorithms
Bezdek, James C.
1992-01-01
Sensory inputs from the visual, vestibular, and proprioceptive systems are integrated by the central nervous system to maintain postural equilibrium. Sustained exposure to microgravity causes neurosensory adaptation during spaceflight, which results in decreased postural stability until readaptation occurs upon return to the terrestrial environment. Data which simulate sensory inputs under various sensory organization test (SOT) conditions were collected in conjunction with Johnson Space Center postural control studies using a tilt-translation device (TTD). The University of West Florida applied the fuzzy c-means (FCM) clustering algorithms to these data with a view towards identifying various states and stages of subjects experiencing such changes. Feature analysis, time step analysis, pooling data, response of the subjects, and the algorithms used are discussed.
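The two alternating updates of standard fuzzy c-means can be sketched as below. This is the textbook formulation with fuzzifier m, shown on one-dimensional toy values; the data, the choice m = 2, and the function names are assumptions and are not taken from the posture study.

```python
def fcm_memberships(points, centers, m=2.0):
    """Fuzzy c-means membership update for 1-D points.

    Returns one row per point; each row holds the membership of that point
    in every cluster and sums to 1.
    """
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # the point coincides with a center: full membership there
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
            continue
        rows.append([
            1.0 / sum((d_i / d_k) ** power for d_k in dists)
            for d_i in dists
        ])
    return rows


def fcm_centers(points, memberships, m=2.0):
    """Center update: mean of the points weighted by membership ** m."""
    n_clusters = len(memberships[0])
    centers = []
    for i in range(n_clusters):
        weights = [row[i] ** m for row in memberships]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers


points = [0.0, 1.0, 4.0]
u = fcm_memberships(points, centers=[0.0, 4.0])
new_centers = fcm_centers(points, u)
```

Unlike hard k-means, every point keeps a graded membership in each cluster, which is what makes the method attractive for data where states blend into one another.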
Cluster-Based Distributed Algorithms for Very Large Linear Equations
Institute of Scientific and Technical Information of China (English)
(none)
2006-01-01
In many applications, such as computational fluid dynamics, weather prediction, image processing, and Markov chain state computation, the order n of the matrix is often very large, and no serial algorithm can solve the problem. A distributed cluster-based solution for very large linear equations is discussed, including the definitions of notations, the partition of the matrix, the communication mechanism, and a master-slave algorithm. The computing cost is O(n^3/N), the memory cost is O(n^2/N), the I/O cost is O(n^2/N), and the communication cost is O(Nn), where N is the number of computing nodes or processes. Tests show that the solution can effectively handle matrices of double type smaller than 10^6×10^6.
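To make the scaling expressions concrete, the sketch below splits the rows of an n×n system into contiguous blocks, one per process, and evaluates the cost terms with constants dropped. The contiguous row-block layout and the function names are assumptions; the abstract does not say how the matrix is actually partitioned.

```python
def row_blocks(n, workers):
    """Split rows 0..n-1 into `workers` contiguous, nearly equal blocks."""
    base, extra = divmod(n, workers)
    blocks, start = [], 0
    for rank in range(workers):
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def cost_estimates(n, workers):
    """Scaling terms quoted in the abstract, with constant factors dropped."""
    return {
        "compute": n ** 3 / workers,
        "memory": n ** 2 / workers,
        "io": n ** 2 / workers,
        "communication": workers * n,
    }


blocks = row_blocks(10, 3)
costs = cost_estimates(1000, 10)
```

The expressions pull in opposite directions: compute, memory and I/O shrink as N grows, while the communication term grows with N, so the useful number of processes is bounded for a given n.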
Dynamic and static properties of the invaded cluster algorithm
Moriarty, K.; Machta, J.; Chayes, L. Y.
1999-02-01
Simulations of the two-dimensional Ising and three-state Potts models at their critical points are performed using the invaded cluster (IC) algorithm. It is argued that observables measured on a sublattice of size l should exhibit a crossover to Swendsen-Wang (SW) behavior for l sufficiently less than the lattice size L, and a scaling form is proposed to describe the crossover phenomenon. It is found that the energy autocorrelation time τ_ε(l,L) for an l×l sublattice attains a maximum in the crossover region, and a dynamic exponent z_IC for the IC algorithm is defined according to τ_{ε,max} ~ L^{z_IC}. Simulation results for the three-state model yield z_IC = 0.346 ± 0.002, which is smaller than values of the dynamic exponent found for the SW and Wolff algorithms and also less than the Li-Sokal bound. The results are less conclusive for the Ising model, but it appears that z_IC ... Wolff algorithms.
A Novel Dynamic Clustering Algorithm Based on Immune Network and Tabu Search
Institute of Scientific and Technical Information of China (English)
ZHONG Jiang; WU Zhongfu; WU Kaigui; YANG Qiang
2005-01-01
It is usually difficult to specify a reasonable number of partitions in a data set before clustering. The problem cannot be solved by traditional clustering algorithms such as k-means or its variations. This paper proposes a novel dynamic clustering algorithm based on the artificial immune network and tabu search (DCBIT). It optimizes the number and the location of the clusters at the same time. The algorithm includes two phases: it begins by running the immune network algorithm to find a clustering feasible solution (CFS), then it employs tabu search to obtain the optimum cluster number and cluster centers from the CFS. The probability of acquiring the CFS through the immune network algorithm is also discussed in this paper. Experimental results show that the new algorithm has satisfactory convergence probability and convergence speed.
Image Transformation using Modified Kmeans clustering algorithm for Parallel saliency map
Directory of Open Access Journals (Sweden)
Aman Sharma
2013-08-01
Depending on the transform chosen, the input and output images of an image transformation system may appear entirely different and have different interpretations. Image transformation is carried out with the help of modules such as the input image, image cluster index, object in cluster, and color index transformation of the image. The K-means clustering algorithm is used to cluster the image for better segmentation. In the proposed method, a parallel saliency algorithm with K-means clustering is used to avoid local minima and to find the saliency map. The reason for using the parallel saliency algorithm is that it proved to be more effective than the existing saliency algorithm.
A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.
Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang
2016-12-01
This paper studies a clustering method for Chinese medicine (CM) medical cases. The traditional K-means clustering algorithm has shortcomings when processing prescriptions from CM medical cases, such as the dependence of its results on the selection of initial values and trapping in local optima. Therefore, a new clustering method based on the collaboration of the firefly algorithm and the simulated annealing algorithm was proposed. This algorithm dynamically determined the iterations of the firefly algorithm and the sampling of the simulated annealing algorithm according to fitness changes, and increased the diversity of the swarm by expanding the scope of the sudden jump, thereby effectively avoiding premature convergence. The results of confirmatory experiments on CM medical cases suggested that, compared with traditional K-means clustering algorithms, this method greatly improved individual diversity and the clustering results obtained. The computing results from this method have a certain reference value for cluster analysis of CM prescriptions.
Directory of Open Access Journals (Sweden)
Mingwei Leng
2013-01-01
Most existing semisupervised clustering algorithms that rely on a small labeled dataset have low accuracy when dealing with multidensity and imbalanced datasets, and labeling data is quite expensive and time consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering in multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it utilizes multiple thresholds to expand labeled datasets on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to demonstrate the proposed algorithm, and the experimental results show that the proposed semisupervised clustering algorithm has a higher accuracy and a more stable performance in comparison to other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.
Clustering Algorithm Based on Crowding Niche
Institute of Scientific and Technical Information of China (English)
业宁; 董逸生
2003-01-01
A new clustering algorithm based on the crowding niche is proposed in this paper. As organisms evolve, homogeneous individuals spontaneously resist heterogeneous ones. At the same time, individuals in the same class compete with each other for limited resources, and individuals with poor fitness are eliminated. We propose a clustering algorithm based on this idea. Experimental evaluation has demonstrated its efficiency.
A Heuristic Task Scheduling Algorithm for Heterogeneous Virtual Clusters
Directory of Open Access Journals (Sweden)
Weiwei Lin
2016-01-01
Cloud computing provides on-demand computing and storage services with high performance and high scalability. However, the rising energy consumption of cloud data centers has become a prominent problem. In this paper, we first introduce an energy-aware framework for task scheduling in virtual clusters. The framework consists of a task resource requirements prediction module, an energy estimate module, and a scheduler with a task buffer. Secondly, based on this framework, we propose a virtual machine power efficiency-aware greedy scheduling algorithm (VPEGS). As a heuristic algorithm, VPEGS estimates task energy by considering factors including task resource demands, VM power efficiency, and server workload before scheduling tasks in a greedy manner. We simulated a heterogeneous VM cluster and conducted experiments to evaluate the effectiveness of VPEGS. Simulation results show that VPEGS effectively reduced total energy consumption by more than 20% without producing large scheduling overheads. Following a similar heuristic approach, it outperformed Min-Min and RASA with respect to energy saving by about 29% and 28%, respectively.
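The greedy step, where each buffered task goes to the VM with the lowest estimated energy, can be illustrated with a toy model. The energy estimate used here (demand divided by a per-VM efficiency figure) and the capacity check that stands in for server workload are assumptions made for the sketch; the abstract does not give the actual VPEGS estimate.

```python
def schedule(tasks, vms):
    """Assign each task to the feasible VM with the lowest estimated energy.

    tasks: list of (task_id, demand) in buffer order.
    vms:   {name: {"capacity": free resource units, "efficiency": work per joule}}
    Tasks that fit nowhere are mapped to None and stay in the buffer.
    """
    free = {name: vm["capacity"] for name, vm in vms.items()}
    plan = {}
    for task_id, demand in tasks:
        candidates = [name for name in vms if free[name] >= demand]
        if not candidates:
            plan[task_id] = None
            continue
        # toy energy estimate: demand / efficiency (hypothetical model)
        best = min(candidates, key=lambda name: demand / vms[name]["efficiency"])
        free[best] -= demand
        plan[task_id] = best
    return plan


vms = {
    "vm_a": {"capacity": 4, "efficiency": 2.0},
    "vm_b": {"capacity": 8, "efficiency": 1.0},
}
plan = schedule([("t1", 3), ("t2", 3), ("t3", 6)], vms)
```

In this run the efficient VM fills first, the second task spills over to the less efficient one, and the third waits, which is the kind of trade-off a greedy energy-aware scheduler has to make one task at a time.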
Ternary alloy material prediction using genetic algorithm and cluster expansion
Energy Technology Data Exchange (ETDEWEB)
Chen, Chong [Iowa State Univ., Ames, IA (United States)
2015-12-01
This thesis summarizes our study on crystal structure prediction for the Fe-V-Si system using a genetic algorithm and cluster expansion. Our goal is to explore and look for new stable compounds. We started from the ten currently known experimental phases and calculated the formation energies of those compounds using a density functional theory (DFT) package, namely VASP. The convex hull was generated based on the DFT calculations of the experimentally known phases. Then we did a random search on some metal-rich (Fe and V) compositions and found that the lowest-energy structures had an underlying body-centered cubic (bcc) lattice, on which we did our systematic computational searches using the genetic algorithm and cluster expansion. Among hundreds of searched compositions, thirteen were selected and their DFT formation energies were obtained with VASP. The stability of those thirteen compounds was checked with reference to the experimental convex hull. We found that the composition 24-8-16, i.e., Fe_{3}VSi_{2}, is a new stable phase, which may motivate future experiments.
Thermodynamic Casimir effect in films: the exchange cluster algorithm.
Hasenbusch, Martin
2015-02-01
We study the thermodynamic Casimir force for films with various types of boundary conditions and the bulk universality class of the three-dimensional Ising model. To this end, we perform Monte Carlo simulations of the improved Blume-Capel model on the simple cubic lattice. In particular, we employ the exchange or geometric cluster algorithm [Heringa and Blöte, Phys. Rev. E 57, 4976 (1998)]. In a previous work, we demonstrated that this algorithm allows us to compute the thermodynamic Casimir force for the plate-sphere geometry efficiently. It turns out that a substantial reduction of the statistical error can also be achieved for the film geometry. Concerning physics, we focus on (O,O) boundary conditions, where O denotes the ordinary surface transition. These are implemented by free boundary conditions on both sides of the film. Films with such boundary conditions undergo a phase transition in the universality class of the two-dimensional Ising model. We determine the inverse transition temperature for a large range of thicknesses L_0 of the film and study the scaling of this temperature with L_0. In the neighborhood of the transition, the thermodynamic Casimir force is affected by finite size effects, where finite size refers to a finite transversal extension L of the film. We demonstrate that these finite size effects can be computed by using the universal finite size scaling function of the free energy of the two-dimensional Ising model.
jClustering, an Open Framework for the Development of 4D Clustering Algorithms
Mateos-Pérez, José María; García-Villalba, Carmen; Pascau, Javier; Desco, Manuel; Vaquero, Juan J.
2013-01-01
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary. PMID:23990913
jClustering, an open framework for the development of 4D clustering algorithms.
Directory of Open Access Journals (Sweden)
José María Mateos-Pérez
We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License) to allow modification if necessary.
How General-Purpose can a GPU be?
Directory of Open Access Journals (Sweden)
Philip Machanick
2015-12-01
The use of graphics processing units (GPUs) in general-purpose computation (GPGPU) is a growing field. GPU instruction sets, while implementing a graphics pipeline, draw from a range of single instruction multiple datastream (SIMD) architectures characteristic of the heyday of supercomputers. Yet only one of these SIMD instruction sets has been of application on a wide enough range of problems to survive the era when the full range of supercomputer design variants was being explored: vector instructions. This paper proposes a reconceptualization of the GPU as a multicore design with minimal exotic modes of parallelism so as to make GPGPU truly general.
A general-purpose optimization program for engineering design
Vanderplaats, G. N.; Sugimoto, H.
1986-01-01
A new general-purpose optimization program for engineering design is described. ADS (Automated Design Synthesis) is a FORTRAN program for nonlinear constrained (or unconstrained) function minimization. The optimization process is segmented into three levels: Strategy, Optimizer, and One-dimensional search. At each level, several options are available so that a total of nearly 100 possible combinations can be created. An example of available combinations is the Augmented Lagrange Multiplier method, using the BFGS variable metric unconstrained minimization together with polynomial interpolation for the one-dimensional search.
Maximum-entropy clustering algorithm and its global convergence analysis
Institute of Scientific and Technical Information of China (English)
ZHANG; Zhihua
2001-01-01
A Request Distribution Algorithm for Web Server Cluster
Directory of Open Access Journals (Sweden)
Wei Zhang
2011-12-01
With the explosive growth of web-based application workloads, web server clusters face a challenge in request response time. Request distribution among the servers in a web server cluster is the key to addressing this challenge, especially under heavy workloads. In this paper, we propose a new request distribution algorithm named llac (least load active cache) for the load balancing switch in a web server cluster. The goal of llac is to improve the cache hit rate and reduce response time. Packets are parsed at the IP level, and back-end servers are notified to cache hot files using link change technology, without changing URL information or modifying the service program. This avoids switching overhead between user mode and kernel mode. The load balancing switch directly creates a connection with the selected server, avoiding connection migration overhead. This policy estimates the current composite load of each server and selects the server with the least load to serve the request. It also improves the resource utilization of web servers. Experimental results show that llac achieves better performance for web applications than wrr (weighted round robin), which is a popular request distribution algorithm.
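The selection rule at the end of the abstract (estimate a composite load per server, then pick the least loaded one) can be sketched in a few lines. The metric names and the weights are hypothetical; the abstract does not state how llac combines its load measurements, and the cache-notification part of the algorithm is not modelled here.

```python
# Hypothetical weights for combining normalized load metrics (each in 0..1).
WEIGHTS = {"cpu": 0.5, "mem": 0.3, "conn": 0.2}


def composite_load(metrics):
    """Weighted sum of a server's normalized load metrics."""
    return sum(WEIGHTS[key] * metrics[key] for key in WEIGHTS)


def pick_server(servers):
    """Return the name of the server with the smallest composite load."""
    return min(servers, key=lambda name: composite_load(servers[name]))


servers = {
    "s1": {"cpu": 0.9, "mem": 0.5, "conn": 0.2},
    "s2": {"cpu": 0.4, "mem": 0.6, "conn": 0.5},
}
chosen = pick_server(servers)
```

A weighted round robin policy would instead cycle through servers in proportion to fixed weights, regardless of the load each one currently reports.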
Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen
2017-03-01
The traditional microblog recommendation algorithm suffers from low efficiency and modest effect in the era of big data. To address these issues, this paper proposes a mixed recommendation algorithm with user clustering. The paper first introduces the state of the microblog marketing industry. It then elaborates the user interest modeling process and the detailed advertisement recommendation methods. Finally, it compares the mixed recommendation algorithm with the traditional classification algorithm and with a mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering achieves good accuracy and recall in microblog advertisement promotion.
Textural defect detect using a revised ant colony clustering algorithm
Zou, Chao; Xiao, Li; Wang, Bingwen
2007-11-01
We propose a novel method based on a revised ant colony clustering algorithm (ACCA) for textural defect detection. Our efforts focus mainly on the definition of a local irregularity measurement and the implementation of the revised ACCA. The local irregularity measurement evaluates the local textural inconsistency of each pixel against its mini-environment. In our revised ACCA, the behavior of each ant is divided into two steps: release pheromone and act. The quantity of pheromone released is proportional to the irregularity measurement; each ant's next action is chosen independently of the others in a stochastic way according to evaluated heuristic knowledge. The independence of the ants implies that the algorithm has an inherently parallel computation architecture. We apply the proposed method to some typical textural images with defects. The series of pheromone distribution maps (PDM) clearly shows that the pheromone distribution gradually approaches the textural defects. After some post-processing, the final pheromone distribution demonstrates the shape and area of the defects well.
Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section
Institute of Scientific and Technical Information of China (English)
YU Yongqian; ZHAO Xiangguo; CHEN Hengyue; WANG Bin; YU Ge; WANG Guoren
2006-01-01
This paper presents an effective clustering mode and a novel mode for evaluating clustering results. The clustering mode has two limited integral parameters. The evaluating mode evaluates clustering results and gives each a mark: the higher the mark a clustering result gains, the higher its quality. By organizing the two modes in different ways, we build two clustering algorithms: SECDU (Self-Expanded Clustering Algorithm based on Density Units) and SECDUF (Self-Expanded Clustering Algorithm Based on Density Units with Evaluation Feedback Section). SECDU enumerates all value pairs of the two parameters of the clustering mode, processes the data set repeatedly, and evaluates every clustering result with the evaluating mode. SECDU then outputs the clustering result with the highest evaluation mark. By applying a hill-climbing algorithm, SECDUF greatly improves clustering efficiency. Both algorithms adapt well to data sets with different distribution features, and both can output high-quality clustering results. SECDUF tunes the parameters of the clustering mode automatically, with no human intervention at any point in the process. In addition, SECDUF has high clustering performance.
An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering
Institute of Scientific and Technical Information of China (English)
Taher NIKNAM; Babak AMIRI; Javad OLAMAEI; Ali AREFI
2009-01-01
The K-means algorithm is one of the most popular techniques in clustering. Nevertheless, the performance of the K-means algorithm depends highly on the initial cluster centers, and the algorithm may converge to local minima. This paper proposes a hybrid evolutionary programming based clustering algorithm, called PSO-SA, by combining particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution by SA and to increase the information exchange among particles using a mutation operator to escape local optima. Three datasets, Iris, Wisconsin Breast Cancer, and Ripley's Glass, have been considered to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only has a better response but also converges more quickly than the K-means, PSO, and SA algorithms.
An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets
Directory of Open Access Journals (Sweden)
Kang Zhang
2014-01-01
Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle mixed data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method that has demonstrated good performance on a wide variety of datasets. However, it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm to cluster them. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.
Directory of Open Access Journals (Sweden)
G. Abel Thangaraja
2014-11-01
The need for data mining arises from the explosive growth of data from terabytes to petabytes. Data mining preprocessing aims to produce quality mining results in descriptive and predictive analysis. The quality of a clustering result depends on both the similarity measure used by the method and its implementation. A straightforward way to combine structural and attribute similarities is to use a weighted distance function. Clustering results are obtained based on attribute similarities, and the clusters balance the attribute and structural similarities. The existing structural and attribute clustering algorithm is analyzed and a new algorithm is proposed. The two algorithms are compared and the results are analyzed. It is found that the modified algorithm gives better quality clusters.
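The weighted distance function mentioned in this abstract might look like the following sketch. The abstract does not define the two component distances, so the Jaccard-based structural distance, the mismatch-fraction attribute distance, and the mixing weight `alpha` are all assumptions chosen for illustration.

```python
def structural_distance(neigh_u, neigh_v):
    # Assumed structural distance: 1 minus the Jaccard similarity of neighbour sets.
    union = neigh_u | neigh_v
    if not union:
        return 0.0
    return 1.0 - len(neigh_u & neigh_v) / len(union)


def attribute_distance(attrs_u, attrs_v):
    # Assumed attribute distance: fraction of attribute positions that differ.
    mismatches = sum(a != b for a, b in zip(attrs_u, attrs_v))
    return mismatches / len(attrs_u)


def combined_distance(neigh_u, neigh_v, attrs_u, attrs_v, alpha=0.5):
    # Weighted sum: alpha weights structure, (1 - alpha) weights attributes.
    return (alpha * structural_distance(neigh_u, neigh_v)
            + (1.0 - alpha) * attribute_distance(attrs_u, attrs_v))


d = combined_distance({1, 2, 3}, {2, 3, 4},
                      ("a", "x", "p", "q"), ("a", "y", "p", "q"))
print(d)
```

Setting `alpha` to 1 or 0 recovers purely structural or purely attribute-based clustering, which makes the trade-off between the two similarities explicit.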
Combined Density-based and Constraint-based Algorithm for Clustering
Institute of Scientific and Technical Information of China (English)
CHEN Tung-shou; CHEN Rong-chang; LIN Chih-chiang; CHIU Yung-hsing
2006-01-01
We propose a new clustering algorithm that helps researchers analyze data quickly and accurately. We call this algorithm the Combined Density-based and Constraint-based Algorithm (CDC). CDC consists of two phases. In the first phase, CDC employs the idea of density-based clustering to split the original data into a number of fragmented clusters. At the same time, CDC cuts off noise and outliers. In the second phase, CDC employs the concept of K-means clustering to select a greater cluster to be the center. The greater cluster then merges smaller clusters that satisfy certain constraint rules. Because the merged clusters lie around the center cluster, the clustering results show high accuracy. Moreover, CDC reduces the number of calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with those of K-means, hierarchical clustering, and the genetic clustering algorithm (GCA) proposed in 2004. Experimental results show that CDC has better performance.
Robust K-Median and K-Means Clustering Algorithms for Incomplete Data
Directory of Open Access Journals (Sweden)
Jinhua Li
2016-01-01
Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply classical clustering algorithms for complete data, such as K-median and K-means. In practice, however, it is often hard to obtain accurate estimates of the missing values, which degrades clustering performance. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of a robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results that are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
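To make the interval representation concrete, here is a minimal sketch of a worst-case (minimax-style) assignment step. It is an illustration under stated assumptions, not the paper's algorithm: each feature is given as a hypothetical `(lo, hi)` interval, observed values use `lo == hi`, and the function names are invented for this example.

```python
def worst_case_sq_dist(intervals, center):
    # For each feature, the squared distance to the center is maximised at one
    # of the interval endpoints, so the worst case is the larger of the two.
    return sum(max((c - lo) ** 2, (c - hi) ** 2)
               for (lo, hi), c in zip(intervals, center))


def robust_assign(intervals, centers):
    # Assign the point to the center with the smallest worst-case distance.
    return min(range(len(centers)),
               key=lambda k: worst_case_sq_dist(intervals, centers[k]))


# First feature observed as 1.0; second feature missing, bounded to [0.0, 4.0].
point = [(1.0, 1.0), (0.0, 4.0)]
centers = [(1.0, 2.0), (1.0, 0.0)]
print(robust_assign(point, centers))
```

Because the assignment minimises the worst case over the whole interval, it does not change when a single imputed value inside the interval turns out to be wrong.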
Directory of Open Access Journals (Sweden)
Guohua Zou
2016-12-01
New medical imaging technologies, such as Computed Tomography and Magnetic Resonance Imaging (MRI), have been widely used in all aspects of medical diagnosis. The purpose of these imaging techniques is to obtain various qualitative and quantitative data about the patient comprehensively and accurately, and to provide correct digital information for diagnosis, treatment planning and evaluation after surgery. MR imaging has a good diagnostic advantage for brain diseases. However, as the requirements for brain image definition and quantitative analysis keep increasing, better segmentation of MR brain images is needed. The FCM (Fuzzy C-means) algorithm is widely applied in image segmentation, but it has some shortcomings, such as long computation time and poor anti-noise capability. In this paper, the Ant Colony algorithm is first used to determine the cluster centers and the number of clusters for the FCM algorithm so as to improve its running speed. Then an improved Markov random field model is used to improve the algorithm's anti-noise ability. Experimental results show that the proposed algorithm has clear advantages in image segmentation speed and segmentation quality.
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
Energy Efficient Backoff Hierarchical Clustering Algorithms for Multi-Hop Wireless Sensor Networks
Institute of Scientific and Technical Information of China (English)
Jun Wang; Yong-Tao Cao; Jun-Yuan Xie; Shi-Fu Chen
2011-01-01
Compared with flat routing protocols, clustering is a fundamental performance improvement technique in wireless sensor networks, which can increase network scalability and lifetime. In this paper, we integrate the multi-hop technique with a backoff-based clustering algorithm to organize sensors. By using an adaptive backoff strategy, the algorithm not only balances load among sensor nodes, but also achieves a fairly uniform cluster head distribution across the network. Simulation results demonstrate that our algorithm is more energy-efficient than classical ones. Our algorithm is also easily extended to generate a hierarchy of cluster heads for better network management and energy efficiency.
Geographical parthenogenesis: General purpose genotypes and frozen niche variation
DEFF Research Database (Denmark)
Vrijenhoek, Robert C.; Parker, Dave
2009-01-01
marginal environments to escape competition with their sexual relatives. These ideas often fail to consider the early competitive interactions with immediate sexual ancestors, which shape alternative paths that newly formed clonal lineages might follow. Here we review the history and evidence for two...... hypotheses concerning the evolution of niche breadth in asexual species - the "general-purpose genotype" (GPG) and "frozen niche-variation" (FNV) models. The two models are often portrayed as mutually exclusive, respectively viewing clonal lineages as generalists versus specialists. Nonetheless......, they are complex syllogisms that share common assumptions regarding the likely origins of clonal diversity and the strength of interclonal selection in shaping the ecological breadth of asexual populations. Both models find support in ecological and phylogeographic studies of a wide range of organisms...
A Chemical Containment Model for the General Purpose Work Station
Flippen, Alexis A.; Schmidt, Gregory K.
1994-01-01
Contamination control is a critical safety requirement imposed on experiments flying on board the Spacelab. The General Purpose Work Station, a Spacelab support facility used for life sciences space flight experiments, is designed to remove volatile compounds from its internal airpath and thereby minimize contamination of the Spacelab. This is accomplished through the use of a large, multi-stage filter known as the Trace Contaminant Control System. Many experiments planned for the Spacelab require the use of toxic, volatile fixatives in order to preserve specimens prior to postflight analysis. The NASA-Ames Research Center SLS-2 payload, in particular, necessitated the use of several toxic, volatile compounds in order to accomplish the many inflight experiment objectives of this mission. A model was developed based on earlier theories and calculations which provides conservative predictions of the resultant concentrations of these compounds given various spill scenarios. This paper describes the development and application of this model.
General purpose multiplexing device for cryogenic microwave systems
Chapman, Benjamin J.; Moores, Bradley A.; Rosenthal, Eric I.; Kerckhoff, Joseph; Lehnert, K. W.
2016-05-01
We introduce and experimentally characterize a general purpose device for signal processing in circuit quantum electrodynamics systems. The device is a broadband two-port microwave circuit element with three modes of operation: it can transmit, reflect, or invert incident signals between 4 and 8 GHz. This property makes it a versatile tool for lossless signal processing at cryogenic temperatures. In particular, rapid switching (≤ 15 ns) between these operation modes enables several multiplexing readout protocols for superconducting qubits. We report the device's performance in a two-channel code domain multiplexing demonstration. The multiplexed data are recovered with fast readout times (up to 400 ns) and infidelities ≤ 10⁻² for probe powers ≥ 7 fW, in agreement with the expectation for binary signaling with Gaussian noise.
General-purpose fuzzy controller for dc-dc converters
Energy Technology Data Exchange (ETDEWEB)
Mattavelli, P.; Rossetto, L.; Spiazzi, G.; Tenti, P. [Univ. of Padova (Italy)
1997-01-01
In this paper, a general-purpose fuzzy controller for dc-dc converters is investigated. Based on a qualitative description of the system to be controlled, fuzzy controllers are capable of good performance, even for systems where linear control techniques fail, e.g., when a mathematical description is not available or when there are wide parameter variations. The presented approach is general and can be applied to any dc-dc converter topology. Controller implementation is relatively simple and can guarantee a small-signal response as fast and stable as other standard regulators, together with an improved large-signal response. Simulation results for Buck-Boost and Sepic converters show the potential of this control approach.
Toward a General-Purpose Heterogeneous Ensemble for Pattern Classification
Directory of Open Access Journals (Sweden)
Loris Nanni
2015-01-01
We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of adaboost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on the simple fusion by sum rule of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is demonstrating that some very recent approaches, including the heterogeneous ensemble we propose in this paper, are capable of outperforming an SVM classifier (implemented with LibSVM), even when both kernel selection and SVM parameters are carefully tuned for each dataset.
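Fusion by sum rule, as named in this abstract, adds the per-class scores of the individual classifiers and predicts the class with the largest total. The sketch below assumes each classifier already outputs comparable scores such as normalised probabilities; the three score vectors are made-up illustrative values, not results from the study.

```python
def sum_rule(score_lists):
    # score_lists[i][c] is the score classifier i gives to class c.
    n_classes = len(score_lists[0])
    totals = [sum(scores[c] for scores in score_lists)
              for c in range(n_classes)]
    # Predict the class whose summed score is largest.
    label = max(range(n_classes), key=totals.__getitem__)
    return label, totals


# Hypothetical outputs of three classifiers for one two-class sample.
svm_scores = [0.60, 0.40]
gp_scores = [0.30, 0.70]
boost_scores = [0.45, 0.55]

label, totals = sum_rule([svm_scores, gp_scores, boost_scores])
print(label, totals)
```

The sum rule only makes sense when the classifiers' scores are on a common scale, so heterogeneous outputs are usually normalised before fusion.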
Toward a General-Purpose Heterogeneous Ensemble for Pattern Classification.
Nanni, Loris; Brahnam, Sheryl; Ghidoni, Stefano; Lumini, Alessandra
2015-01-01
We perform an extensive study of the performance of different classification approaches on twenty-five datasets (fourteen image datasets and eleven UCI data mining datasets). The aim is to find General-Purpose (GP) heterogeneous ensembles (requiring little to no parameter tuning) that perform competitively across multiple datasets. The state-of-the-art classifiers examined in this study include the support vector machine, Gaussian process classifiers, random subspace of adaboost, random subspace of rotation boosting, and deep learning classifiers. We demonstrate that a heterogeneous ensemble based on the simple fusion by sum rule of different classifiers performs consistently well across all twenty-five datasets. The most important result of our investigation is demonstrating that some very recent approaches, including the heterogeneous ensemble we propose in this paper, are capable of outperforming an SVM classifier (implemented with LibSVM), even when both kernel selection and SVM parameters are carefully tuned for each dataset.
Using a cognitive architecture for general purpose service robot control
Puigbo, Jordi-Ysard; Pumarola, Albert; Angulo, Cecilio; Tellez, Ricardo
2015-04-01
A humanoid service robot equipped with a set of simple action skills, including navigating, grasping, and recognising objects or people, among others, is considered in this paper. Using those skills, the robot should complete a voice command expressed in natural language that encodes a complex task (defined as the concatenation of a number of those basic skills). As a main feature, no traditional planner is used to decide which skills to activate or in which sequence. Instead, the SOAR cognitive architecture acts as the reasoner by selecting which action the robot should complete, directing it towards the goal. Our proposal allows new goals to be included for the robot just by adding new skills (without the need to encode new plans). The proposed architecture has been tested on a human-sized humanoid robot, REEM, acting as a general purpose service robot.
Extension of K-Means Algorithm for clustering mixed data | Onuodu ...
African Journals Online (AJOL)
Extension of K-Means Algorithm for clustering mixed data. ... In this work, a new hybrid method has been proposed which extends K-means algorithm to categorical domain and mixed-type ...
Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm
DEFF Research Database (Denmark)
Grotkjær, Thomas; Winther, Ole; Regenberg, Birgitte
2006-01-01
Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods...... analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset....... The method is flexible and it is possible to find consensus clusters from different clustering algorithms. Thus, the algorithm can be used as a framework to test in a quantitative manner the homogeneity of different clustering algorithms. We compare the method with a number of state-of-the-art clustering...
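The co-occurrence matrix mentioned in this abstract can be sketched as follows: for every pair of items, record the fraction of clustering runs in which the two items landed in the same cluster. The three label vectors are invented for illustration, and the function name is hypothetical.

```python
def cooccurrence(runs):
    # runs[r][i] is the cluster label of item i in clustering run r.
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalise counts to the fraction of runs in which i and j co-cluster.
    return [[c / len(runs) for c in row] for row in counts]


# Three hypothetical runs over four items; label names differ between runs,
# but only co-membership matters.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
C = cooccurrence(runs)
print(C[0][1], C[2][3], C[0][3])
```

Because the matrix depends only on co-membership, runs from different clustering algorithms can be pooled even when their label numbering disagrees.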
Vinitsky, Sergue; Chuluunbaatar, Ochbadrakh; Rostovtsev, Vitaly; Hai, Luong Le; Derbov, Vladimir; Krassovitskiy, Pavel
2013-01-01
A model for quantum tunnelling of a cluster comprising A identical particles, coupled by oscillator-type potential, through short-range repulsive potential barriers is introduced for the first time in the new symmetrized-coordinate representation and studied within the s-wave approximation. The symbolic-numerical algorithms for calculating the effective potentials of the close-coupling equations in terms of the cluster wave functions and the energy of the barrier quasistationary states are formulated and implemented using the Maple computer algebra system. The effect of quantum transparency, manifesting itself in nonmonotonic resonance-type dependence of the transmission coefficient upon the energy of the particles, the number of the particles A=2,3,4, and their symmetry type, is analyzed. It is shown that the resonance behavior of the total transmission coefficient is due to the existence of barrier quasistationary states imbedded in the continuum.
MST-BASED CLUSTERING TOPOLOGY CONTROL ALGORITHM FOR WIRELESS SENSOR NETWORKS
Institute of Scientific and Technical Information of China (English)
Cai Wenyu; Zhang Meiyan
2010-01-01
In this paper, we propose a novel clustering topology control algorithm named Minimum Spanning Tree (MST)-based Clustering Topology Control (MCTC) for Wireless Sensor Networks (WSNs), which uses a hybrid approach to adjust sensor nodes' transmission power in two-tiered hierarchical WSNs. The MCTC algorithm employs a one-hop Maximum Energy & Minimum Distance (MEMD) clustering algorithm to decide clustering status. Each cluster exchanges information among its own Cluster Members (CMs) locally and then delivers the information to the Cluster Head (CH). Moreover, CHs exchange information with one another and then transmit the aggregated information to the base station. The intra-cluster topology control scheme uses an MST to decide the CMs' transmission radius; similarly, the inter-cluster topology control scheme applies an MST to decide the CHs' transmission radius. Since the intra-cluster topology control is a fully distributed approach and the inter-cluster topology control is a purely centralized approach performed by the base station, the MCTC algorithm is a hybrid clustering topology control algorithm and can obtain a scalable topology and strong connectivity guarantees simultaneously. As a result, the network topology is reduced by the MCTC algorithm, which improves network energy efficiency. The simulation results verify that MCTC outperforms traditional topology control schemes such as LMST, DRNG and MEMD in terms of average node degree, average node power radius and network lifetime.
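One plausible reading of using an MST to decide a transmission radius is sketched below: build a minimum spanning tree over node positions and set each node's radius to its longest incident tree edge, which keeps the tree connected. This rule and the function names are assumptions for illustration; the abstract does not give MCTC's exact procedure.

```python
import math


def mst_edges(points):
    # Prim's algorithm on the complete Euclidean graph, starting from node 0.
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge that connects a tree node u to a non-tree node v.
        best = min((math.dist(points[u], points[v]), u, v)
                   for u in in_tree for v in range(n) if v not in in_tree)
        edges.append(best)
        in_tree.add(best[2])
    return edges


def transmission_radii(points):
    # Assumed rule: a node must reach its farthest MST neighbour.
    radii = [0.0] * len(points)
    for dist, u, v in mst_edges(points):
        radii[u] = max(radii[u], dist)
        radii[v] = max(radii[v], dist)
    return radii


nodes = [(0, 0), (1, 0), (1, 2), (5, 2)]
radii = transmission_radii(nodes)
print(radii)
```

Nodes in dense regions end up with small radii, while only nodes bridging distant parts of the network need high transmission power.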
Identifying prototypical components in behaviour using clustering algorithms.
Directory of Open Access Journals (Sweden)
Elke Braun
Quantitative analysis of animal behaviour is a requirement to understand the task solving strategies of animals and the underlying control mechanisms. The identification of repeatedly occurring behavioural components is thereby a key element of a structured quantitative description. However, the complexity of most behaviours makes the identification of such behavioural components a challenging problem. We propose an automatic and objective approach for determining and evaluating prototypical behavioural components. Behavioural prototypes are identified using clustering algorithms and finally evaluated with respect to their ability to represent the whole behavioural data set. The prototypes allow for a meaningful segmentation of behavioural sequences. We applied our clustering approach to identify prototypical movements of the head of blowflies during cruising flight. The results confirm the previously established saccadic gaze strategy by the set of prototypes being divided into either predominantly translational or rotational movements, respectively. The prototypes reveal additional details about the saccadic and intersaccadic flight sections that could not be unravelled so far. Successful application of the proposed approach to behavioural data shows its ability to automatically identify prototypical behavioural components within a large and noisy database and to evaluate these with respect to their quality and stability. Hence, this approach might be applied to a broad range of behavioural and neural data obtained from different animals and in different contexts.
Asztalos, Stephen J.; Hennig, Wolfgang; Warburton, William K.
2016-01-01
Pulse shape discrimination applied to certain fast scintillators is usually performed offline. In sufficiently high-event rate environments data transfer and storage become problematic, which suggests a different analysis approach. In response, we have implemented a general purpose pulse shape analysis algorithm in the XIA Pixie-500 and Pixie-500 Express digital spectrometers. In this implementation waveforms are processed in real time, reducing the pulse characteristics to a few pulse shape analysis parameters and eliminating time-consuming waveform transfer and storage. We discuss implementation of these features, their advantages, necessary trade-offs and performance. Measurements from bench top and experimental setups using fast scintillators and XIA processors are presented.
Parallel Genetic Algorithms with Dynamic Topology using Cluster Computing
Directory of Open Access Journals (Sweden)
ADAR, N.
2016-08-01
A parallel genetic algorithm (PGA) conducts a distributed meta-heuristic search by employing genetic algorithms on more than one subpopulation simultaneously. PGAs migrate a number of individuals between subpopulations over generations. The layout that facilitates the interactions of the subpopulations is called the topology. Static migration topologies have been widely incorporated into PGAs. In this article, a PGA with a dynamic migration topology (D-PGA) is proposed. D-PGA generates a new migration topology in every epoch based on the average fitness values of the subpopulations. The D-PGA has been tested against ring and fully connected migration topologies in a Beowulf Cluster. The D-PGA has outperformed the ring migration topology with comparable communication cost and has provided competitive or better results than a fully connected migration topology with significantly lower communication cost. PGA convergence behaviors have been analyzed in terms of the diversities within and between subpopulations. Conventional diversity can be considered as the diversity within a subpopulation. A new concept of permeability has been introduced to measure the diversity between subpopulations. It is shown that the success of the proposed D-PGA can be attributed to maintaining a high level of permeability while preserving diversity within subpopulations.
A Heuristic Clustering Algorithm for Mining Communities in Signed Networks
Institute of Scientific and Technical Information of China (English)
Bo Yang; Da-You Liu
2007-01-01
A signed network is an important kind of complex network that includes both positive and negative relations. Communities of a signed network are defined as groups of vertices within which positive relations are dense and between which negative relations are also dense. Being able to identify the communities of signed networks is helpful for analyzing such networks. Many algorithms for detecting network communities have been developed so far. However, most of them are designed exclusively for networks with only positive relations and are not suitable for signed networks. The problem of mining communities in signed networks quickly and correctly has therefore not been solved satisfactorily. In this paper, we propose a heuristic algorithm to address this issue. Compared with major existing methods, our approach has three distinct features. First, it is very fast, running in roughly linear time with respect to network size. Second, it exhibits good clustering capability and, in particular, works well with complex networks that lack well-defined community structures. Finally, it is insensitive to its built-in parameters and requires no prior knowledge.
IMPROVING THE CLUSTER PERFORMANCE BY COMBINING PSO AND K-MEANS ALGORITHM
Directory of Open Access Journals (Sweden)
G. Komarasamy
2011-04-01
Clustering is a technique that divides data objects into groups based on information found in the data that describes the objects and their relationships. This paper describes how to improve clustering performance by combining Particle Swarm Optimization (PSO) and the K-means algorithm. The PSO algorithm converges successfully during the initial stages of a global search, but around the global optimum the search process becomes very slow. In contrast, the K-means algorithm can achieve faster convergence to the optimum solution. Unlike the K-means method, the new algorithm does not require the number of clusters to be specified before clustering, and it is able to find a locally optimal number of clusters during the clustering process. In each iteration, the inertia weight is changed based on the current iteration and the best fitness. Experimental results on different data sets show that the new algorithm performs better.
A new-style clustering algorithm based on swarm intelligent theory
Institute of Scientific and Technical Information of China (English)
CHEN Zhuo; LIU Xiang-shuang
2007-01-01
Traditional clustering algorithms generally have some problems, such as sensitivity to initialization parameters, difficulty in finding the optimal clustering result, and the validity of the clustering. In this paper, an FSM and a mathematical model of a new-style clustering algorithm based on swarm intelligence are provided. In this algorithm, the clustering main body moves in a three-dimensional space and has the abilities of memory, communication, analysis, judgment and coordinating information. Experimental results confirm that this algorithm has many merits, such as being insensitive to the order of the data and being capable of dealing with exceptional, high-dimensional or complicated data. The algorithm can be used in the fields of Web mining, incremental clustering, economic analysis, pattern recognition, document classification and so on.
Design of the SLAC RCE Platform: A General Purpose ATCA Based Data Acquisition System
Energy Technology Data Exchange (ETDEWEB)
Herbst, R.; Claus, R.; Freytag, M.; Haller, G.; Huffer, M.; Maldonado, S.; Nishimura, K.; O'Grady, C.; Panetta, J.; Perazzo, A.; Reese, B.; Ruckman, L.; Thayer, J. G.; Weaver, M. [SLAC National Accelerator Laboratory, Menlo Park, CA (United States). Research Engineering Div.]
2015-01-23
The SLAC RCE platform is a general purpose clustered data acquisition system implemented on a custom ATCA compliant blade, called the Cluster On Board (COB). The core of the system is the Reconfigurable Cluster Element (RCE), which is a system-on-chip design based upon the Xilinx Zynq family of FPGAs, mounted on custom COB daughter-boards. The Zynq architecture couples a dual core ARM Cortex A9 based processor with a high performance 28 nm FPGA. The RCE has 12 external general purpose bi-directional high speed links, each supporting serial rates of up to 12 Gbps. Eight RCE nodes are included on a COB, each with a 10 Gbps connection to an on-board 24-port Ethernet switch integrated circuit. The COB is designed to be used with a standard full-mesh ATCA backplane allowing multiple RCE nodes to be tightly interconnected with minimal interconnect latency. Multiple shelves can be clustered using the front panel 10 Gbps connections. The COB also supports local and inter-blade timing and trigger distribution. An experiment specific Rear Transition Module adapts the 96 high speed serial links to specific experiments and allows an experiment-specific timing and busy feedback connection. This coupling of processors with a high performance FPGA fabric in a low latency, multiple node cluster allows high speed data processing that can be easily adapted to any physics experiment. RTEMS and Linux are both ported to the module. The RCE has been used or is the baseline for several current and proposed experiments (LCLS, HPS, LSST, ATLAS-CSC, LBNE, DarkSide, ILC-SiD, etc.).
Directory of Open Access Journals (Sweden)
Noha Negm
2013-06-01
Document clustering is one of the main themes in text mining. It refers to the process of grouping documents with similar contents or topics into clusters to improve both the availability and reliability of text mining applications. Some recent algorithms address the high dimensionality of text by using frequent termsets for clustering. Despite its drawbacks, the Apriori algorithm is still the basic algorithm for mining frequent termsets. This paper presents an approach for Clustering Web Documents based on a Hashing algorithm for mining Frequent Termsets (CWDHFT). It introduces an efficient Multi-Tire Hashing algorithm for mining Frequent Termsets (MTHFT) in place of the Apriori algorithm. The algorithm generates frequent termsets by building a multi-tire hash table while scanning the documents only once. The Multi-Tire technique is used in the proposed hashing algorithm to avoid hash collisions. Based on the generated frequent termsets, the documents are partitioned, and clustering occurs by grouping the partitions through their descriptive keywords. The MTHFT algorithm reduces the scanning and computational costs, which considerably improves performance and speeds up the clustering process. The CWDHFT approach improved accuracy, scalability and efficiency when compared with existing clustering algorithms such as Bisecting K-means and FIHC.
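As a rough illustration of the single-scan, hash-table idea described in this abstract, the sketch below counts two-term termsets in one pass over the documents using an ordinary Python hash table (`Counter`). It is not the MTHFT algorithm itself: the function name `frequent_term_pairs`, the restriction to pairs, and the sample documents are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_term_pairs(documents, min_support):
    """Count 2-term termsets in a single scan and keep the frequent ones.

    Hypothetical helper: a plain hash table (Counter) stands in for the
    multi-tire hash table, whose layout the abstract does not describe.
    """
    counts = Counter()
    for doc in documents:
        # Each distinct term counts once per document; sorting gives
        # every pair a canonical order so it hashes to one key.
        terms = sorted(set(doc))
        counts.update(combinations(terms, 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Made-up documents, already tokenized into terms.
docs = [
    ["web", "mining", "cluster"],
    ["web", "cluster"],
    ["mining", "hash"],
    ["web", "cluster", "hash"],
]
frequent = frequent_term_pairs(docs, min_support=2)
```

Documents that share a frequent termset could then be placed in the same partition, which is the role the abstract gives to the generated termsets.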
HYBRID APPROACH FOR OPTIMAL CLUSTER HEAD SELECTION IN WSN USING LEACH AND MONKEY SEARCH ALGORITHMS
Directory of Open Access Journals (Sweden)
T. SHANKAR
2017-02-01
Wireless Sensor Networks (WSNs) are being widely used with low-cost, low-power, multifunction sensors based on the development of wireless communication, which has enabled a wide variety of new applications. The main concern in a WSN is that nodes have limited battery power and constrained energy consumption; hence energy and lifetime are of paramount importance. To achieve high energy efficiency and prolong network lifetime in WSNs, clustering techniques have been widely adopted. The proposed algorithm is a hybridization of the well-known Low-Energy Adaptive Clustering Hierarchy (LEACH) algorithm with a distinctive Monkey Search (MS) algorithm, an optimization algorithm used for optimal cluster head selection. The proposed hybrid algorithm exhibits high throughput, residual energy and improved lifetime. The proposed hybrid algorithm is compared with the well-known cluster-based protocols for WSNs, namely LEACH and the monkey search algorithm, individually.
A Novel Distributed Clustering Algorithm for Mobile Ad-hoc Networks
Directory of Open Access Journals (Sweden)
Sahar Adabi
2008-01-01
This paper proposes a new Distributed Score Based Clustering Algorithm (DSBCA) for Mobile Ad-hoc Networks (MANETs). In MANETs, selecting suitable nodes in clusters as cluster heads is very important. The proposed clustering algorithm considers the Battery Remaining, Number of Neighbors, Number of Members, and Stability in order to calculate the node's score with a linear algorithm. After each node calculates its score independently, the neighbors of the node must be notified about it. Each node then selects the neighbor with the highest score to be its cluster head; the selection of cluster heads is therefore performed in a distributed manner with the most recent information about the current status of neighbor nodes. The proposed algorithm was compared with the Weighted Clustering Algorithm and the Distributed Weighted Clustering Algorithm in terms of number of clusters, number of re-affiliations, lifespan of nodes in the system, end-to-end throughput and overhead. The simulation results showed that the proposed algorithm achieved its goals.
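The score-then-elect scheme described in this abstract can be sketched as follows. The weights in `WEIGHTS`, the normalization of each metric to [0, 1], and the tie-break on node ID are assumptions made for illustration; the abstract states only that the score is a linear function of the four metrics.

```python
from dataclasses import dataclass

# Hypothetical weights; the abstract does not give the actual coefficients.
WEIGHTS = {"battery": 0.4, "neighbors": 0.2, "members": 0.2, "stability": 0.2}

@dataclass
class Node:
    node_id: int
    battery: float    # remaining battery, assumed normalized to [0, 1]
    neighbors: float  # number of neighbors, assumed normalized to [0, 1]
    members: float    # number of members, assumed normalized to [0, 1]
    stability: float  # stability measure, assumed normalized to [0, 1]

def score(node):
    """Linear combination of the four metrics named in the abstract."""
    return (WEIGHTS["battery"] * node.battery
            + WEIGHTS["neighbors"] * node.neighbors
            + WEIGHTS["members"] * node.members
            + WEIGHTS["stability"] * node.stability)

def elect_cluster_head(neighbor_nodes):
    """Return the neighbor with the highest advertised score.

    Ties are broken in favor of the lower node ID (an assumption).
    """
    return max(neighbor_nodes, key=lambda n: (score(n), -n.node_id))

# Made-up neighborhood as seen by one node.
neighborhood = [
    Node(1, battery=0.9, neighbors=0.5, members=0.3, stability=0.8),
    Node(2, battery=0.4, neighbors=0.9, members=0.6, stability=0.5),
    Node(3, battery=0.7, neighbors=0.2, members=0.1, stability=0.9),
]
head = elect_cluster_head(neighborhood)
```

Because every node runs the same election over its own neighborhood, no central coordinator is needed, which matches the distributed character the abstract emphasizes.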
User-Based Document Clustering by Redescribing Subject Descriptions with a Genetic Algorithm.
Gordon, Michael D.
1991-01-01
Discussion of clustering of documents and queries in information retrieval systems focuses on the use of a genetic algorithm to adapt subject descriptions so that documents become more effective in matching relevant queries. Various types of clustering are explained, and simulation experiments used to test the genetic algorithm are described. (27…
Contributions to "k"-Means Clustering and Regression via Classification Algorithms
Salman, Raied
2012-01-01
The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…
A Cluster Algorithm for the 2-D SU(3) × SU(3) Chiral Model
Ji, Da-ren; Zhang, Jian-bo
1996-07-01
To extend the cluster algorithm to SU(N) × SU(N) chiral models, a variant of Wolff's cluster algorithm is proposed and tested for the 2-dimensional SU(3) × SU(3) chiral model. The results show that the new method can reduce the critical slowing down in the SU(3) × SU(3) chiral model.
Lowest-ID with Adaptive ID Reassignment: A Novel Mobile Ad-Hoc Networks Clustering Algorithm
Gavalas, Damianos; Konstantopoulos, Charalampos; Mamalis, Basilis
2011-01-01
Clustering is a promising approach for building hierarchies and simplifying the routing process in mobile ad-hoc network environments. The main objective of clustering is to identify suitable node representatives, i.e. cluster heads (CHs), to store routing and topology information and maximize cluster stability. Traditional clustering algorithms suggest CH election exclusively based on node IDs or location information and involve frequent broadcasting of control packets, even when network topology remains unchanged. More recent works take into account additional metrics (such as energy and mobility) and optimize initial clustering. However, in many situations (e.g. in relatively static topologies) the re-clustering procedure is hardly ever invoked; hence initially elected CHs soon reach battery exhaustion. Herein, we introduce an efficient distributed clustering algorithm that uses both mobility and energy metrics to provide stable cluster formations. CHs are initially elected based on the time and cost-efficien...
Use of general purpose graphics processing units with MODFLOW.
Hughes, Joseph D; White, Jeremy T
2013-01-01
To evaluate the use of general-purpose graphics processing units (GPGPUs) to improve the performance of MODFLOW, an unstructured preconditioned conjugate gradient (UPCG) solver has been developed. The UPCG solver uses a compressed sparse row storage scheme and includes Jacobi, zero fill-in incomplete, and modified-incomplete lower-upper (LU) factorization, and generalized least-squares polynomial preconditioners. The UPCG solver also includes options for sequential and parallel solution on the central processing unit (CPU) using OpenMP. For simulations utilizing the GPGPU, all basic linear algebra operations are performed on the GPGPU; memory copies between the CPU and GPGPU occur prior to the first iteration of the UPCG solver and after satisfying head and flow criteria or exceeding a maximum number of iterations. The efficiency of the UPCG solver for GPGPU and CPU solutions is benchmarked using simulations of a synthetic, heterogeneous unconfined aquifer with tens of thousands to millions of active grid cells. Testing indicates GPGPU speedups on the order of 2 to 8, relative to the standard MODFLOW preconditioned conjugate gradient (PCG) solver, can be achieved when (1) memory copies between the CPU and GPGPU are optimized, (2) the percentage of time performing memory copies between the CPU and GPGPU is small relative to the calculation time, (3) high-performance GPGPU cards are utilized, and (4) CPU-GPGPU combinations are used to execute sequential operations that are difficult to parallelize. Furthermore, UPCG solver testing indicates GPGPU speedups exceed parallel CPU speedups achieved using OpenMP on multicore CPUs for preconditioners that can be easily parallelized. Published 2013. This article is a U.S. Government work and is in the public domain in the USA.
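For readers unfamiliar with the ingredients named in this abstract, the following is a minimal CPU-only sketch of a conjugate gradient solver with a Jacobi preconditioner on a matrix held in compressed sparse row (CSR) form. It is not the UPCG code: the function names, the pure-Python loops, and the small test matrix are assumptions for illustration, and the matrix is assumed to be symmetric positive definite with a nonzero diagonal.

```python
def csr_matvec(data, indices, indptr, x):
    """y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += data[k] * x[indices[k]]
    return y

def csr_diagonal(data, indices, indptr):
    """Extract the main diagonal of a CSR matrix."""
    diag = [0.0] * (len(indptr) - 1)
    for row in range(len(diag)):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                diag[row] = data[k]
    return diag

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def jacobi_pcg(data, indices, indptr, b, tol=1e-10, max_iter=100):
    """Solve A x = b; returns (x, iterations used)."""
    n = len(b)
    inv_diag = [1.0 / d for d in csr_diagonal(data, indices, indptr)]
    x = [0.0] * n
    r = list(b)                                # residual for x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]  # Jacobi: z = D^-1 r
    p = list(z)
    rz = dot(r, z)
    for iteration in range(1, max_iter + 1):
        Ap = csr_matvec(data, indices, indptr, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        if dot(r, r) ** 0.5 < tol:
            return x, iteration
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter

# 3x3 example: A = [[4,-1,0],[-1,4,-1],[0,-1,4]], exact solution [1, 2, 3].
data = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
indices = [0, 1, 0, 1, 2, 1, 2]
indptr = [0, 2, 5, 7]
solution, iterations = jacobi_pcg(data, indices, indptr, [2.0, 4.0, 10.0])
```

The matrix-vector product, the dot products and the vector updates are the basic linear algebra operations that the abstract says are moved to the GPGPU; the Jacobi step is an elementwise scaling, which is why it parallelizes easily.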
Combinatorial Clustering Algorithm of Quantum-Behaved Particle Swarm Optimization and Cloud Model
Directory of Open Access Journals (Sweden)
Mi-Yuan Shan
2013-01-01
We propose a combinatorial clustering algorithm of cloud model and quantum-behaved particle swarm optimization (COCQPSO) to solve the stochastic problem. The algorithm employs a novel probability model as well as a permutation-based local search method. We set the parameters of COCQPSO based on design of experiments. In a comprehensive computational study, we scrutinize the performance of COCQPSO on a set of widely used benchmark instances. By benchmarking the combinatorial clustering algorithm against state-of-the-art algorithms, we show that its performance compares very favorably. The fuzzy combinatorial optimization algorithm of cloud model and quantum-behaved particle swarm optimization (FCOCQPSO) in vague sets (IVSs) is more expressive than the other fuzzy sets. Finally, numerical examples show the remarkable clustering effectiveness of the COCQPSO and FCOCQPSO clustering algorithms.
A Heuristic Clustering Algorithm for Intrusion Detection Based on Information Entropy
Institute of Scientific and Technical Information of China (English)
(no author listed)
2006-01-01
This paper studies the clustering problem for intrusion detection using the theory of information entropy. It shows that the clustering problem for exact intrusion detection based on information entropy is NP-complete. A heuristic algorithm is therefore designed to solve the clustering problem for intrusion detection. The algorithm is incremental and can deal with databases containing large numbers of connection records from the Internet.
A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters
Wang, Zhihao; Yi, Jing
2016-01-01
To address the shortcoming that the fuzzy c-means algorithm (FCM) needs to know the number of clusters in advance, this paper proposes a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. The algorithm, according to the characteristics of the dataset, automatically determined the possible maximum number of clusters instead of using the empirical rule √n and obtained the optimal initial cluster centroids, mitigating the limitation of FCM that randomly selected cluster centroids can lead convergence to a local minimum. Secondly, this paper, by introducing a penalty function, proposed a new fuzzy clustering validity index based on fuzzy compactness and separation, which ensured that when the number of clusters verged on that of objects in the dataset, the value of clustering validity index did not monotonically decrease and was close to zero, so that the optimal number of clusters lost robustness and decision function. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by an iterative trial-and-error process. Finally, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters, but also reduced the number of FCM iterations while giving a stable clustering result. PMID:28042291
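For context, the standard FCM iteration that this abstract builds on can be sketched as below, for one-dimensional data and a fixed number of clusters. This shows only textbook FCM (membership update followed by centroid update); the density-based initialization, the validity index and the self-adaptive search for the number of clusters are not reproduced. The function name `fcm_1d`, the fuzzifier default `m=2.0` and the sample data are assumptions.

```python
def fcm_1d(points, centers, m=2.0, iterations=50, eps=1e-12):
    """Textbook fuzzy c-means on 1-D data with given initial centers.

    Returns (centers, memberships), where memberships[k][i] is the degree
    to which point k belongs to cluster i, as used in the final update.
    """
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iterations):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        memberships = []
        for x in points:
            dists = [abs(x - v) + eps for v in centers]  # eps avoids 0/0
            memberships.append(
                [1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
                 for d_i in dists]
            )
        # Centroid update: mean of the points weighted by u^m.
        centers = [
            sum((u[i] ** m) * x for u, x in zip(memberships, points))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships

# Two well-separated groups; the initial centers are chosen by hand here,
# whereas plain FCM would pick them at random.
sample = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
final_centers, final_memberships = fcm_1d(sample, centers=[0.0, 6.0])
```

The number of clusters enters only through the length of `centers`, which is the quantity the abstract's method sets out to determine automatically.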
A Cluster Maintenance Algorithm Based on Relative Mobility for Mobile Ad Hoc Network Management
Institute of Scientific and Technical Information of China (English)
SHEN Zhong; CHANG Yilin; ZHANG Xin
2005-01-01
The dynamic topology of mobile ad hoc networks makes network management significantly more challenging than in wireline networks. The traditional Client/Server (Manager/Agent) management paradigm does not work well in such a dynamic environment, while a hierarchical network management architecture based on clustering is more feasible. Although the movement of nodes makes the cluster structure changeable and introduces new challenges for network management, mobility is a relative concept. A node with high relative mobility is more prone to unstable behavior than a node with less relative mobility; thus the relative mobility of a node can be used to predict future node behavior. This paper presents the cluster availability, which provides a quantitative measurement of cluster stability. Furthermore, a cluster maintenance algorithm based on cluster availability is proposed. The simulation results show that, compared to the Minimum ID clustering algorithm, our algorithm successfully alleviates the influence caused by node mobility and makes network management more efficient.
Parallelization of the Wolff single-cluster algorithm
Kaupužs, J.; Rimšāns, J.; Melnik, R. V. N.
2010-02-01
A parallel [open multiprocessing (OpenMP)] implementation of the Wolff single-cluster algorithm has been developed and tested for the three-dimensional (3D) Ising model. The developed procedure is generalizable to other lattice spin models and its effectiveness depends on the specific application at hand. The applicability of the developed methodology is discussed in the context of the applications, where a sophisticated shuffling scheme is used to generate pseudorandom numbers of high quality, and an iterative method is applied to find the critical temperature of the 3D Ising model with great accuracy. For the lattice with linear size L=1024, we reached a speedup of about 1.79 times on two processors and about 2.67 times on four processors, as compared to the serial code. According to our estimation, a speedup of about three times on four processors is reachable for the O(n) models with n≥2. Furthermore, the application of the developed OpenMP code allows us to simulate larger lattices due to the greater operative (shared) memory available.
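A serial version of the Wolff single-cluster update for the 3D Ising model can be sketched as below. This is the textbook algorithm with coupling J = 1 and periodic boundaries, not the authors' OpenMP implementation; the flat-list lattice layout and the name `wolff_update` are choices made for this example. The lattice size is assumed to be at least 3 so that the two neighbors along each axis are distinct sites.

```python
import math
import random

def wolff_update(spins, L, beta, rng):
    """Grow and flip one Wolff cluster on an L x L x L periodic lattice.

    spins is a flat list of +1/-1 values indexed by x + L * (y + L * z).
    Returns the size of the flipped cluster.
    """
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond probability for J = 1
    seed = rng.randrange(L ** 3)
    seed_spin = spins[seed]
    cluster = {seed}
    stack = [seed]
    while stack:
        site = stack.pop()
        x, y, z = site % L, (site // L) % L, site // (L * L)
        for dx, dy, dz in ((1, 0, 0), (-1, 0, 0), (0, 1, 0),
                           (0, -1, 0), (0, 0, 1), (0, 0, -1)):
            nb = ((x + dx) % L) + L * (((y + dy) % L) + L * ((z + dz) % L))
            # Aligned neighbors join the cluster with probability p_add.
            if (nb not in cluster and spins[nb] == seed_spin
                    and rng.random() < p_add):
                cluster.add(nb)
                stack.append(nb)
    for site in cluster:
        spins[site] = -seed_spin
    return len(cluster)

# Example: one update on a small, fully aligned lattice.
L = 4
lattice = [1] * L ** 3
flipped = wolff_update(lattice, L, beta=0.3, rng=random.Random(1))
```

The cluster growth loop is inherently sequential in this form, which is why parallelizing it (as the abstract reports) requires care and yields less than ideal speedup.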
Using Clustering Algorithms to Identify Brown Dwarf Characteristics
Choban, Caleb
2016-06-01
Brown dwarfs are stars that are not massive enough to sustain core hydrogen fusion, and thus fade and cool over time. The molecular composition of brown dwarf atmospheres can be determined by observing absorption features in their infrared spectrum, which can be quantified using spectral indices. Comparing these indices to one another, we can determine what kind of brown dwarf it is, and if it is young or metal-poor. We explored a new method for identifying these subgroups through the expectation-maximization machine learning clustering algorithm, which provides a quantitative and statistical way of identifying index pairs which separate rare populations. We specifically quantified two statistics, completeness and concentration, to identify the best index pairs. Starting with a training set, we defined selection regions for young, metal-poor and binary brown dwarfs, and tested these on a large sample of L dwarfs. We present the results of this analysis, and demonstrate that new objects in these classes can be found through these methods.
A multi-sequential number-theoretic optimization algorithm using clustering methods
Institute of Scientific and Technical Information of China (English)
XU Qing-song; LIANG Yi-zeng; HOU Zhen-ting
2005-01-01
A multi-sequential number-theoretic optimization method based on clustering was developed and applied to the optimization of functions with many local extrema. Details of the procedure to generate the clusters and the sequential schedules were given. The algorithm was assessed by comparing its performance with a generalized simulated annealing algorithm in a difficult instructive example and a D-optimum experimental design problem. The two examples show that the presented algorithm is more effective and reliable.
Comparison and evaluation of network clustering algorithms applied to genetic interaction networks.
Hou, Lin; Wang, Lin; Berg, Arthur; Qian, Minping; Zhu, Yunping; Li, Fangting; Deng, Minghua
2012-01-01
The goal of network clustering algorithms is to detect dense clusters in a network, providing a first step towards understanding large-scale biological networks. With numerous recent advances in biotechnologies, large-scale genetic interactions are widely available, but there is a limited understanding of which clustering algorithms may be most effective. In order to address this problem, we conducted a systematic study to compare and evaluate six clustering algorithms in analyzing genetic interaction networks, and investigated influencing factors in choosing algorithms. The algorithms considered in this comparison include hierarchical clustering, topological overlap matrix, bi-clustering, Markov clustering, Bayesian discriminant analysis based community detection, and variational Bayes approach to modularity. Both experimentally identified and synthetically constructed networks were used in this comparison. The accuracy of the algorithms is measured by the Jaccard index in comparing predicted gene modules with benchmark gene sets. The results suggest that the best choice differs according to the network topology and evaluation criteria. Hierarchical clustering proved best at predicting protein complexes; Bayesian discriminant analysis based community detection proved best on epistatic miniarray profile (EMAP) datasets; the variational Bayes approach to modularity was noticeably better than the other algorithms on the genome-scale networks.
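The Jaccard index mentioned in this abstract is the size of the intersection of two sets divided by the size of their union. A minimal sketch of scoring predicted modules against benchmark sets follows; the best-match aggregation in `best_match_scores`, the empty-set convention and the gene labels are assumptions, since the abstract does not say how per-module scores were combined.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of gene IDs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

def best_match_scores(predicted_modules, benchmark_sets):
    """For each predicted module, the best Jaccard index over all benchmarks."""
    return [max(jaccard(module, ref) for ref in benchmark_sets)
            for module in predicted_modules]

# Hypothetical gene labels, for illustration only.
predicted = [{"g1", "g2", "g3"}, {"g7", "g8"}]
benchmark = [{"g2", "g3", "g4"}, {"g7", "g8"}]
scores = best_match_scores(predicted, benchmark)
```

A score of 1.0 means a predicted module coincides exactly with a benchmark set, while 0.0 means it shares no genes with any of them.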
Sonar Image Detection Algorithm Based on Two-Phase Manifold Partner Clustering
Institute of Scientific and Technical Information of China (English)
Xingmei Wang; Zhipeng Liu; Jianchuang Sun; Shu Liu
2015-01-01
According to the manifold features of sonar image data, a sonar image detection method based on a two-phase manifold partner clustering algorithm is proposed. Firstly, K-means block clustering based on Euclidean distance is proposed to reduce the data set. Mean value, standard deviation, and gray minimum value are taken as three features based on the relationship between the clustering model and the data structure. Then a K-means clustering algorithm based on manifold distance is applied to cluster the reduced data set again and improve the detection efficiency. In the K-means clustering algorithm based on manifold distance, line segment length on the manifold is analyzed, and a new power function line segment length is proposed to decrease the computational complexity. In order to calculate the manifold distance quickly, a new all-source shortest path algorithm is proposed as an efficient preprocessing step. Based on this, the spatial feature of the image block is added to the three features to obtain the final precise partner clustering algorithm. Comparison with other typical clustering algorithms demonstrates that the proposed algorithm achieves good detection results, and experiments on different real sonar images show that it has better adaptability.
PERFORMANCE OF K-MEANS CLUSTERING AND BIRD FLOCKING ALGORITHM FOR GROUPING THE WEB LOG FILES
Directory of Open Access Journals (Sweden)
R. SUGUNA
2012-10-01
Data mining is the process of analyzing interesting patterns and knowledge from different perspectives and summarizing them into useful information from large amounts of data. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Vast amounts of unlabeled data can be grouped using clustering or classification algorithms. Cluster analysis or clustering is the task of assigning a set of objects into groups called clusters, so that the objects in the same cluster are more similar to each other than to those in other clusters. Many researchers have evaluated the performance of the familiar K-means clustering algorithm and attempted to improve its efficiency. This paper analyzes the performance of the K-means clustering algorithm against a biologically inspired algorithm called the Bird flocking algorithm for grouping web logs. Web logs are unformatted text files that contain information about the user’s browser details. The proposed system takes web log files as input and groups the web sites based on the users' interest rate. The performance is evaluated in terms of number of clusters, CPU utilization time and accuracy.
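The K-means algorithm referred to in this abstract alternates between assigning each object to its nearest center and moving each center to the mean of its assigned objects. A minimal sketch follows; the two-dimensional points and hand-picked initial centers are invented for illustration and are not web-log features from the paper.

```python
def kmeans(points, centers, iterations=20):
    """Lloyd's K-means on tuples of floats, starting from given centers.

    Returns (centers, clusters), where clusters[i] lists the points
    assigned to centers[i].
    """
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else old
            for old, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, clusters

# Made-up 2-D feature vectors forming two obvious groups.
data_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
               (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
found_centers, found_clusters = kmeans(data_points, [(0.0, 0.0), (10.0, 10.0)])
```

The number of clusters is fixed by the number of initial centers, and the result depends on where those centers start, which is one reason alternatives such as the flocking approach are compared against it.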
Improved FIFO Scheduling Algorithm Based on Fuzzy Clustering in Cloud Computing
Directory of Open Access Journals (Sweden)
Jian Li
2017-02-01
In cloud computing, under the First-In-First-Out (FIFO) scheduling algorithm some large tasks may occupy too many resources while some small tasks wait for a long time. To reduce tasks’ waiting time, we propose a task scheduling algorithm based on fuzzy clustering algorithms. We construct a task model and a resource model, analyze tasks’ preferences, and then classify resources with fuzzy clustering algorithms. Based on the parameters of cloud tasks, the algorithm calculates resource expectation and assigns tasks to different resource clusters, which decreases the complexity of resource selection. As a result, the algorithm reduces tasks’ waiting time and improves resource utilization. The experimental results show that the proposed algorithm shortens the execution time of tasks and increases resource utilization.
CLOUDCLOUD : general-purpose instrument monitoring and data managing software
Dias, António; Amorim, António; Tomé, António
2016-04-01
An effective experiment is dependent on the ability to store and deliver data and information to all participant parties regardless of their degree of involvement in the specific parts that make the experiment a whole. Having fast, efficient and ubiquitous access to data will increase visibility and discussion, such that the outcome will have already been reviewed several times, strengthening the conclusions. The CLOUD project aims at providing users with a general purpose data acquisition, management and instrument monitoring platform that is fast, easy to use, lightweight and accessible to all participants of an experiment. This work is now implemented in the CLOUD experiment at CERN and will be fully integrated with the experiment as of 2016. Despite being used in an experiment of the scale of CLOUD, this software can also be used in any size of experiment or monitoring station, from single computers to large networks of computers to monitor any sort of instrument output without influencing the individual instrument's DAQ. Instrument data and metadata are stored and accessed via a specially designed database architecture and any type of instrument output is accepted using our continuously growing parsing application. Multiple databases can be used to separate different data taking periods or a single database can be used if for instance an experiment is continuous. A simple web-based application gives the user total control over the monitored instruments and their data, allowing data visualization and download, upload of processed data and the ability to edit existing instruments or add new instruments to the experiment. When in a network, new computers are immediately recognized and added to the system and are able to monitor instruments connected to them. Automatic computer integration is achieved by a locally running Python-based parsing agent that communicates with a main server application guaranteeing that all instruments assigned to that computer are
SPIDR, a general-purpose readout system for pixel ASICs
van der Heijden, B.; Visser, J.; van Beuzekom, M.; Boterenbrood, H.; Kulis, S.; Munneke, B.; Schreuder, F.
2017-02-01
The SPIDR (Speedy PIxel Detector Readout) system is a flexible general-purpose readout platform that can be easily adapted to test and characterize new and existing detector readout ASICs. It was originally designed for the readout of pixel ASICs from the Medipix/Timepix family, but other types of ASICs or front-end circuits can be read out as well. The SPIDR system consists of an FPGA board with memory and various communication interfaces, FPGA firmware, a CPU subsystem and an API library on the PC. The FPGA firmware can be adapted to read out other ASICs by re-using IP blocks. The available IP blocks include a UDP packet builder, 1 and 10 Gigabit Ethernet MACs and a "soft core" CPU. Currently the firmware is targeted at the Xilinx VC707 development board and at a custom board called Compact-SPIDR. The firmware can easily be ported to other Xilinx 7 series and UltraScale FPGAs. The gap between an ASIC and the data acquisition back-end is bridged by the SPIDR system. Using the high pin count VITA 57 FPGA Mezzanine Card (FMC) connector only a simple chip carrier PCB is required. A 1 and a 10 Gigabit Ethernet interface handle the connection to the back-end. These can be used simultaneously for high-speed data and configuration over separate channels. In addition to the FMC connector, configurable inputs and outputs are available for synchronization with other detectors. A high resolution (≈ 27 ps bin size) Time to Digital converter is provided for time stamping events in the detector. The SPIDR system is frequently used as readout for the Medipix3 and Timepix3 ASICs. Using the 10 Gigabit Ethernet interface it is possible to read out a single chip at full bandwidth or up to 12 chips at a reduced rate. Another recent application is the test-bed for the VeloPix ASIC, which is developed for the Vertex Detector of the LHCb experiment. In this case the SPIDR system processes the 20 Gbps scrambled data stream from the VeloPix and distributes it over four 10 Gigabit
Karayiannis, Nicolaos B; Randolph-Gips, Mary M
2005-03-01
This paper presents the development of soft clustering and learning vector quantization (LVQ) algorithms that rely on a weighted norm to measure the distance between the feature vectors and their prototypes. The development of LVQ and clustering algorithms is based on the minimization of a reformulation function under the constraint that the generalized mean of the norm weights be constant. According to the proposed formulation, the norm weights can be computed from the data in an iterative fashion together with the prototypes. An error analysis provides some guidelines for selecting the parameter involved in the definition of the generalized mean in terms of the feature variances. The algorithms produced from this formulation are easy to implement and they are almost as fast as clustering algorithms relying on the Euclidean norm. An experimental evaluation on four data sets indicates that the proposed algorithms consistently outperform clustering algorithms relying on the Euclidean norm and that they are strong competitors to non-Euclidean algorithms, which are computationally more demanding.
Karjee, Jyotirmoy
2011-01-01
Objective: The main objective of this paper is to construct a distributed clustering algorithm based upon spatial data correlation among sensor nodes and to evaluate data accuracy for each distributed cluster at its respective cluster head node. Design Procedure/Approach: We observe that, due to the high density of sensor nodes deployed in the sensor field, spatial data are highly correlated among sensor nodes in the spatial domain. Based on this high data correlation among sensor nodes, we propose a non-overlapping irregular distributed clustering algorithm with clusters of different sizes to collect the most accurate or precise data at the cluster head node of each distributed cluster. To collect the most accurate data at the cluster head node for each distributed cluster in the sensor field, we propose a Data accuracy model and compare the results with the Information accuracy model. Finding: Simulation results show that our proposed Data accuracy model collects more accurate data and gives better performance than Informati...
Wireless Meter Reading Based Energy-Balanced Steady Clustering Routing Algorithm for Sensor Networks
Directory of Open Access Journals (Sweden)
TANG, Z.
2011-05-01
According to the characteristics of wireless meter reading systems, an energy-balanced and energy-efficient steady clustering routing algorithm (EBSC, Energy-Balanced Steady Clustering) is proposed. In the clustering mechanism, the current cluster head nodes determine the cluster head nodes for the next round according to the residual energy of the cluster members. In the next round, each non-cluster head node decides which cluster it will belong to according to an energy-distance function. The cluster head nodes send data to the base station by single-hop or multi-hop communication, chosen according to the criterion of minimum energy consumption. In the EBSC algorithm, the number of cluster head nodes generated in each round is very steady, and EBSC combines the advantages of both distributed and centralized clustering algorithms. Experimental results show that the proposed routing algorithm not only efficiently uses the limited energy of network nodes, but also balances the energy consumption of all nodes well and significantly prolongs network lifetime.
Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong
2010-10-01
Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel Core 2 Quad Q6600 CPU and a GeForce 8800 GT GPU, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
Institute of Scientific and Technical Information of China (English)
CHEN Yunkai; LU Zhengding; LI Ruixuan; LI Yuhua; SUN Xiaolin
2006-01-01
Given the constant growth of data in large databases such as wire transfer databases, incremental clustering algorithms play an increasingly important role in Data Mining (DM). However, few traditional clustering algorithms can both handle categorical data and explain their output clearly. Based on the idea of dynamic clustering, this paper proposes an incremental conceptive clustering algorithm, which introduces the Semantic Core Tree (SCT) to deal with large volumes of categorical wire transfer data for detecting money laundering. In addition, a rule generation algorithm is presented to express the clustering result in the form of knowledge. Applying this idea to financial data mining improves the efficiency of searching for the characteristics of money laundering data.
Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji
2014-01-01
An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with an innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points in the presence of obstacles and facilitators. Taking the obstacle distance as the similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of the AICOE algorithm and classical clustering algorithms. Our clustering model based on the artificial immune system is also applied to a public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and a better clustering effect.
Implementation of Clustering Algorithms for real datasets in Medical Diagnostics using MATLAB
Directory of Open Access Journals (Sweden)
B. Venkataramana
2017-03-01
In the medical field, diagnosing a disease requires samples, which are analyzed by a doctor or a pharmacist. As the number of patients increases, the number of samples also increases, and more time is required to analyze them in order to decide the stage of the disease. Analyzing each sample requires a skilled person. The samples can instead be classified by applying clustering algorithms. Data clustering is considered one of the most important raw data analysis methods in data mining. Most clustering techniques have proved their efficiency in applications such as decision-making systems, medical sciences, and earth sciences. Partition-based clustering is one of the main approaches to clustering. There are various data clustering algorithms, each with its own advantages and disadvantages. This work reports the classification performance of three widely used algorithms, namely K-means (KM), Fuzzy c-means (FCM), and Fuzzy Possibilistic c-Means (FPCM). To analyze these algorithms, three well-known data sets from the UCI machine learning repository are used: thyroid, liver, and wine. The efficiency of the clustering output is compared in terms of classification performance (percentage of correctness). The experimental results show that K-means and FCM give the same performance for the liver data, and that FCM and FPCM give the same performance for the thyroid and wine data. FPCM has more efficient classification performance in all the given data sets.
Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm
Frisca, Bustamam, Alhadi; Siswantining, Titin
2017-03-01
Clustering is a data analysis method that aims to group data with similar characteristics together. Spectral clustering is one of the most popular modern clustering algorithms. As an effective clustering technique, the spectral clustering method emerged from the concepts of spectral graph theory. Spectral clustering needs a partitioning algorithm; available partitioning methods include PAM, SOM, Fuzzy c-means, and k-means. According to research by Capital and Choudhury in 2013, when Euclidean distance is used the k-means algorithm provides better accuracy than the PAM algorithm, so in this paper we use k-means as our partitioning algorithm. The major advantage of spectral clustering is that it reduces the data dimension, in this case the dimension of a large microarray dataset. A microarray is a small chip made of a glass plate containing thousands or even tens of thousands of kinds of genes in DNA fragments derived from doubling cDNA. Microarray data are widely used to detect cancer, for example carcinoma, in which cancer cells express abnormalities in their genes. The purpose of this research is to place data with high similarity in the same group and data with low similarity in different groups. In this research, the carcinoma microarray data comprise 7457 genes. The result of partitioning using the k-means algorithm is two clusters.
Comparing the biological coherence of network clusters identified by different detection algorithms
Institute of Scientific and Technical Information of China (English)
(no author listed)
2007-01-01
Protein-protein interaction networks serve to carry out basic molecular activity in the cell. Detecting the modular structures from the protein-protein interaction network is important for understanding the organization, function and dynamics of a biological system. In order to identify functional neighborhoods based on network topology, many network cluster identification algorithms have been developed. However, each algorithm might dissect a network from a different aspect and may provide different insight on the network partition. In order to objectively evaluate the performance of four commonly used cluster detection algorithms: molecular complex detection (MCODE), NetworkBlast, shortest-distance clustering (SDC) and Girvan-Newman (G-N) algorithm, we compared the biological coherence of the network clusters found by these algorithms through a uniform evaluation framework. Each algorithm was utilized to find network clusters in two different protein-protein interaction networks with various parameters. Comparison of the resulting network clusters indicates that clusters found by MCODE and SDC are of higher biological coherence than those by NetworkBlast and G-N algorithm.
Genetic algorithm based two-mode clustering of metabolomics data
Hageman, J.A.; Berg, R.A. van den; Westerhuis, J.A.; Werf, M.J. van der; Smilde, A.K.
2008-01-01
Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods and more specifically two-mode clustering methods are excellent tools for analyzing this type of data. Two-mode clustering metho
Text clustering based on fusion of ant colony and genetic algorithms
Institute of Scientific and Technical Information of China (English)
Yun ZHANG; Boqin FENG; Shouqiang MA; Lianmeng LIU
2009-01-01
Focusing on the problem that the ant colony algorithm easily stagnates and cannot fully search the solution space, a text clustering approach based on the fusion of the ant colony and genetic algorithms is proposed. The four parameters that influence the performance of the ant colony algorithm are encoded as chromosomes; the fitness function and the selection, crossover and mutation operators are then designed to find the optimal combination of parameters through a number of iterations, and the result is applied to text clustering. The simulation results show that, compared with classical k-means clustering and the basic ant colony clustering algorithm, the proposed algorithm has better performance, and the F-Measure value is improved by 5.69%, 48.60% and 69.60%, respectively, on 3 test datasets. Therefore, it is more suitable for processing larger datasets.
A highly efficient multi-core algorithm for clustering extremely large datasets
Directory of Open Access Journals (Sweden)
Kraus Johann M
2010-04-01
Background: In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches to parallelizing algorithms rely largely on network communication protocols that connect, and therefore require, multiple computers. One answer to this problem is to use the intrinsic capabilities of current multi-core hardware to distribute the tasks among the different cores of one computer. Results: We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms, based on the design principles of transactional memory, for clustering gene expression microarray type data and categorical SNP data. Our new shared-memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. The computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy, compared to single-core implementations and a recently published network-based parallelization. Conclusions: Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer.
A New Cooperative Algorithm Based on PSO and K-Means for Data Clustering
Directory of Open Access Journals (Sweden)
Mehdi Sargolzaei
2012-01-01
Problem statement: Data clustering has been applied in multiple fields such as machine learning, data mining, wireless sensor networks and pattern recognition. One of the most famous clustering approaches is K-means, which has been used effectively in many clustering problems, but this algorithm has drawbacks such as convergence to local optima and sensitivity to initial points. Approach: The Particle Swarm Optimization (PSO) algorithm is a swarm intelligence algorithm that is applied to determine the optimal cluster centers. In this study, a cooperative algorithm based on PSO and k-means is presented. Result: The proposed algorithm utilizes both the global search ability of PSO and the local search ability of k-means. The proposed algorithm, PSO, PSO with Contraction Factor (CF-PSO), k-means, and the KPSO hybrid algorithm were used to cluster six datasets, and their efficiencies were compared with each other. Conclusion: Experimental results show that the proposed algorithm has acceptable efficiency and robustness.
Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan
2014-01-01
Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario.
An Efficient Data Aggregation Algorithm for Cluster-based Sensor Network
Directory of Open Access Journals (Sweden)
Mohammad Mostafizur Rahman Mozumdar
2009-09-01
Data aggregation in wireless sensor networks eliminates redundancy to improve bandwidth utilization and the energy efficiency of sensor nodes. One node, called the cluster leader, collects data from surrounding nodes and then sends the summarized information to upstream nodes. In this paper, we propose an algorithm to select a cluster leader that will perform data aggregation in a partially connected sensor network. The algorithm reduces the traffic flow inside the network by adaptively selecting the shortest route for packet routing to the cluster leader. We also describe a simulation framework for functional analysis of WSN applications, taking our proposed algorithm as an example.
Institute of Scientific and Technical Information of China (English)
CHU Shuchuan; John F. Roddick
2003-01-01
In this paper, a cluster generation algorithm for vector quantization using a tabu search approach with simulated annealing is proposed. The main idea of this algorithm is to use the tabu search approach to generate non-local moves for the clusters and apply the simulated annealing technique to select the current best solution, thus improving the cluster generation and reducing the mean squared error. Preliminary experimental results demonstrate that the proposed approach is superior to the tabu search approach with the Generalised Lloyd algorithm.
Scaling up the DBSCAN Algorithm for Clustering Large Spatial Databases Based on Sampling Technique
Institute of Scientific and Technical Information of China (English)
(no author listed)
2001-01-01
Clustering, in data mining, is a useful technique for discovering interesting data distributions and patterns in the underlying data, and has many application fields, such as statistical data analysis, pattern recognition, and image processing. We combine a sampling technique with the DBSCAN algorithm to cluster large spatial databases, and two sampling-based DBSCAN (SDBSCAN) algorithms are developed. One algorithm introduces the sampling technique inside DBSCAN, and the other uses a sampling procedure outside DBSCAN. Experimental results demonstrate that our algorithms are effective and efficient in clustering large-scale spatial databases.
K-Nearest Neighbor Intervals Based AP Clustering Algorithm for Large Incomplete Data
Directory of Open Access Journals (Sweden)
Cheng Lu
2015-01-01
The Affinity Propagation (AP) algorithm is an effective algorithm for clustering analysis, but it is not directly applicable to incomplete data. In view of the prevalence of missing data and the uncertainty of missing attributes, we put forward a modified AP clustering algorithm based on K-nearest neighbor intervals (KNNI) for incomplete data. Based on an Improved Partial Data Strategy, the proposed algorithm estimates the KNNI representation of missing attributes by using the attribute distribution information of the available data. The similarity function is modified to deal with interval data, so that the improved AP algorithm becomes applicable to incomplete data. Experiments on several UCI datasets show that the proposed algorithm achieves impressive clustering results.
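One plausible reading of a nearest-neighbor interval can be sketched as follows. This is an illustration under stated assumptions, not the paper's Improved Partial Data Strategy: distances are taken over the attributes observed in both records (a rescaled partial distance), and the interval is the minimum and maximum of the missing attribute over the k nearest donors. The function names and toy data are hypothetical.

```python
import math

def partial_distance(a, b):
    """Distance over attributes observed (not None) in both records, rescaled."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))

def knn_interval(record, data, attr, k):
    """Interval [min, max] of attribute `attr` over the k nearest donor records."""
    donors = [r for r in data if r is not record and r[attr] is not None]
    donors.sort(key=lambda r: partial_distance(record, r))
    values = [r[attr] for r in donors[:k]]
    return (min(values), max(values))

data = [
    [1.0, 2.0, 3.0],
    [1.2, 2.2, 3.4],
    [0.9, 1.9, 2.8],
    [8.0, 9.0, 10.0],
]
incomplete = [1.0, 2.0, None]   # third attribute is missing

print(knn_interval(incomplete, data, attr=2, k=2))
```

The far-away record never contributes for small k, so the interval stays tight around values seen in similar records; a similarity function for AP would then have to compare intervals rather than points.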
Hierarchical trie packet classification algorithm based on expectation-maximization clustering
Bi, Xia-an; Zhao, Junxia
2017-01-01
With the growth of computer network bandwidth, packet classification algorithms that can deal with large-scale rule sets are urgently needed. Among the existing algorithms, those based on the hierarchical trie have become an important branch of packet classification research because of their wide practical use. Although the hierarchical trie saves considerable storage space, it has several shortcomings, such as backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). Firstly, the packet classification problem is formalized by mapping the rules and data packets into a two-dimensional space. Secondly, the expectation-maximization algorithm is used to cluster the rules based on their aggregate characteristics, forming diversified clusters. Thirdly, a hierarchical trie is built from the results of the expectation-maximization clustering. Finally, simulation experiments and real-environment experiments are conducted to compare the performance of the algorithm with that of other typical algorithms, and the results are analyzed. The hierarchical trie structure in the algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of inefficient trie updates, which greatly improves the performance of the algorithm. PMID:28704476
Clustering dynamic textures with the hierarchical EM algorithm for modeling video.
Mumtaz, Adeel; Coviello, Emanuele; Lanckriet, Gert R G; Chan, Antoni B
2013-07-01
Dynamic texture (DT) is a probabilistic generative model, defined over space and time, that represents a video as the output of a linear dynamical system (LDS). The DT model has been applied to a wide variety of computer vision problems, such as motion segmentation, motion classification, and video registration. In this paper, we derive a new algorithm for clustering DT models that is based on the hierarchical EM algorithm. The proposed clustering algorithm is capable of both clustering DTs and learning novel DT cluster centers that are representative of the cluster members in a manner that is consistent with the underlying generative probabilistic model of the DT. We also derive an efficient recursive algorithm for sensitivity analysis of the discrete-time Kalman smoothing filter, which is used as the basis for computing expectations in the E-step of the HEM algorithm. Finally, we demonstrate the efficacy of the clustering algorithm on several applications in motion analysis, including hierarchical motion clustering, semantic motion annotation, and learning bag-of-systems (BoS) codebooks for dynamic texture recognition.
Anandakrishnan, Ramu; Onufriev, Alexey
2008-03-01
In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over the accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to approximate these sums numerically. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates tractable. These algorithms have previously been used for biomolecular computations, but remain relatively unexplored in this context. Presented here is a theoretical analysis of the error and computational complexity of the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We show that there is a strong empirical relationship between the error bound and the root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms in practical applications. An example of error analysis for such an application (computation of the average charge of ionizable amino acids in proteins) is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
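The basic clustering approximation described above can be illustrated on a toy system. The sketch below assumes binary sites with energy E(x) = Σ h_i x_i + Σ_{i<j} w_ij x_i x_j in units of kT; the values of `h` and `w` are made up for illustration and are not taken from the paper. The exact average enumerates all 2^N microstates, while the cluster approximation enumerates only the states of the target's cluster and ignores couplings to other clusters.

```python
import itertools
import math

def mean_occupancy(h, w, sites, target):
    """Boltzmann-average occupancy of `target`, enumerating states of `sites` only.

    Energies are in units of kT; couplings to sites outside `sites` are ignored.
    """
    z = 0.0
    acc = 0.0
    for state in itertools.product((0, 1), repeat=len(sites)):
        x = dict(zip(sites, state))
        e = sum(h[i] * x[i] for i in sites)
        e += sum(w[i][j] * x[i] * x[j]
                 for i, j in itertools.combinations(sites, 2))
        weight = math.exp(-e)
        z += weight
        acc += weight * x[target]
    return acc / z

# Hypothetical parameters: strong coupling inside {0, 1} and {2, 3}, weak across.
h = [-1.0, 0.5, -0.2, 0.3]
w = [[0.0, 2.0, 0.1, 0.0],
     [2.0, 0.0, 0.0, 0.1],
     [0.1, 0.0, 0.0, 1.5],
     [0.0, 0.1, 1.5, 0.0]]

exact = mean_occupancy(h, w, [0, 1, 2, 3], target=0)   # 2**4 microstates
approx = mean_occupancy(h, w, [0, 1], target=0)        # 2**2 microstates
print(exact, approx, abs(exact - approx))
```

When the inter-cluster couplings are exactly zero the partition function factorizes and the two results coincide; the error analysed in the abstract arises from the couplings that the approximation drops.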
Novel density-based and hierarchical density-based clustering algorithms for uncertain data.
Zhang, Xianchao; Liu, Han; Zhang, Xiaotong
2017-09-01
Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing
Constructing a graph of connections in clustering algorithm of complex objects
Directory of Open Access Journals (Sweden)
Татьяна Шатовская
2015-05-01
The article describes the results of modifying the Chameleon algorithm. This hierarchical multi-level algorithm consists of several phases: graph construction, coarsening, partitioning, and recovery. Various approaches and algorithms can be used in each phase. The main aim of the work is to study the clustering quality on different data sets using combinations of algorithms at the different stages of the algorithm, and to improve the graph construction stage by optimizing the choice of k when constructing the k-nearest-neighbor graph.
A scalable and practical one-pass clustering algorithm for recommender system
Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali
2015-12-01
K-Means clustering-based recommendation algorithms have been proposed with the claim that they increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate incremental updates as new data arrive, making them unsuitable for dynamic environments. Following this line of research, a new clustering algorithm called One-Pass is proposed, which is simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.
Two Parallel Swendsen-Wang Cluster Algorithms Using Message-Passing Paradigm
Lin, Shizeng
2008-01-01
In this article, we present two different parallel Swendsen-Wang Cluster (SWC) algorithms using the message-passing interface (MPI). One is based on the Master-Slave Parallel Model (MSPM) and the other is based on the Data-Parallel Model (DPM). A speedup of 24 with 40 processors and of 16 with 37 processors is achieved with the DPM and the MSPM, respectively. The speedup of both algorithms at different temperatures and system sizes is carefully examined both experimentally and theoretically, and a comparison of their efficiency is made. In the last section, based on these two parallel SWC algorithms, two parallel probability changing cluster (PCC) algorithms are proposed.
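The reported speedups can be turned into parallel efficiencies using the standard definition, efficiency = speedup / number of processors. The snippet below is only a worked calculation on the two figures quoted in the abstract; the function name is illustrative.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors

dpm = parallel_efficiency(24, 40)    # Data-Parallel Model
mspm = parallel_efficiency(16, 37)   # Master-Slave Parallel Model
print(f"DPM: {dpm:.3f}  MSPM: {mspm:.3f}")
```

On these figures the data-parallel variant uses its processors at about 60% of ideal and the master-slave variant at about 43%.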
Upgrade of the Cellular General Purpose Monte Carlo Tool FOAM to version 2.06
Jadach, Stanislaw
2006-01-01
FOAM-2.06 is an upgraded version of FOAM, a general-purpose, self-adapting Monte Carlo event generator. In comparison with FOAM-2.05, it has two important improvements. A new interface to random numbers lets the user choose from three "state of the art" random number generators. Improved algorithms for the simplicial grid need less computer memory; the problem of the prohibitively large memory allocation required for a large number ($>10^6$) of simplicial cells is now eliminated, and the new version can handle such cases even on average desktop computers. In addition, generation of the Monte Carlo events may even be significantly faster when the number of cells is large.
Energy Technology Data Exchange (ETDEWEB)
Loughry, Thomas A. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2015-02-01
As the volume of data acquired by space-based sensors increases, mission data compression/decompression and forward error correction code processing performance must likewise scale. This competency development effort was explored using the General Purpose Graphics Processing Unit (GPGPU) to accomplish high-rate Rice Decompression and high-rate Reed-Solomon (RS) decoding at the satellite mission ground station. Each algorithm was implemented and benchmarked on a single GPGPU. Distributed processing across one to four GPGPUs was also investigated. The results show that the GPGPU has considerable potential for performing satellite communication Data Signal Processing, with three times or better performance improvements and up to ten times reduction in cost over custom hardware, at least in the case of Rice Decompression and Reed-Solomon Decoding.
A Parallel General Purpose Multi-Objective Optimization Framework, with Application to Beam Dynamics
Ineichen, Y; Kolano, A; Bekas, C; Curioni, A; Arbenz, P
2013-01-01
Particle accelerators are invaluable tools for research in the basic and applied sciences, in fields such as materials science, chemistry, the biosciences, particle physics, nuclear physics and medicine. The design, commissioning, and operation of accelerator facilities is a non-trivial task, due to the large number of control parameters and the complex interplay of several conflicting design goals. We propose to tackle this problem by means of multi-objective optimization algorithms which also facilitate a parallel deployment. In order to compute solutions in a meaningful time frame we require a fast and scalable software framework. In this paper, we present the implementation of such a general-purpose framework for simulation based multi-objective optimization methods that allows the automatic investigation of optimal sets of machine parameters. The implementation is based on a master/slave paradigm, employing several masters that govern a set of slaves executing simulations and performing optimization task...
Directory of Open Access Journals (Sweden)
Bohui Zhu
2013-01-01
Full Text Available This paper presents a novel maximum margin clustering method with immune evolution (IEMMC) for automatic diagnosis of electrocardiogram (ECG) arrhythmias. The diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering ECG arrhythmias. First, the raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and the waveform of the ECG signal is detected; then, features are extracted from the ECG signal so that the IEMMC algorithm can cluster different types of arrhythmias. Three performance indicators (sensitivity, specificity, and accuracy) are used to assess the IEMMC method on ECG arrhythmias. Compared with the K-means and iterSVR algorithms, the IEMMC algorithm shows better performance not only in clustering results but also in global search ability and convergence ability, which demonstrates its effectiveness for the detection of ECG arrhythmias.
A fast SVM training algorithm based on the set segmentation and k-means clustering
Institute of Scientific and Technical Information of China (English)
YANG Xiaowei; LIN Daying; HAO Zhifeng; LIANG Yanchun; LIU Guirong; HAN Xu
2003-01-01
At present, training algorithms for support vector machines (SVM) are an important topic in the field of machine learning. It is a challenging task to improve the efficiency of the algorithm without reducing the generalization performance of the SVM. To face this challenge, a new SVM training algorithm based on set segmentation and k-means clustering is presented in this paper. The idea is to divide the original training data into many subsets, cluster each subset using k-means, and finally train the SVM on the new data set formed by the cluster centroids. Since decomposition algorithms such as SVMlight are among the major methods for solving support vector machines, SVMlight is used in our experiments. Simulations on different types of problems show that the proposed method can efficiently solve not only large linear classification problems but also large nonlinear ones.
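The data-reduction step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `kmeans` and `reduce_training_set`, the fixed `subset_size`, and the choice to cluster each class separately so that every centroid inherits a class label are all assumptions made for the example. The final SVM training step (SVMlight in the source) is left out.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            groups[nearest].append(p)
        # Recompute centers; an empty group keeps its previous center.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def reduce_training_set(data, subset_size, k):
    """data: list of (point, label). Returns a smaller list of (centroid, label).

    Assumption: each class is segmented and clustered on its own, so that
    every centroid can inherit an unambiguous label.
    """
    reduced = []
    for label in sorted({y for _, y in data}):
        pts = [x for x, y in data if y == label]
        for start in range(0, len(pts), subset_size):
            subset = pts[start:start + subset_size]
            for center in kmeans(subset, min(k, len(subset))):
                reduced.append((center, label))
    return reduced
```

The reduced set would then be passed to whatever SVM trainer is in use; the trade-off is that a smaller `k` speeds up training but discards more detail near the class boundary.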
Scheduling algorithm of dual-armed cluster tools with residency time and reentrant constraints
Institute of Scientific and Technical Information of China (English)
周炳海; 高忠顺; 陈佳
2014-01-01
To solve the scheduling problem of dual-armed cluster tools for wafer fabrications with residency time and reentrant constraints, a heuristic scheduling algorithm was developed. Firstly, on the basis of formulating scheduling problems domain of dual-armed cluster tools, a non-integer programming model was set up with a minimizing objective function of the makespan. Combining characteristics of residency time and reentrant constraints, a scheduling algorithm of searching the optimal operation path of dual-armed transport module was presented under many kinds of robotic scheduling paths for dual-armed cluster tools. Finally, the experiments were designed to evaluate the proposed algorithm. The results show that the proposed algorithm is feasible and efficient for obtaining an optimal scheduling solution of dual-armed cluster tools with residency time and reentrant constraints.
A Load Balancing Algorithm Based on Maximum Entropy Methods in Homogeneous Clusters
Directory of Open Access Journals (Sweden)
Long Chen
2014-10-01
Full Text Available In order to solve the problems of ill-balanced task allocation, long response time, low throughput rate and poor performance when the cluster system is assigning tasks, we introduce the concept of entropy in thermodynamics into load balancing algorithms. This paper proposes a new load balancing algorithm for homogeneous clusters based on the Maximum Entropy Method (MEM). By calculating the entropy of the system and using the maximum entropy principle to ensure that each scheduling and migration step follows the increasing tendency of the entropy, the system can reach a balanced load as soon as possible, shorten task execution time and deliver high performance. The results of simulation experiments show that this algorithm outperforms traditional algorithms in both the time needed to reach load balance and the degree of balance achieved in a homogeneous cluster system. It also offers a new line of approach to the load balancing problem in homogeneous cluster systems.
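The principle that every migration should increase the system entropy can be illustrated with a small sketch. The abstract does not give the paper's exact entropy definition, so the example below assumes the Shannon entropy of each node's share of the total load; `load_entropy`, `best_migration` and `balance` are hypothetical names, and tasks are treated as identical unit-sized jobs.

```python
import math

def load_entropy(loads):
    """Shannon entropy of the load shares (assumed definition)."""
    total = sum(loads)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in loads if l > 0)

def best_migration(loads):
    """Find the single-task move (src, dst, entropy) that raises entropy most.

    Returns None when no move increases the entropy.
    """
    base = load_entropy(loads)
    best = None
    for src in range(len(loads)):
        if loads[src] == 0:
            continue
        for dst in range(len(loads)):
            if src == dst:
                continue
            trial = list(loads)
            trial[src] -= 1
            trial[dst] += 1
            h = load_entropy(trial)
            if h > base + 1e-12 and (best is None or h > best[2]):
                best = (src, dst, h)
    return best

def balance(loads):
    """Greedily migrate one task at a time while entropy keeps increasing."""
    loads = list(loads)
    while True:
        move = best_migration(loads)
        if move is None:
            return loads
        src, dst, _ = move
        loads[src] -= 1
        loads[dst] += 1
```

For identical nodes the entropy is maximal when the load is spread evenly, so `balance([9, 1, 2])` ends at `[4, 4, 4]`, whose entropy is `log(3)`.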
Solving the Capacitated Vehicle Routing Problem Based on Improved Ant-clustering Algorithm
Directory of Open Access Journals (Sweden)
Zhang Jiashan
2015-01-01
Full Text Available Capacitated vehicle routing problems (CVRP) are NP-hard. Most approaches can solve only small-scale case studies to optimality, and they are time-consuming. To overcome this limitation, this paper presents a novel three-phase heuristic approach for the capacitated vehicle routing problem. The first phase identifies sets of cost-effective feasible clusters through an improved ant-clustering algorithm that adopts an adaptive strategy. The second phase assigns clusters to vehicles and sequences them on each tour. The third phase uses a genetic algorithm to order the nodes within the clusters of every tour. The simulation indicates that the algorithm attains high-quality results in a short time.
Kernel Clustering with a Differential Harmony Search Algorithm for Scheme Classification
Directory of Open Access Journals (Sweden)
Yu Feng
2017-01-01
Full Text Available This paper presents a kernel fuzzy clustering method with a novel differential harmony search algorithm for classifying diversion scheduling schemes. First, we employed a self-adaptive solution generation strategy and a differential evolution-based population update strategy to improve the classical harmony search. Second, we applied the differential harmony search algorithm to kernel fuzzy clustering to help the clustering method obtain better solutions. Finally, the combination of kernel fuzzy clustering and differential harmony search is applied to water diversion scheduling in East Lake. The proposed method has been compared with other methods. The results show that kernel clustering with the differential harmony search algorithm performs well on water diversion scheduling problems.
Optimization of self-interstitial clusters in 3C-SiC with genetic algorithm
Ko, Hyunseok; Kaczmarowski, Amy; Szlufarska, Izabela; Morgan, Dane
2017-08-01
Under irradiation, SiC develops damage commonly referred to as black spot defects, which are speculated to be self-interstitial atom clusters. To understand the evolution of these defect clusters and their impacts (e.g., through radiation induced swelling) on the performance of SiC in nuclear applications, it is important to identify the cluster composition, structure, and shape. In this work the genetic algorithm code StructOpt was utilized to identify groundstate cluster structures in 3C-SiC. The genetic algorithm was used to explore clusters of up to ∼30 interstitials of C-only, Si-only, and Si-C mixtures embedded in the SiC lattice. We performed the structure search using Hamiltonians from both density functional theory and empirical potentials. The thermodynamic stability of clusters was investigated in terms of their composition (with a focus on Si-only, C-only, and stoichiometric) and shape (spherical vs. planar), as a function of the cluster size (n). Our results suggest that large Si-only clusters are likely unstable, and clusters are predominantly C-only for n ≤ 10 and stoichiometric for n > 10. The results imply that there is an evolution of the shape of the most stable clusters, where small clusters are stable in more spherical geometries while larger clusters are stable in more planar configurations. We also provide an estimated energy vs. size relationship, E(n), for use in future analysis.
Sun, Xu; Yang, Lina; Gao, Lianru; Zhang, Bing; Li, Shanshan; Li, Jun
2015-01-01
Center-oriented hyperspectral image clustering methods have been widely applied to hyperspectral remote sensing image processing; however, the drawbacks are obvious, including the over-simplicity of computing models and underutilized spatial information. In recent years, some studies have been conducted trying to improve this situation. We introduce the artificial bee colony (ABC) and Markov random field (MRF) algorithms to propose an ABC-MRF-cluster model to solve the problems mentioned above. In this model, a typical ABC algorithm framework is adopted in which cluster centers and iteration conditional model algorithm's results are considered as feasible solutions and objective functions separately, and MRF is modified to be capable of dealing with the clustering problem. Finally, four datasets and two indices are used to show that the application of ABC-cluster and ABC-MRF-cluster methods could help to obtain better image accuracy than conventional methods. Specifically, the ABC-cluster method is superior when used for a higher power of spectral discrimination, whereas the ABC-MRF-cluster method can provide better results when used for an adjusted random index. In experiments on simulated images with different signal-to-noise ratios, ABC-cluster and ABC-MRF-cluster showed good stability.
Directory of Open Access Journals (Sweden)
Dong Yumin
2014-01-01
Full Text Available A quantum optimization scheme for network cluster server task scheduling is proposed. We study the distribution theory of energy fields in quantum mechanics and, specifically, apply it to data clustering. We compare the quantum optimization method with the genetic algorithm (GA), ant colony optimization (ACO), and the simulated annealing algorithm (SAA). We also demonstrate its validity and rationality by analog simulation and experiment.
A Novel Image Fusion Algorithm for Visible and PMMW Images based on Clustering and NSCT
Xiong Jintao; Xie Weichao; Yang Jianyu; Fu Yanlong; Hu Kuan; Zhong Zhibin
2016-01-01
Aiming at the fusion of visible and Passive Millimeter Wave (PMMW) images, a novel algorithm based on clustering and NSCT (Nonsubsampled Contourlet Transform) is proposed. It takes advantage of the particular ability of PMMW images to present metal targets and applies a clustering algorithm to the PMMW image to extract the potential target regions. In the fusion process, NSCT is applied to both input images, and then the decomposition coefficients at different scales are combined using differ...
Directory of Open Access Journals (Sweden)
D. A. Viattchenin
2009-01-01
Full Text Available A method for constructing a subset of labeled objects which is used in a heuristic algorithm of possible clusterization with partial training is proposed in the paper. The method is based on data preprocessing by the heuristic algorithm of possible clusterization using a transitive closure of a fuzzy tolerance. Method efficiency is demonstrated by way of an illustrative example.
A Chinese Web Page Clustering Algorithm Based on the Suffix Tree
Institute of Scientific and Technical Information of China (English)
YANG Jian-wu
2004-01-01
In this paper, an improved algorithm, named STC-I, is proposed for Chinese Web page clustering based on Chinese language characteristics. It adopts a new unit choice principle and a novel suffix tree construction policy. The experimental results show that the new algorithm keeps the advantages of STC and is better than STC in precision and speed when used to cluster Chinese Web pages.
A Coupled User Clustering Algorithm Based on Mixed Data for Web-Based Learning Systems
Directory of Open Access Journals (Sweden)
Ke Niu
2015-01-01
Full Text Available In traditional Web-based learning systems, a few user clustering algorithms have been introduced to address insufficient analysis of learning behaviors and the lack of personalized study guides. When analyzing behaviors with these algorithms, researchers generally focus on continuous data but easily neglect discrete data, both of which are generated by online learning actions. Moreover, implicit coupled interactions among the data are frequently ignored in these algorithms. As a result, a large amount of significant information that could improve clustering accuracy is neglected. To solve these issues, we propose a coupled user clustering algorithm for Web-based learning systems that takes into account both discrete and continuous data, as well as intracoupled and intercoupled interactions of the data. The experimental results in this paper demonstrate the superior performance of the proposed algorithm.
Semi-Supervised Clustering Fingerprint Positioning Algorithm Based on Distance Constraints
Institute of Scientific and Technical Information of China (English)
Ying Xia; Zhongzhao Zhang; Lin Ma; Yao Wang
2015-01-01
With the rapid development of WLAN (Wireless Local Area Network) technology, an important target of indoor positioning systems is to improve positioning accuracy while reducing online computation. This paper proposes a novel fingerprint positioning algorithm known as semi-supervised affinity propagation clustering (APC) based on distance function constraints. We show that by employing affinity propagation techniques, a fraction of labeled data can be used to adjust the similarity matrix of the signal space and cluster reference points with high accuracy. The semi-supervised APC combines machine learning, clustering analysis and a fingerprinting algorithm. By collecting data and testing our algorithm in a realistic indoor WLAN environment, the experimental results indicate that the proposed algorithm can improve positioning accuracy while reducing the online localization computation, as compared with the widely used K nearest neighbor and maximum likelihood estimation algorithms.
Institute of Scientific and Technical Information of China (English)
Wu Naixing; Liao Jianxin; Zhu Xiaomin
2006-01-01
Based on the system features of a softswitch-based heterogeneous clustered media server, this paper proposes a limited resource vector load-balancing algorithm. The purpose of the algorithm is to balance the load of clusters by utilizing all system resources effectively and to avoid violent fluctuations in system performance. Extensive simulations on a Petri net model of the load balancing system are conducted, and the algorithm is compared with some traditional algorithms on balancing ability under heterogeneity, system throughput, request response time and performance stability. The simulation results show that the algorithm achieves higher system performance and handles the heterogeneity of the clustered media server very well.
Genetic Algorithms Applied to Multi-Class Clustering for Gene Expression Data
Institute of Scientific and Technical Information of China (English)
Haiyan Pan; Jun Zhu; Danfu Han
2003-01-01
A hybrid GA (genetic algorithm)-based clustering (HGACLUS) schema, combining merits of the Simulated Annealing, was described for finding an optimal or near-optimal set of medoids. This schema maximized the clustering success by achieving internal cluster cohesion and external cluster isolation. The performance of HGACLUS and other methods was compared by using simulated data and open microarray gene-expression datasets. HGACLUS was generally found to be more accurate and robust than other methods discussed in this paper by the exact validation strategy and the explicit cluster number.
Clustering Algorithm As A Planning Support Tool For Rural Electrification Optimization
Directory of Open Access Journals (Sweden)
Ronaldo Pornillosa Parreno Jr
2015-08-01
Full Text Available In this study, a clustering algorithm was developed to optimize electrification plans by screening and grouping potential customers to be supplied with electricity. The algorithm takes a different approach to the clustering problem: it combines conceptual and distance-based clustering algorithms to analyze potential clusters using a spanning tree with the shortest possible edge weight, and it creates the final cluster trees based on a test of inconsistency for the edges. The clustering criteria consist of a commonly used distance measure with the addition of household information as the basis for the ability to pay (ATP) value. The combination of these two parameters resulted in more significant and realistic clusters, since a distance measure alone could not capture the effect of household characteristics when screening for the most sensible groupings of households. In addition, the implications of varying geographical features were incorporated in the algorithm by using a routing index across the locations of the households. This new approach to connecting the households in an area was applied in an actual case study of one village, or barangay, that was not yet energized. The clustering algorithm generated cluster trees that could become the theoretical basis for power utilities to plan the initial network arrangement of electrification. Scenario analysis conducted on the two strategies for clustering the households provided different alternatives for optimizing the cost of electrification. Furthermore, the benefits associated with the two strategies formulated from the two scenarios were evaluated using the benefit-cost ratio (BC) to determine which is more economically advantageous. The results of the study showed that the clustering algorithm is effective in solving the electrification optimization problem and serves its purpose as a planning support tool that can facilitate electrification in rural areas and achieve cost-effectiveness.
A Hybrid Distributed Mutual Exclusion Algorithm for Cluster-Based Systems
Directory of Open Access Journals (Sweden)
Moharram Challenger
2013-01-01
Full Text Available Distributed mutual exclusion is a fundamental problem which arises in various systems such as grid computing, mobile ad hoc networks (MANETs), and distributed databases. Reducing key metrics such as the message count per critical section (CS) and the delay between two CS entrances, known as the synchronization delay, is a great challenge for this problem. Various algorithms use either permission-based or token-based protocols. Token-based algorithms offer better communication costs and synchronization delay. Raymond's and Suzuki-Kasami's algorithms are well-known token-based ones. Raymond's algorithm needs only O(log2(N)) messages per CS, and Suzuki-Kasami's algorithm needs just one message delivery time between two CS entrances. Nevertheless, each algorithm is weak in the other metric: synchronization delay and message complexity, respectively. In this work, a new hybrid algorithm is proposed which draws on the strengths of both algorithms. Raysuz's algorithm (the proposed algorithm) uses a clustered graph and executes Suzuki-Kasami's algorithm within clusters and Raymond's algorithm between clusters. This yields better message complexity than pure Suzuki-Kasami's algorithm and better synchronization delay than pure Raymond's algorithm, resulting in an overall efficient distributed mutual exclusion (DMX) algorithm.
An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO
Directory of Open Access Journals (Sweden)
Jian Zhang
2014-01-01
Full Text Available To organize a wide variety of data sets automatically and obtain accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM uses the global search property of PSO to deal with the premature convergence of conventional fuzzy clustering, utilizes the vagueness balance property of shadowed sets to handle overlap among clusters, and models uncertainty in class boundaries. The new method uses the Xie-Beni index as the cluster validity measure and automatically finds the optimal cluster number within a specified range, with partitions that yield compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.
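The Xie-Beni index mentioned in this abstract rewards partitions that are compact and well separated; lower values are better. The sketch below uses the commonly cited form of the index (membership-weighted squared distances divided by the number of points times the smallest squared distance between centers). The function names and the default fuzzifier `m = 2` are assumptions for illustration, and the PSO and shadowed-set parts of SP-FCM are not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni validity index; memberships[i][k] is point k's degree in cluster i."""
    n = len(points)
    c = len(centers)
    # Compactness: membership-weighted scatter of points around their centers.
    compactness = sum(
        memberships[i][k] ** m * sq_dist(points[k], centers[i])
        for i in range(c)
        for k in range(n)
    )
    # Separation: smallest squared distance between any two centers.
    separation = min(
        sq_dist(centers[i], centers[j])
        for i in range(c)
        for j in range(i + 1, c)
    )
    return compactness / (n * separation)
```

To choose the number of clusters, one would run the clustering for each candidate count in the allowed range and keep the count with the smallest index value.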
Clustered K nearest neighbor algorithm for daily inflow forecasting
Akbari, M.; Van Overloop, P.J.A.T.M.; Afshar, A.
2010-01-01
Instance based learning (IBL) algorithms are a common choice among data driven algorithms for inflow forecasting. They are based on the similarity principle and prediction is made by the finite number of similar neighbors. In this sense, the similarity of a query instance is estimated according to
Directory of Open Access Journals (Sweden)
Jing Chen
2015-06-01
Full Text Available This study takes the concept of food logistics distribution as the breakthrough point, by means of the aim of optimization of food logistics distribution routes and analysis of the optimization model of food logistics route, as well as the interpretation of the genetic algorithm, it discusses the optimization of food logistics distribution route based on genetic and cluster scheme algorithm.
GenClust: A genetic algorithm for clustering gene expression data
Directory of Open Access Journals (Sweden)
Raimondi Alessandra
2005-12-01
Full Text Available Background: Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results: GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion: Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.
GenClust: a genetic algorithm for clustering gene expression data.
Di Gesú, Vito; Giancarlo, Raffaele; Lo Bosco, Giosué; Raimondi, Alessandra; Scaturro, Davide
2005-12-07
Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.
The Loop-Cluster Algorithm for the Case of the 6 Vertex Model
Evertz, H G
1993-01-01
We present the loop algorithm, a new type of cluster algorithm that we recently introduced for the F model. Using the framework of Kandel and Domany, we show how to generalize the algorithm to the arrow flip symmetric 6 vertex model. We propose the principle of least possible freezing as the guide to choosing the values of free parameters in the algorithm. Finally, we briefly discuss the application of our algorithm to simulations of quantum spin systems. In particular, all necessary information is provided for the simulation of spin-$\frac{1}{2}$ Heisenberg and $XXZ$ models.
CMA: an efficient index algorithm of clustering supporting fast retrieval of large image databases
Institute of Scientific and Technical Information of China (English)
无
2005-01-01
To realize content-based retrieval of large image databases, an efficient index and retrieval scheme is required. This paper proposes a clustering-based index algorithm called CMA, which supports fast retrieval of large image databases. CMA takes advantage of k-means and self-adaptive algorithms. It is simple and works without any user interaction. There are two main stages in this algorithm. In the first stage, it classifies the images in a database into several clusters and automatically obtains the parameters needed for the next stage, the k-means iteration. The CMA algorithm is tested on a large database of more than ten thousand images and compared with the k-means algorithm. Experimental results show that this algorithm is effective in both precision and retrieval time.
Summarizing Relational Data Using Semi-Supervised Genetic Algorithm-Based Clustering Techniques
Directory of Open Access Journals (Sweden)
Rayner Alfred
2010-01-01
Full Text Available Problem statement: In solving a classification problem in relational data mining, traditional methods, for example, C4.5 and its variants, usually require data transformations from datasets stored in multiple tables into a single table. Unfortunately, we may lose some information when we join tables with a high degree of one-to-many association. Therefore, data transformation becomes tedious trial-and-error work, and the classification result is often not very promising, especially when the number of tables and the degree of one-to-many association are large. Approach: We proposed a genetic semi-supervised clustering technique as a means of aggregating data stored in multiple tables to facilitate the task of solving a classification problem in a relational database. This algorithm is suitable for classification of datasets with a high degree of one-to-many associations. It can be used in two ways. One is user-controlled clustering, where the user may control the result of clustering by varying the compactness of the spherical cluster. The other is automatic clustering, where a non-overlap clustering strategy is applied. In this study, we use the latter method to dynamically cluster multiple instances, as a means of aggregating them, and illustrate the effectiveness of this method using the semi-supervised genetic algorithm-based clustering technique. Results: The experimental results showed that using the reciprocal of the Davies-Bouldin Index for cluster dispersion and the reciprocal of the Gini Index for cluster purity as the fitness function in the Genetic Algorithm (GA) finds solutions with much greater accuracy. The results obtained in this study showed that automatic clustering (seeding), by optimizing the cluster dispersion or cluster purity alone using GA, provides good results compared to traditional k-means clustering. However, the best result can be achieved by optimizing the combination values of both the cluster
Scheme for Implementing Quantum Search Algorithm in a Cluster State Quantum Computer
Institute of Scientific and Technical Information of China (English)
ZHANG Da-Li; WANG Yan-Hui; ZHANG Yong
2008-01-01
Using a cluster state and single-qubit measurements, one can perform one-way quantum computation. Here we give a detailed scheme for realizing a modified Grover search algorithm using measurements on a cluster state. We give the measurement pattern for the cluster-state realization of the algorithm and estimate the number of measurements needed for its implementation. It is found that $O(2^{3n/2} n^2)$ single-qubit measurements are required for its realization in a cluster-state quantum computer.
Cluster Based Hybrid Niche Mimetic and Genetic Algorithm for Text Document Categorization
Directory of Open Access Journals (Sweden)
A. K. Santra
2011-09-01
Full Text Available An efficient cluster based hybrid niche mimetic and genetic algorithm for text document categorization to improve the retrieval rate of relevant document fetching is addressed. The proposal minimizes the processing of structuring the document with better feature selection using hybrid algorithm. In addition restructuring of feature words to associated documents gets reduced, in turn increases document clustering rate. The performance of the proposed work is measured in terms of cluster objects accuracy, term weight, term frequency and inverse document frequency. Experimental results demonstrate that it achieves very good performance on both feature selection and text document categorization, compared to other classifier methods.
Empirical relations between static and dynamic exponents for Ising model cluster algorithms
Coddington, Paul D.; Baillie, Clive F.
1992-02-01
We have measured the autocorrelations for the Swendsen-Wang and the Wolff cluster update algorithms for the Ising model in two, three, and four dimensions. The data for the Wolff algorithm suggest that the autocorrelations are linearly related to the specific heat, in which case the dynamic critical exponent is $z_{\mathrm{int},E}^{W} = \alpha/\nu$. For the Swendsen-Wang algorithm, scaling the autocorrelations by the average maximum cluster size gives either a constant or a logarithm, which implies that $z_{\mathrm{int},E}^{SW} = \beta/\nu$ for the Ising model.
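For readers unfamiliar with the Wolff update whose autocorrelations are measured in this abstract, the following is a minimal textbook-style sketch for the two-dimensional Ising model with coupling $J = 1$ and periodic boundaries. It is not the authors' code; the function name `wolff_step` and the list-of-lists lattice layout are assumptions. A cluster is grown from a random seed site by adding aligned neighbours with probability $1 - e^{-2\beta}$, and the whole cluster is then flipped.

```python
import math
import random

def wolff_step(spins, L, beta, rng):
    """One Wolff cluster update on an L x L periodic lattice (use L >= 3).

    spins is a list of L lists holding +1 or -1. Returns the cluster size.
    """
    p_add = 1.0 - math.exp(-2.0 * beta)  # bond activation probability for J = 1
    seed = (rng.randrange(L), rng.randrange(L))
    s0 = spins[seed[0]][seed[1]]
    cluster = {seed}
    stack = [seed]
    while stack:
        i, j = stack.pop()
        neighbours = (
            ((i + 1) % L, j),
            ((i - 1) % L, j),
            (i, (j + 1) % L),
            (i, (j - 1) % L),
        )
        for site in neighbours:
            # Only aligned spins outside the cluster may join it.
            if (
                site not in cluster
                and spins[site[0]][site[1]] == s0
                and rng.random() < p_add
            ):
                cluster.add(site)
                stack.append(site)
    # Flip the whole cluster at once.
    for i, j in cluster:
        spins[i][j] = -s0
    return len(cluster)
```

An autocorrelation study would call `wolff_step` repeatedly, record an observable such as the energy after each update, and estimate the integrated autocorrelation time from that time series.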
Empirical relations between static and dynamic exponents for Ising model cluster algorithms
Energy Technology Data Exchange (ETDEWEB)
Coddington, P.D. (Department of Physics, Syracuse University, Syracuse, New York 13244 (United States)); Baillie, C.F. (Department of Physics, University of Colorado, Boulder, Colorado 80309 (United States))
1992-02-17
We have measured the autocorrelations for the Swendsen-Wang and the Wolff cluster update algorithms for the Ising model in two, three, and four dimensions. The data for the Wolff algorithm suggest that the autocorrelations are linearly related to the specific heat, in which case the dynamic critical exponent is $z_{\mathrm{int},E}^{W} = \alpha/\nu$. For the Swendsen-Wang algorithm, scaling the autocorrelations by the average maximum cluster size gives either a constant or a logarithm, which implies that $z_{\mathrm{int},E}^{SW} = \beta/\nu$ for the Ising model.
AN IMPROVED ALGORITHM FOR SUPERVISED FUZZY C-MEANS CLUSTERING OF REMOTELY SENSED DATA
Institute of Scientific and Technical Information of China (English)
无
2000-01-01
This paper describes an improved algorithm for fuzzy c-means clustering of remotely sensed data, by which the degree of fuzziness of the resultant classification is decreased compared with that of a conventional algorithm; that is, the classification accuracy is increased. This is achieved by incorporating covariance matrices at the level of individual classes rather than assuming a global one. Empirical results from a fuzzy classification of an Edinburgh suburban land cover confirmed the improved performance of the new algorithm for fuzzy c-means clustering, in particular when fuzziness is also accommodated in the assumed reference data.
A randomized algorithm for two-cluster partition of a set of vectors
Kel'manov, A. V.; Khandeev, V. I.
2015-02-01
A randomized algorithm is substantiated for the strongly NP-hard problem of partitioning a finite set of vectors of Euclidean space into two clusters of given sizes according to the minimum-of-the sum-of-squared-distances criterion. It is assumed that the centroid of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The centroid of the other cluster is fixed at the origin. For an established parameter value, the algorithm finds an approximate solution of the problem in time that is linear in the space dimension and the input size of the problem for given values of the relative error and failure probability. The conditions are established under which the algorithm is asymptotically exact and runs in time that is linear in the space dimension and quadratic in the input size of the problem.
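The objective described in this abstract can be made concrete with a small sketch. The code below is not the randomized algorithm of the paper; it is an exhaustive search, exponential in the input size, that only illustrates the criterion: one cluster of a given size is measured against its own mean, the other against the origin. The names `partition_cost` and `best_partition` are hypothetical.

```python
from itertools import combinations

def partition_cost(vectors, chosen):
    """Sum of squared distances: chosen vectors to their mean, the rest to the origin."""
    chosen = set(chosen)
    dim = len(vectors[0])
    members = [v for i, v in enumerate(vectors) if i in chosen]
    centroid = [sum(v[d] for v in members) / len(members) for d in range(dim)]
    cost = 0.0
    for i, v in enumerate(vectors):
        # The second cluster's centre is fixed at the origin.
        ref = centroid if i in chosen else [0.0] * dim
        cost += sum((v[d] - ref[d]) ** 2 for d in range(dim))
    return cost

def best_partition(vectors, m):
    """Brute force over all index subsets of size m (illustration only)."""
    return min(combinations(range(len(vectors)), m),
               key=lambda idx: partition_cost(vectors, idx))
```

On inputs of realistic size this enumeration is infeasible, which is the motivation for approximate or randomized schemes such as the one the abstract describes.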
A Comparison of Algorithms for the Construction of SZ Cluster Catalogues
Melin, J -B; Bartelmann, M; Bartlett, J G; Betoule, M; Bobin, J; Carvalho, P; Chon, G; Delabrouille, J; Diego, J M; Harrison, D L; Herranz, D; Hobson, M; Kneissl, R; Lasenby, A N; Jeune, M Le; Lopez-Caniego, M; Mazzotta, P; Rocha, G M; Schaefer, B M; Starck, J -L; Waizmann, J -C; Yvon, D
2012-01-01
We evaluate the construction methodology of an all-sky catalogue of galaxy clusters detected through the Sunyaev-Zel'dovich (SZ) effect. We perform an extensive comparison of twelve algorithms applied to the same detailed simulations of the millimeter and submillimeter sky based on a Planck-like case. We present the results of this "SZ Challenge" in terms of catalogue completeness, purity, astrometric and photometric reconstruction. Our results provide a comparison of a representative sample of SZ detection algorithms and highlight important issues in their application. In our study case, we show that the exact expected number of clusters remains uncertain (about a thousand cluster candidates at |b|> 20 deg with 90% purity) and that it depends on the SZ model and on the detailed sky simulations, and on algorithmic implementation of the detection methods. We also estimate the astrometric precision of the cluster candidates which is found of the order of ~2 arcmins on average, and the photometric uncertainty of...
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing
Directory of Open Access Journals (Sweden)
Luo Zhong
2014-01-01
With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce are becoming more and more complex. As a result, traditional clustering algorithms cannot handle the mass data of the tunnels. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It processes the data using MapReduce within cloud computing. It can handle mass data and is also more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data.
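As a rough illustration of how k-means can be phrased in MapReduce terms, the sketch below runs one iteration in plain Python: the map step assigns each point to its nearest centroid, and the reduce step averages the points collected under each centroid. It is a single-process sketch under assumed names (`map_point`, `reduce_cluster`, `kmeans_iteration`); it does not reproduce the paper's improved algorithm or its dissimilarity-based data cleaning.

```python
from collections import defaultdict

def map_point(point, centroids):
    """Map: emit (index of nearest centroid, (point, 1))."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid))
             for centroid in centroids]
    return dists.index(min(dists)), (point, 1)

def reduce_cluster(values):
    """Reduce: average all points emitted under one centroid index."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for point, n in values:
        for i in range(dim):
            total[i] += point[i]
        count += n
    return tuple(t / count for t in total)

def kmeans_iteration(points, centroids):
    """One k-means iteration expressed as map, shuffle, reduce."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_point(point, centroids)
        groups[key].append(value)  # the "shuffle" step groups values by key
    # A centroid that attracted no points is kept unchanged.
    return [reduce_cluster(groups[k]) if k in groups else tuple(centroids[k])
            for k in range(len(centroids))]
```

In a real MapReduce job the map calls would run on separate workers and the iteration would be repeated until the centroids stop moving.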
A reliable cluster detection technique using photometric redshifts: introducing the 2TecX algorithm
van Breukelen, Caroline
2009-01-01
We present a new cluster detection algorithm designed for finding high-redshift clusters using optical/infrared imaging data. The algorithm has two main characteristics. First, it utilises each galaxy's full redshift probability function, instead of an estimate of the photometric redshift based on the peak of the probability function and an associated Gaussian error. Second, it identifies cluster candidates through cross-checking the results of two substantially different selection techniques (the name 2TecX representing the cross-check of the two techniques). These are adaptations of the Voronoi Tessellations and Friends-Of-Friends methods. Monte-Carlo simulations of mock catalogues show that cross-checking the cluster candidates found by the two techniques significantly reduces the detection of spurious sources. Furthermore, we examine the selection effects and relative strengths and weaknesses of either method. The simulations also allow us to fine-tune the algorithm's parameters, and define completeness an...
GPU-based single-cluster algorithm for the simulation of the Ising model
Komura, Yukihiro; Okabe, Yutaka
2012-02-01
We present the GPU calculation with the common unified device architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. Proposing an algorithm for a quasi-block synchronization, we realize the Wolff single-cluster Monte Carlo simulation with CUDA. We perform parallel computations for the newly added spins in the growing cluster. As a result, the GPU calculation speed for the two-dimensional Ising model at the critical temperature with the linear size L = 4096 is 5.60 times as fast as the calculation speed on a current CPU core. For the three-dimensional Ising model with the linear size L = 256, the GPU calculation speed is 7.90 times as fast as the CPU calculation speed. The idea of quasi-block synchronization can be used not only in the cluster algorithm but also in many fields where the synchronization of all threads is required.
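For readers unfamiliar with the Wolff single-cluster update, a minimal sequential sketch on a two-dimensional periodic lattice follows. It is a plain CPU version and does not implement the quasi-block synchronization or the parallel treatment of newly added spins described in the abstract. The function name `wolff_step`, the flat-list lattice layout, and the coupling J = 1 are assumptions made for illustration.

```python
import math
import random

def wolff_step(spins, L, beta, rng):
    """One Wolff single-cluster update on an L x L periodic Ising lattice (J = 1).

    spins is a flat list of +1/-1 values and is modified in place.
    Returns the number of spins in the flipped cluster.
    """
    p_add = 1.0 - math.exp(-2.0 * beta)  # probability of adding an aligned neighbour
    seed = rng.randrange(L * L)
    seed_spin = spins[seed]
    spins[seed] = -seed_spin  # flipping on visit also marks the site as done
    stack = [seed]
    size = 1
    while stack:
        site = stack.pop()
        x, y = site % L, site // L
        neighbours = (
            ((x + 1) % L) + y * L,
            ((x - 1) % L) + y * L,
            x + ((y + 1) % L) * L,
            x + ((y - 1) % L) * L,
        )
        for n in neighbours:
            # Only spins still aligned with the seed's original value can join.
            if spins[n] == seed_spin and rng.random() < p_add:
                spins[n] = -seed_spin
                stack.append(n)
                size += 1
    return size
```

The inner loop over the neighbours of newly added sites is the part that a GPU version can process in parallel, which is why some form of synchronization between thread blocks becomes necessary there.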
GPU-based single-cluster algorithm for the simulation of the Ising model
Komura, Yukihiro
2011-01-01
We present the GPU calculation with the common unified device architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. Proposing an algorithm for a quasi-block synchronization, we realize the Wolff single-cluster Monte Carlo simulation with CUDA. We perform parallel computations for the newly added spins in the growing cluster. As a result, the GPU calculation speed for the two-dimensional Ising model at the critical temperature with the linear size L=4096 is 5.60 times as fast as the calculation speed on a current CPU core. For the three-dimensional Ising model with the linear size L=256, the GPU calculation speed is 7.90 times as fast as the CPU calculation speed. The idea of quasi-block synchronization can be used not only in the cluster algorithm but also in many fields where the synchronization of all threads is required.
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network
Directory of Open Access Journals (Sweden)
C. Vimalarani
2016-01-01
A Wireless Sensor Network (WSN) is a network formed of a large number of sensor nodes positioned in an application environment to monitor physical entities in a target area, for example, temperature, water level, and pressure monitoring, health care, and various military applications. Sensor nodes are mostly equipped with self-supported battery power through which they can perform adequate operations and communicate with neighboring nodes. To maximize the lifetime of WSNs, energy conservation measures are essential for improving their performance. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Networks in which clustering and cluster head selection are done using the Particle Swarm Optimization (PSO) algorithm so as to minimize the power consumption in the WSN. The performance metrics are evaluated and the results are compared with a competitive clustering algorithm to validate the reduction in energy consumption.
An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.
Vimalarani, C; Subramanian, R; Sivanandam, S N
2016-01-01
A Wireless Sensor Network (WSN) is a network formed of a large number of sensor nodes positioned in an application environment to monitor physical entities in a target area, for example, temperature, water level, and pressure monitoring, health care, and various military applications. Sensor nodes are mostly equipped with self-supported battery power through which they can perform adequate operations and communicate with neighboring nodes. To maximize the lifetime of WSNs, energy conservation measures are essential for improving their performance. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Networks in which clustering and cluster head selection are done using the Particle Swarm Optimization (PSO) algorithm so as to minimize the power consumption in the WSN. The performance metrics are evaluated and the results are compared with a competitive clustering algorithm to validate the reduction in energy consumption.
Evaluation of clustering algorithms for gene expression data using gene ontology annotations
Institute of Scientific and Technical Information of China (English)
MA Ning; ZHANG Zheng-guo
2012-01-01
Background: Clustering is a useful exploratory technique for interpreting gene expression data to reveal groups of genes sharing common functional attributes. Biologists frequently face the problem of choosing an appropriate algorithm. We aimed to provide a standalone, easily accessible and biologically oriented criterion for expression data clustering evaluation. Methods: An external criterion utilizing annotation-based similarities between genes is proposed in this work. Gene ontology information is employed as the annotation source. Comparisons among six widely used clustering algorithms over various types of gene expression data sets were carried out based on the criterion proposed. Results: The rank of these algorithms given by the criterion coincides with our common knowledge. Single-linkage has significantly poorer performance, even worse than the random algorithm. Ward's method achieves the best performance in most cases. Conclusions: The criterion proposed has a strong ability to distinguish among different clustering algorithms with different distance measurements. It is also demonstrated that analyzing the main contributors of the criterion may offer some guidelines in finding local compact clusters. In addition, we suggest using Ward's algorithm for gene expression data analysis.
Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao
2015-01-01
Due to advances in sensor technology, the growing volume of large medical image data makes it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383
Directory of Open Access Journals (Sweden)
Nan Lin
Due to advances in sensor technology, the growing volume of large medical image data makes it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis.
Directory of Open Access Journals (Sweden)
Ashim Kumar Ghosh
2011-12-01
Wireless sensor nodes are used in most embedded computing applications. A multihop cluster hierarchy has been presented for large wireless sensor networks (WSNs) that can provide scalable routing, data aggregation, and querying. The energy consumption rate of sensors in a WSN varies greatly with the protocols the sensors use for communication. In this paper we present a cluster-based routing algorithm. One of our main goals is to design an energy-efficient routing protocol that addresses the usual problems of WSNs. The efficiency of a WSN depends on the distance between each node and the base station and on the amount of data to be transferred, and the performance of clustering is greatly influenced by the selection of cluster heads, which are in charge of creating clusters and controlling member nodes. The algorithm makes the best use of nodes with a low number of cluster heads, known as super nodes. The full region is divided into four equal zones, and the centre area of the region is used to select the super node. Each zone is considered separately, and a zone may or may not be divided further depending on the density of nodes in that zone and the capability of the super node. The algorithm forms multilayer communication. The number of layers depends on the current network load and statistics. Our algorithm is easily extended to generate a hierarchy of cluster heads to obtain better network management and energy efficiency.
UNSUPERVISED DATA AND HISTOGRAM CLUSTERING USING INCLINED PLANES SYSTEM OPTIMIZATION ALGORITHM
Directory of Open Access Journals (Sweden)
Mohammad Hamed Mozaffari
2014-03-01
Within the last decades, clustering has gained significant recognition as one of the data mining methods, especially in the relatively new field of medical engineering for diagnosing cancer. Clustering is used to automatically group items in a database that share similar characteristics. The researchers introduce a novel and powerful algorithm known as Inclined Planes system Optimization (IPO), with the capacity to overcome clustering problems. In the proposed method, each agent used in the algorithm indicates the centroids of the clusters, and the number of centroids is selected automatically in each time interval (unsupervised clustering). The evaluation method for clustering is based on the Davies-Bouldin index (DBi) to show cluster validity. The researchers compare known algorithms on a series of databases from various studies to demonstrate the power and capability of the proposed method. These datasets are popular for pattern recognition and differ in space dimension. Method performance was also tested on standard images as a dataset. Study results show a significant advantage of the method over other algorithms.
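The Davies-Bouldin index mentioned in this abstract can be computed as follows. This is a sketch of the standard definition (mean Euclidean scatter within each cluster, Euclidean distance between centroids), not code from the paper; the function name `davies_bouldin` is an assumption. Lower values indicate more compact, better separated clusters.

```python
import math

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters.

    clusters: list of at least two non-empty lists of equal-length numeric tuples.
    Distinct clusters must have distinct centroids (otherwise division by zero).
    """
    centroids = []
    scatters = []
    for pts in clusters:
        dim = len(pts[0])
        c = tuple(sum(p[i] for p in pts) / len(pts) for i in range(dim))
        centroids.append(c)
        # Scatter: mean distance from the cluster's points to its centroid.
        scatters.append(sum(math.dist(p, c) for p in pts) / len(pts))
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatters[i] + scatters[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
    return total / k
```

An optimizer that also chooses the number of clusters can use this value as its fitness function, since the index is defined for any number of clusters from two upward.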
Directory of Open Access Journals (Sweden)
Lamiaa F. Ibrahim
2011-01-01
Problem statement: The process of network planning is divided into two sub-steps. The first step is determining the location of the Multi Service Access Node (MSAN). The second step is the construction of subscriber network lines from the MSAN to subscribers to satisfy optimization criteria and design constraints. Due to the complexity of this process, artificial intelligence and clustering techniques have been successfully deployed to solve many problems. The problems of the locations of MSANs, the cabling layout and the computation of optimum cable network layouts have been addressed in this study. The proposed algorithm, Clustering density-Based Spatial of Applications with Noise original, minimal Spanning tree and modified Ant-Colony-Based algorithm (CBSCAN-SPANT), uses two clustering algorithms, a density-based one and an agglomerative one, with shortest-path distances that satisfy the network constraints. The algorithm uses wired and wireless technology to serve subscriber demand and place the switches in a real optimal place. Approach: The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm has been modified, and a new algorithm (the NetPlan algorithm) was proposed by the author in a recent work to solve the first step in the problem of network planning. In the present study, the NetPlan algorithm is modified by introducing the modified Ant-Colony-Based algorithm to find the optimal path between any node and the corresponding MSAN node in the first step of the network planning process, in order to determine the nodes belonging to each cluster. The second step in the process of network planning is also introduced in the present study. For each cluster, the optimal cabling layout from each MSAN to the subscriber premises is determined by introducing Prim's algorithm, which constructs a minimal spanning tree. Results: Experimental results and analysis indicate that the
Multidistribution Center Location Based on Real-Parameter Quantum Evolutionary Clustering Algorithm
Directory of Open Access Journals (Sweden)
Huaixiao Wang
2014-01-01
To determine the multidistribution center location and the distribution scope of the distribution center with high efficiency, the real-parameter quantum-inspired evolutionary clustering algorithm (RQECA) is proposed. RQECA is applied to choose the multidistribution center location on the basis of the conventional fuzzy C-means clustering algorithm (FCM). The combination of the real-parameter quantum-inspired evolutionary algorithm (RQIEA) and FCM can overcome the local search defect of FCM and make the optimization result independent of the choice of initial values. The comparison of FCM, clustering based on simulated annealing genetic algorithm (CSAGA), and RQECA indicates that RQECA has the same good convergence as CSAGA, but the search efficiency of RQECA is better than that of CSAGA. Therefore, RQECA is more efficient in solving the multidistribution center location problem.
New two-dimensional fuzzy C-means clustering algorithm for image segmentation
Institute of Scientific and Technical Information of China (English)
无
2008-01-01
To solve the problem of poor anti-noise performance of the traditional fuzzy C-means (FCM) algorithm in image segmentation, a novel two-dimensional FCM clustering algorithm for image segmentation was proposed. In this method, the image segmentation was converted into an optimization problem. The fitness function containing neighbor information was set up based on the gray information and the neighbor relations between the pixels described by the improved two-dimensional histogram. By making use of the global searching ability of the predator-prey particle swarm optimization, the optimal cluster center could be obtained by iterative optimization, and the image segmentation could be accomplished. The simulation results show that the segmentation accuracy ratio of the proposed method is above 99%. The proposed algorithm has strong anti-noise capability, high clustering accuracy and good segmentation effect, indicating that it is an effective algorithm for image segmentation.
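For reference, the traditional FCM algorithm that this abstract sets out to improve alternates two updates: memberships from distances, then centres from memberships. The sketch below shows these two standard updates only; it contains neither the two-dimensional histogram nor the predator-prey particle swarm optimization of the proposed method. The function names and the default fuzzifier m = 2 are assumptions.

```python
import math

def fcm_memberships(points, centres, m=2.0):
    """Membership u[i][j] of point i in cluster j: 1 / sum_k (d_ij / d_ik)^(2/(m-1))."""
    exponent = 2.0 / (m - 1.0)
    u = []
    for x in points:
        d = [math.dist(x, c) for c in centres]
        if any(v == 0.0 for v in d):
            # The point coincides with a centre: share full membership among those centres.
            zeros = [1.0 if v == 0.0 else 0.0 for v in d]
            s = sum(zeros)
            u.append([z / s for z in zeros])
            continue
        u.append([1.0 / sum((d[j] / d[k]) ** exponent for k in range(len(centres)))
                  for j in range(len(centres))])
    return u

def fcm_centres(points, u, m=2.0):
    """Centre j is the mean of the points weighted by u[i][j] ** m."""
    dim = len(points[0])
    centres = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)  # assumes at least one point has non-zero membership in cluster j
        centres.append(tuple(sum(wi * p[i] for wi, p in zip(w, points)) / total
                             for i in range(dim)))
    return centres
```

Because each update only moves downhill from the current centres, the result depends on the initial centres, which is the local-search weakness that global optimizers are meant to address.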
Energy Technology Data Exchange (ETDEWEB)
Uy, D.L.
1996-02-01
An algorithm for detection and identification of image clusters or "blobs" based on color information for an autonomous mobile robot is developed. The input image data are first processed using a crisp color fuzzifier, a binary smoothing filter, and a median filter. The processed image data are then input to the image cluster detection and identification program. The program employs the concept of an "elastic rectangle" that stretches in such a way that the whole blob is finally enclosed in a rectangle. A C program is developed to test the algorithm. The algorithm is tested only on image data of size 8x8 with different numbers of blobs in them. The algorithm works very well in detecting and identifying image clusters.
An approximation polynomial-time algorithm for a sequence bi-clustering problem
Kel'manov, A. V.; Khamidullin, S. A.
2015-06-01
We consider a strongly NP-hard problem of partitioning a finite sequence of vectors in Euclidean space into two clusters using the criterion of the minimal sum of the squared distances from the elements of the clusters to the centers of the clusters. The center of one of the clusters is to be optimized and is determined as the mean value over all vectors in this cluster. The center of the other cluster is fixed at the origin. Moreover, the partition is such that the difference between the indices of two successive vectors in the first cluster is bounded above and below by prescribed constants. A 2-approximation polynomial-time algorithm is proposed for this problem.
AN OPTIMIZED WEIGHT BASED CLUSTERING ALGORITHM IN HETEROGENEOUS WIRELESS SENSOR NETWORKS
Directory of Open Access Journals (Sweden)
Babu.N.V
2012-12-01
The last few years have seen an increased interest in the potential use of wireless sensor networks (WSNs) in various fields such as disaster management, battlefield surveillance, and border security surveillance. In such applications, a large number of sensor nodes are deployed, which are often unattended and work autonomously. The process of dividing the network into interconnected substructures is called clustering, and the interconnected substructures are called clusters. The cluster head (CH) of each cluster acts as a coordinator within the substructure. Each CH acts as a temporary base station within its zone or cluster. It also communicates with other CHs. Clustering is a key technique used to extend the lifetime of a sensor network by reducing energy consumption. It can also increase network scalability. Most research on wireless sensor networks assumes that nodes are homogeneous, but some nodes may have different characteristics in order to prolong the lifetime of a WSN and improve its reliability. We propose an algorithm for better cluster head selection based on weights for the different parameters that influence energy consumption, including distance from the base station as a new parameter, to reduce the number of transmissions and the energy consumed by sensor nodes. Finally, the proposed algorithm is compared with the WCA and IWCA algorithms in terms of number of clusters and energy consumption.
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream
Directory of Open Access Journals (Sweden)
Amineh Amini
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
Graph-based clustering and data visualization algorithms
Vathy-Fogarassy, Ágnes
2013-01-01
This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space. The application of graphs in clustering and visualization has several advantages. A graph of important edges (where edges characterize relations and weights represent similarities or distances) provides a compact representation of the entire complex data set. This text describes clustering and visualization methods that are able to utilize information hidden in these graphs, based on
An Efficient Clustering Algorithm for k-Anonymisation
Institute of Scientific and Technical Information of China (English)
Grigorios Loukides; Jian-Hua Shao
2008-01-01
K-anonymisation is an approach to protecting individuals from being identified from data. Good k-anonymisations should retain data utility and preserve privacy, but few methods have considered these two conflicting requirements together. In this paper, we extend our previous work on a clustering-based method for balancing data utility and privacy protection, and propose a set of heuristics to improve its effectiveness. We introduce new clustering criteria that treat utility and privacy on equal terms and propose sampling-based techniques to optimally set up its parameters. Extensive experiments show that the extended method achieves good accuracy in query answering and is able to prevent linking attacks effectively.
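The property that any k-anonymisation must satisfy is easy to state in code: every combination of quasi-identifier values has to be shared by at least k records. The check below illustrates only that definition and is not the clustering method of the paper; the function name `is_k_anonymous` and the dictionary-based record layout are assumptions.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records.

    records: list of dicts; quasi_identifiers: list of keys present in every record.
    An empty record list is treated as trivially k-anonymous.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())
```

A clustering-based anonymiser would group records into clusters of at least k members and generalise the quasi-identifiers within each cluster until this check passes, trading data utility against privacy.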
DYNAMIC REQUEST DISPATCHING ALGORITHM FOR WEB SERVER CLUSTER
Institute of Scientific and Technical Information of China (English)
无
2006-01-01
The overall increase in traffic on the WWW causes a disproportionate increase in client requests to popular web sites. Site administrators constantly face the requirement to improve server capacity. A web server cluster is a popular solution. It uses a group of independent servers that are managed as a single system for higher availability, easier manageability and greater scalability. Many web sites have adopted this solution. Request dispatching [1-2] is one of the core technologies used by parallel web server clusters...
Foam Multi-Dimensional General Purpose Monte Carlo Generator With Self-Adapting Symplectic Grid
Jadach, Stanislaw
2000-01-01
A new general purpose Monte Carlo event generator with self-adapting grid consisting of simplices is described. In the process of initialization, the simplex-shaped cells divide into daughter subcells in such a way that: (a) cell density is biggest in areas where integrand is peaked, (b) cells elongate themselves along hyperspaces where integrand is enhanced/singular. The grid is anisotropic, i.e. memory of the axes directions of the primary reference frame is lost. In particular, the algorithm is capable of dealing with distributions featuring strong correlation among variables (like ridge along diagonal). The presented algorithm is complementary to others known and commonly used in the Monte Carlo event generators. It is, in principle, more effective than any other one for distributions with very complicated patterns of singularities; the price to pay is that it is memory-hungry. It is therefore aimed at a small number of integration dimensions (<10). It should be combined with other methods for higher ...
Clustering of Customers Based on Shopping Behavior and Employing Genetic Algorithms
Directory of Open Access Journals (Sweden)
E. P. Bafghi
2017-02-01
Clustering of customers is a vital task in marketing and customer relationship management. In traditional marketing, the market is segmented based on general characteristics such as clients' statistical information and lifestyle features. However, this method seems unable to cope with today's challenges. In this paper, we present a method for the classification of customers based on variables such as shopping cases and financial information related to the customers' interactions. A similarity measure was defined for clustering, and a clustering quality function was further defined. Genetic algorithms were used to ensure the accuracy of clustering.
A community detection algorithm based on topology potential and spectral clustering.
Wang, Zhixiao; Chen, Zhaotong; Zhao, Ya; Chen, Shaoda
2014-01-01
Community detection is of great value for complex networks in understanding their inherent laws and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. Methods of this kind have two inadequacies: one is that the input matrices they use cannot provide sufficient structural information for community detection, and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information of the network. In addition, the new algorithm can automatically get the optimal community number from the local maximum potential nodes. Experimental results showed that the new algorithm gave excellent performance on artificial networks and real-world networks and outperformed other community detection methods.
A Community Detection Algorithm Based on Topology Potential and Spectral Clustering
Directory of Open Access Journals (Sweden)
Zhixiao Wang
2014-01-01
Community detection is of great value for complex networks in understanding their inherent law and predicting their behavior. Spectral clustering algorithms have been successfully applied in community detection. This kind of method has two inadequacies: one is that the input matrices they use cannot provide sufficient structural information for community detection, and the other is that they cannot necessarily derive the proper community number from the ladder distribution of eigenvector elements. In order to solve these problems, this paper puts forward a novel community detection algorithm based on topology potential and spectral clustering. The new algorithm constructs the normalized Laplacian matrix with nodes' topology potential, which contains rich structural information of the network. In addition, the new algorithm can automatically get the optimal community number from the local maximum potential nodes. Experimental results showed that the new algorithm gave excellent performance on artificial networks and real-world networks and outperformed other community detection methods.
Multi-Parameter Signal Sorting Algorithm Based on Dynamic Distance Clustering
Institute of Scientific and Technical Information of China (English)
Ai-Ling He; De-Guo Zeng; Jun Wang; Bin Tang
2009-01-01
A multi-parameter signal sorting algorithm for interleaved radar pulses in a dense emitter environment is presented. The algorithm includes two parts: pulse classification and pulse repetition interval (PRI) analysis. Firstly, we propose dynamic distance clustering (DDC) for classification. In the clustering algorithm, the multi-dimensional features of the radar pulses are used for reliable classification. The similarity threshold estimation method in DDC is derived, which contributes to the efficiency of the algorithm. However, DDC requires a large amount of computation when there are many signal pulses. Then, in order to sort radar signals in real time, the improved DDC (IDDC) algorithm is proposed. Finally, PRI analysis is adopted to complete the process of sorting. The simulation experiments and hardware implementations show that both algorithms are effective.
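The abstract does not spell out how DDC works, so the following is only a generic sketch of distance-threshold clustering on multi-dimensional pulse features, not the authors' algorithm. Each pulse joins the nearest existing cluster if it lies within a similarity threshold; otherwise it opens a new cluster. The feature layout (two normalized features per pulse) and the threshold value are assumptions chosen for illustration.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def threshold_cluster(pulses, threshold):
    """Assign each pulse to the nearest centre within `threshold`,
    or start a new cluster; centres are updated as running means."""
    centers, counts, labels = [], [], []
    for p in pulses:
        best, best_d = None, float("inf")
        for k, c in enumerate(centers):
            d = euclidean(p, c)
            if d < best_d:
                best, best_d = k, d
        if best is None or best_d > threshold:
            centers.append(list(p))
            counts.append(1)
            labels.append(len(centers) - 1)
        else:
            counts[best] += 1
            n = counts[best]
            centers[best] = [c + (x - c) / n for c, x in zip(centers[best], p)]
            labels.append(best)
    return labels, centers


# Hypothetical normalized features, e.g. (carrier frequency, pulse width).
pulses = [(0.10, 0.20), (0.12, 0.21), (0.90, 0.80), (0.11, 0.19), (0.88, 0.82)]
labels, centers = threshold_cluster(pulses, threshold=0.2)
print(labels)
```

A single pass like this is cheap, but every pulse is still compared with every centre, which hints at why a faster variant matters for real-time sorting.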
A Comprehensive Toolset for General-Purpose Private Computing and Outsourcing
2016-12-08
AFRL-AFOSR-VA-TR-2016-0368 A COMPREHENSIVE TOOLSET FOR GENERAL-PURPOSE PRIVATE COMPUTING AND OUTSOURCING Marina Blanton UNIVERSITY OF NOTRE DAME DU...2013 to 31 Aug 2016 4. TITLE AND SUBTITLE A COMPREHENSIVE TOOLSET FOR GENERAL-PURPOSE PRIVATE COMPUTING AND OUTSOURCING 5a. CONTRACT NUMBER 5b...necessary tools and techniques for supporting general-purpose secure computation and outsourcing. The three main thrusts of the project are: (i
Directory of Open Access Journals (Sweden)
S.Praveena
2015-06-01
This paper presents a hybrid clustering algorithm and feed-forward neural network classifier for land-cover mapping of trees, shade, buildings and roads. It starts with a single-step preprocessing procedure to make the image suitable for segmentation. The pre-processed image is segmented using the hybrid genetic-Artificial Bee Colony (ABC) algorithm, which is developed by hybridizing ABC and FCM to obtain effective segmentation of the satellite image, and is then classified using a neural network. The performance of the proposed hybrid algorithm is compared with algorithms such as k-means, Fuzzy C-means (FCM), Moving K-means, the Artificial Bee Colony (ABC) algorithm, the ABC-GA algorithm, Moving KFCM and the KFCM algorithm.
CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms.
Kohlhoff, Kai J; Sosnick, Marc H; Hsu, William T; Pande, Vijay S; Altman, Russ B
2011-08-15
Data clustering techniques are an essential component of a good data analysis toolbox. Many current bioinformatics applications are inherently compute-intense and work with very large datasets. Sequential algorithms are inadequate for providing the necessary performance. For this reason, we have created Clustering Algorithms for Massively Parallel Architectures, Including GPU Nodes (CAMPAIGN), a central resource for data clustering algorithms and tools that are implemented specifically for execution on massively parallel processing architectures. CAMPAIGN is a library of data clustering algorithms and tools, written in 'C for CUDA' for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over respective CPU-based clustering algorithms and is intended as an open-source resource. New modules from the community will be accepted into the library and the layout of it is such that it can easily be extended to promising future platforms such as OpenCL. Releases of the CAMPAIGN library are freely available for download under the LGPL from https://simtk.org/home/campaign. Source code can also be obtained through anonymous subversion access as described on https://simtk.org/scm/?group_id=453. kjk33@cantab.net.
Institute of Scientific and Technical Information of China (English)
TANG Cheng-long; WANG Shi-gang; LIANG Qin-hua; XU Wei
2009-01-01
The transversal distribution of the steel strip thickness in the entry section of the cold rolling mill seriously affects the flatness and transversal thickness precision of the final products. A pattern clustering method is introduced into the steel rolling field and used for pattern recognition of the transversal distribution of the steel strip thickness. The well-known k-means clustering algorithm has the advantage of being easy to implement, but still has some drawbacks. An improved k-means clustering algorithm is presented, and the main improvements include: (1) the initial clustering points are preselected according to the density queue of data objects; and (2) Mahalanobis distance is applied instead of Euclidean distance in the actual application. Compared to the patterns obtained from the common k-means algorithm, the patterns identified by the improved algorithm show that it is well suited to recognizing patterns in the transversal distribution of steel strip thickness and that it will be useful in on-line quality control systems.
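The two improvements named above can be sketched separately. The abstract does not define the "density queue", so the seeding below is an assumed reading: points are ranked by how many neighbours fall within a radius, and the densest mutually distant points become the initial centres. The Mahalanobis distance is written out for two-dimensional data with an explicit 2x2 covariance inverse; all function names and the `radius` parameter are hypothetical.

```python
import math


def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance for 2-D points; cov = (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - q[0], p[1] - q[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    return math.sqrt((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)


def density_seeds(points, k, radius):
    """Pick up to k initial centres: densest points first, skipping any
    candidate that lies within `radius` of an already chosen seed."""
    def close(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1]) <= radius

    density = [sum(close(p, q) for q in points) - 1 for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        if all(not close(points[i], s) for s in seeds):
            seeds.append(points[i])
        if len(seeds) == k:
            break
    return seeds


points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (9, 9)]
print(density_seeds(points, k=2, radius=0.5))
print(mahalanobis_2d((2, 0), (0, 0), cov=(4.0, 0.0, 1.0)))
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean one; with unequal variances, a displacement along a high-variance axis counts for less.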
Directory of Open Access Journals (Sweden)
J Anuradha
2014-05-01
Attention Deficit Hyperactive Disorder (ADHD) is a disruptive neurobehavioral disorder characterized by abnormal behavioral patterns in attention, perusing activity, acting impulsively and combined types. It is predominant among school-going children, and it is tricky to differentiate between an active child and a child with ADHD. Misdiagnosis and undiagnosed cases are very common. Behavior patterns are identified by mentors in the academic environment who lack skills in screening those children. Hence an unsupervised learning algorithm can cluster the behavioral patterns of children at school for diagnosis of ADHD. In this paper, we propose a hierarchical clustering algorithm to partition the dataset based on attribute dependency (HCAD). HCAD forms clusters of data based on the highly dependent attributes and their equivalence relation. It is capable of handling large volumes of data with reasonably faster clustering than most of the existing algorithms. It can work on both labeled and unlabeled data sets. Experimental results reveal that this algorithm has higher accuracy in comparison to other algorithms. HCAD achieves 97% cluster purity in diagnosing ADHD. Empirical analysis of the application of HCAD on different data sets from the UCI repository is provided.
A New-Fangled FES-k-Means Clustering Algorithm for Disease Discovery and Visual Analytics
Directory of Open Access Journals (Sweden)
Tonny J. Oyana
2010-01-01
The central purpose of this study is to further evaluate the quality of the performance of a new algorithm. The study provides additional evidence on this algorithm, which was designed to increase the overall efficiency of the original k-means clustering technique: the Fast, Efficient, and Scalable k-means algorithm (FES-k-means). The FES-k-means algorithm uses a hybrid approach that comprises the k-d tree data structure that enhances the nearest neighbor query, the original k-means algorithm, and an adaptation rate proposed by Mashor. This algorithm was tested using two real datasets and one synthetic dataset. It was employed twice on all three datasets: once on data trained by the innovative MIL-SOM method and then on the actual untrained data in order to evaluate its competence. This two-step approach of data training prior to clustering provides a solid foundation for knowledge discovery and data mining, otherwise unclaimed by clustering methods alone. The benefits of this method are that it produces clusters similar to the original k-means method at a much faster rate, as shown by runtime comparison data, and that it provides efficient analysis of large geospatial data with implications for disease mechanism discovery. From a disease mechanism discovery perspective, it is hypothesized that the linear-like pattern of elevated blood lead levels discovered in the city of Chicago may be spatially linked to the city's water service lines.
Hybrid Swarm Intelligence Energy Efficient Clustered Routing Algorithm for Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Rajeev Kumar
2016-01-01
Currently, wireless sensor networks (WSNs) are used in many applications, namely, environment monitoring, disaster management, industrial automation, and medical electronics. Sensor nodes carry many limitations like low battery life, small memory space, and limited computing capability. To make a wireless sensor network more energy efficient, swarm intelligence techniques have been applied to resolve many optimization issues in WSNs. In many existing clustering techniques an artificial bee colony (ABC) algorithm is utilized to collect information from the field periodically. Nevertheless, in event-based applications, ant colony optimization (ACO) is a good solution to enhance the network lifespan. In this paper, we combine both algorithms (i.e., ABC and ACO) and propose a new hybrid ABCACO algorithm to solve a Nondeterministic Polynomial (NP) hard and finite problem of WSNs. The ABCACO algorithm is divided into three main parts: (i) selection of the optimal number of subregions and further subregion parts, (ii) cluster head selection using the ABC algorithm, and (iii) efficient data transmission using the ACO algorithm. We use a hierarchical clustering technique for data transmission; the data is transmitted from member nodes to the subcluster heads and then from subcluster heads to the elected cluster heads based on some threshold value. Cluster heads use an ACO algorithm to discover the best route for data transmission to the base station (BS). The proposed approach is very useful in designing the framework for forest fire detection and monitoring. The simulation results show that the ABCACO algorithm enhances the stability period by 60% and also improves the goodput by 31% against LEACH and WSNCABC, respectively.
Discriminative variable selection for clustering with the sparse Fisher-EM algorithm
Bouveyron, Charles
2012-01-01
The interest in variable selection for clustering has increased recently due to the growing need for clustering high-dimensional data. Variable selection makes it easier, in particular, both to cluster the data and to interpret the results. Existing approaches have demonstrated the efficiency of variable selection for clustering but turn out to be either very time consuming or not sparse enough in high-dimensional spaces. This work proposes to perform a selection of the discriminative variables by introducing sparsity in the loading matrix of the Fisher-EM algorithm. This clustering method has been recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a latent mixture model which fits the data into a low-dimensional discriminative subspace. Three different approaches are proposed in this work to introduce sparsity in the orientation matrix of the discriminative subspace through $\ell_{1}$-type penalizations. Experimental comparisons with existing approach...
Institute of Scientific and Technical Information of China (English)
None
2007-01-01
Let G = (V, E) be a complete undirected graph with vertex set V, edge set E, and edge weights l(e) satisfying the triangle inequality. The vertex set V is partitioned into clusters V1, V2, ..., Vk. The clustered traveling salesman problem (CTSP) seeks to compute the shortest Hamiltonian tour that visits all the vertices, in which the vertices of each cluster are visited consecutively. A two-level genetic algorithm (TLGA) was developed for the problem, which favors neither intra-cluster paths nor inter-cluster paths, thus realizing integrated evolutionary optimization for both levels of the CTSP. Results show that the algorithm is more effective than known algorithms. A large-scale traveling salesman problem (TSP) can be converted into a CTSP by clustering so that it can then be solved by the algorithm. Test results demonstrate that the clustering TLGA for large TSPs is more effective and efficient than the classical genetic algorithm.
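The CTSP constraint defined above is easy to state in code. The sketch below is not the TLGA; it only evaluates a candidate tour, which is what any genetic algorithm for the problem would need as a fitness check. It assumes Euclidean edge weights (which satisfy the triangle inequality) and uses hypothetical names throughout.

```python
import math


def tour_length(tour, coords):
    """Length of the closed tour under Euclidean edge weights."""
    n = len(tour)
    total = 0.0
    for i in range(n):
        (x1, y1), (x2, y2) = coords[tour[i]], coords[tour[(i + 1) % n]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def visits_clusters_consecutively(tour, cluster_of):
    """True if every cluster forms one contiguous block of the cyclic tour.

    Going once around the tour, the cluster label changes exactly once per
    block, so a feasible tour has as many changes as there are clusters.
    """
    n = len(tour)
    k = len({cluster_of[v] for v in tour})
    changes = sum(
        cluster_of[tour[i]] != cluster_of[tour[(i + 1) % n]] for i in range(n)
    )
    return changes == (k if k > 1 else 0)


# Four vertices on a unit square, split into two clusters.
coords = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
cluster_of = {0: "a", 1: "a", 2: "b", 3: "b"}

print(visits_clusters_consecutively([0, 1, 2, 3], cluster_of), tour_length([0, 1, 2, 3], coords))
print(visits_clusters_consecutively([0, 2, 1, 3], cluster_of))
```

The second tour alternates between the two clusters, so it is a valid Hamiltonian tour but not a feasible CTSP solution.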
Zainuddin, Zarita; Lai, Kee Huong; Ong, Pauline
2013-04-01
Artificial neural networks (ANNs) are powerful mathematical models that are used to solve complex real-world problems. Wavelet neural networks (WNNs), which were developed based on wavelet theory, are a variant of ANNs. During the training phase of WNNs, several parameters need to be initialized, including the type of wavelet activation functions, the translation vectors, and the dilation parameter. The conventional k-means and fuzzy c-means clustering algorithms have been used to select the translation vectors. However, the solution vectors might get trapped at local minima. In this regard, the evolutionary harmony search algorithm, which is capable of searching for near-optimum solution vectors, both locally and globally, is introduced to circumvent this problem. In this paper, the conventional k-means and fuzzy c-means clustering algorithms were hybridized with the metaheuristic harmony search algorithm. In addition to obtaining the estimation of the global minima accurately, these hybridized algorithms also offer more than one solution to a particular problem, since many possible solution vectors can be generated and stored in the harmony memory. To validate the robustness of the proposed WNNs, the real-world problem of epileptic seizure detection was presented. The overall classification accuracy from the simulation showed that the hybridized metaheuristic algorithms outperformed the standard k-means and fuzzy c-means clustering algorithms.
An efficient clustering algorithm for partitioning Y-short tandem repeats data
Directory of Open Access Journals (Sweden)
Seman Ali
2012-10-01
Background: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results: Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions: The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear.
A REAL-TIME C-V CLUSTERING ALGORITHM FOR WEB-MINING
Institute of Scientific and Technical Information of China (English)
Li Haiying; Zhuang Zhenquan; Li Bin; Wan Ke
2002-01-01
In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to handle the vast amounts of action data that are dynamically collected from web sites. The algorithm uses the concept of C-V to denote characteristics; at the same time, it adopts two-valued [0,1] input and a self-defined vigilance parameter to design the clustering architecture. The Vector Degree of Matching (VDM) plays a key role in the clustering algorithm, as it determines the magnitude of a typical characteristic. Using stability analysis, the classifications are confirmed to have a reliably hierarchical structure when the vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between the vigilance parameter and the classification upper limit helps mine representative classifications of net users according to the actual web resource, so that the administering system can map them to the web resource space to implement intelligent configuration effectively and rapidly.
Risk Assessment for Bridges Safety Management during Operation Based on Fuzzy Clustering Algorithm
Directory of Open Access Journals (Sweden)
Xia Hanyu
2016-01-01
In recent years, large-span and large sea-crossing bridges have been built, and bridge accidents caused by improper operational management occur frequently. In order to explore better methods for risk assessment of bridge operation departments, a method based on a fuzzy clustering algorithm is selected. The implementation steps of the fuzzy clustering algorithm are then described, the risk evaluation system is built, and, taking Taizhou Bridge as an example, the quantification of risk factors is described. After that, the clustering algorithm based on fuzzy equivalence is computed in MATLAB 2010a. Finally, the Taizhou Bridge operation management departments are classified and sorted according to their degree of risk, and the safety situation of the operation departments is analyzed.
A Throughput-Driven Scheduling Algorithm of Differentiated Service for Web Cluster
Institute of Scientific and Technical Information of China (English)
None
2006-01-01
Request distribution is a key technology for Web cluster servers. This paper presents a throughput-driven scheduling algorithm (TDSA). The algorithm uses the throughput of the cluster back-ends to evaluate their load and employs a neural network model to predict the future load, so that the scheduling system features a self-learning capability and good adaptability to changes in load. Moreover, it separates static requests from dynamic requests to make full use of the CPU resources and takes the locality of requests into account to improve the cache hit ratio. Experimental results from the WebBench(TM) testing tool show better performance for a Web cluster server with TDSA than with traditional scheduling algorithms.
Ternary Tree and Clustering Based Huffman Coding Algorithm
Directory of Open Access Journals (Sweden)
Pushpa R. Suri
2010-09-01
In this study, the focus was on the use of a ternary tree instead of a binary tree. A new two-pass algorithm for encoding Huffman ternary tree codes was implemented. In this algorithm we tried to find the codeword length of each symbol, using the concept of Huffman encoding. Huffman encoding is a two-pass problem. The first pass collects the letter frequencies, and that information is used to create the Huffman tree. Note that char values range from -128 to 127, so they need to be cast; the data were stored as unsigned chars to solve this problem, so that the range is 0 to 255. The second pass opens the output file and writes the frequency table to it, then opens the input file, reads characters from it, gets their codes, and writes the encoding into the output file. Once a Huffman code has been generated, data may be encoded simply by replacing each symbol with its code. To reduce the memory size and speed up the process of finding the codeword length for a symbol in a Huffman tree, we proposed a memory-efficient data structure to represent the codeword lengths of a Huffman ternary tree.
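As an illustration of the codeword-length computation described above, here is a minimal sketch of ternary Huffman coding. It uses the textbook heap-based construction, not the memory-efficient data structure the paper proposes, and the function name is hypothetical. The one ternary-specific detail is padding: a zero-weight dummy is added when the symbol count is even, so that every merge combines exactly three nodes.

```python
import heapq
from collections import Counter
from itertools import count


def ternary_code_lengths(freqs):
    """Return {symbol: codeword length} for a ternary Huffman code."""
    tie = count()  # unique tie-breaker so the heap never compares symbol lists
    heap = [(f, next(tie), [s]) for s, f in freqs.items()]
    # Each merge removes 3 nodes and adds 1, so the node count must be odd.
    if len(heap) > 1 and len(heap) % 2 == 0:
        heap.append((0, next(tie), []))  # zero-weight dummy leaf
    heapq.heapify(heap)

    lengths = {s: 0 for s in freqs}
    while len(heap) > 1:
        merged_weight, merged_symbols = 0, []
        for _ in range(3):
            weight, _tie, symbols = heapq.heappop(heap)
            merged_weight += weight
            merged_symbols += symbols
        # Every symbol under the new internal node sits one level deeper.
        for s in merged_symbols:
            lengths[s] += 1
        heapq.heappush(heap, (merged_weight, next(tie), merged_symbols))
    return lengths


# First pass: collect symbol frequencies; then derive codeword lengths.
freqs = Counter("abracadabra")
lengths = ternary_code_lengths(freqs)
total_trits = sum(freqs[s] * lengths[s] for s in freqs)
print(lengths, total_trits)
```

The lengths alone are enough to assign canonical codewords, which is one reason a compact representation of codeword lengths is useful.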
National Research Council Canada - National Science Library
T. Velmurugan; T. Santhanam
2010-01-01
.... Clustering algorithms can be applied in many domains. Approach: In this research, the most representative algorithms K-Means and K-Medoids were examined and analyzed based on their basic approach...
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.
Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue
2016-02-19
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency.
A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network
Directory of Open Access Journals (Sweden)
Yuzhong Chen
2016-02-01
Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency.
Directory of Open Access Journals (Sweden)
Hanane FROUD
2013-11-01
The goal of document clustering algorithms is to create clusters that are internally coherent but clearly different from each other. The useful expressions in documents are often accompanied by a large amount of noise caused by the use of unnecessary words, so it is indispensable to eliminate the noise and keep just the useful information. Keyphrase extraction systems for Arabic are a new phenomenon. A number of text mining applications can use them to improve their results. Keyphrases are defined as phrases that capture the main topics discussed in a document; they offer a brief and precise summary of the document's content. Therefore, they can be a good solution for removing the existing noise from documents. In this paper, we propose a new method to solve the problem cited above, especially for documents in Arabic, which is one of the most complex languages, by using a new keyphrase extraction algorithm based on the Suffix Tree data structure (KpST). To evaluate our approach, we conduct an experimental study on Arabic document clustering using the most popular hierarchical approach: the agglomerative hierarchical algorithm with seven linkage techniques and a variety of distance functions and similarity measures. The obtained results show that our approach for extracting keyphrases improves the clustering results.
KohonAnts: A Self-Organizing Ant Algorithm for Clustering and Pattern Classification
Fernandes, C; Merelo, J J; Ramos, V; Laredo, J L J
2008-01-01
In this paper we introduce a new ant-based method that takes advantage of the cooperative self-organization of Ant Colony Systems to create a naturally inspired clustering and pattern recognition method. The approach considers each data item as an ant, which moves inside a grid changing the cells it goes through, in a fashion similar to Kohonen's Self-Organizing Maps. The resulting algorithm is conceptually simpler, takes fewer free parameters than other ant-based clustering algorithms, and, after some parameter tuning, yields very good results on some benchmark problems.
Directory of Open Access Journals (Sweden)
Burhan Ergen
2014-01-01
This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases.
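The clustering step, converting a gray level image into a binary image, can be sketched as one-dimensional k-means with two clusters. This covers only the k-means variant and leaves out the Gabor wavelet filtering entirely: the input is assumed to be an already filtered intensity image given as a list of rows, and the function names are hypothetical.

```python
def two_means_threshold(values, max_iters=100):
    """1-D k-means with k = 2; returns the midpoint between the two centres."""
    c0, c1 = float(min(values)), float(max(values))
    for _ in range(max_iters):
        low = [v for v in values if abs(v - c0) <= abs(v - c1)]
        high = [v for v in values if abs(v - c0) > abs(v - c1)]
        new0 = sum(low) / len(low)  # never empty: the minimum always lands here
        new1 = sum(high) / len(high) if high else c1
        if (new0, new1) == (c0, c1):
            break  # assignments no longer change
        c0, c1 = new0, new1
    return (c0 + c1) / 2.0


def binarize(image, threshold):
    """Map each pixel to 1 (above threshold) or 0 (otherwise)."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


# Toy 3x3 "edge response" image: low values are background, high values edges.
image = [
    [10, 12, 200],
    [11, 210, 205],
    [9, 198, 202],
]
pixels = [px for row in image for px in row]
threshold = two_means_threshold(pixels)
print(threshold, binarize(image, threshold))
```

Fuzzy c-means would replace the hard assignment with membership degrees, which the abstract reports as the second of the two proposed variants.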
Algorithm for Multi-laser-target Tracking Based on Clustering Fusion
Institute of Scientific and Technical Information of China (English)
ZHANG Li-qun; LI Yan-jun; ZHANG Ke
2007-01-01
Multi-laser-target tracking is an important subject in the field of signal processing of laser warners. A clustering method is applied to the measurement of laser warner, and the space-time fusion for measurements in the same cluster is accomplished. Real-time tracking of multi-laser-target and real-time picking of multi-laser-signal are introduced using data fusion of the measurements. A prototype device of the algorithm is built up. The results of experiments show that the algorithm is very effective.
Ergen, Burhan
2014-01-01
This paper proposes two edge detection methods for medical images by integrating the advantages of Gabor wavelet transform (GWT) and unsupervised clustering algorithms. The GWT is used to enhance the edge information in an image while suppressing noise. Following this, the k-means and Fuzzy c-means (FCM) clustering algorithms are used to convert a gray level image into a binary image. The proposed methods are tested using medical images obtained through Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) devices, and a phantom image. The results prove that the proposed methods are successful for edge detection, even in noisy cases.
Segmentation of Mushroom and Cap width Measurement using Modified K-Means Clustering Algorithm
Directory of Open Access Journals (Sweden)
Eser Sert
2014-01-01
Mushrooms are a commonly consumed food. Image processing is one of the effective ways to examine visual features and detect the size of a mushroom. We developed software to segment a mushroom in a picture and to measure the cap width of the mushroom. The K-Means clustering method is used for the process. K-Means is one of the most successful clustering methods. In our study we customized the algorithm to get the best result and tested the algorithm. In the system, the mushroom picture is first filtered, the histograms are balanced, and then segmentation is performed. The results showed that the customized algorithm performed better segmentation than the classical K-Means algorithm. Tests performed on the designed software showed that segmentation of pictures with complex backgrounds is performed with high accuracy, and the caps of 20 mushrooms are measured with 2.281% relative error.
An efficient method of key-frame extraction based on a cluster algorithm.
Zhang, Qiang; Yu, Shao-Pei; Zhou, Dong-Sheng; Wei, Xiao-Peng
2013-12-18
This paper proposes a novel method of key-frame extraction for use with motion capture data. This method is based on an unsupervised cluster algorithm. First, the motion sequence is clustered into two classes by the similarity distance of the adjacent frames so that the thresholds needed in the next step can be determined adaptively. Second, a dynamic cluster algorithm called ISODATA is used to cluster all the frames and the frames nearest to the center of each class are automatically extracted as key-frames of the sequence. Unlike many other clustering techniques, the present improved cluster algorithm can automatically address different motion types without any need for specified parameters from users. The proposed method is capable of summarizing motion capture data reliably and efficiently. The present work also provides a meaningful comparison between the results of the proposed key-frame extraction technique and other previous methods. These results are evaluated in terms of metrics that measure reconstructed motion and the mean absolute error value, which are derived from the reconstructed data and the original data.
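The final selection step, taking the frame nearest to the centre of each class as a key-frame, can be sketched independently of how the classes were obtained. The code below assumes cluster labels are already available (for example from ISODATA, which is not implemented here) and represents each frame as a plain feature vector; the names and the toy data are hypothetical.

```python
def key_frames(frames, labels):
    """For each cluster, return the index of the frame closest to its centroid."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)

    selected = []
    for members in groups.values():
        dim = len(frames[members[0]])
        center = [sum(frames[i][d] for i in members) / len(members) for d in range(dim)]

        def sq_dist(i):
            return sum((frames[i][d] - center[d]) ** 2 for d in range(dim))

        selected.append(min(members, key=sq_dist))
    return sorted(selected)


# Toy motion sequence with one feature per frame and two clusters.
frames = [(0.0,), (0.1,), (0.2,), (5.0,), (5.2,), (5.1,)]
labels = [0, 0, 0, 1, 1, 1]
print(key_frames(frames, labels))
```

Returning indices in temporal order keeps the key-frames usable for reconstructing the motion by interpolation.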
Robustness of "cut and splice" genetic algorithms in the structural optimization of atomic clusters
Froltsov, V.; Reuter, K.
2009-01-01
We return to the geometry optimization problem of Lennard-Jones clusters to analyze the performance dependence of 'cut and splice' genetic algorithms (GAs) on the employed population size. We generally find that admixing twinning mutation moves leads to an improved robustness of the algorithm efficiency with respect to this a priori unknown technical parameter. The resulting very stable performance of the corresponding mutation + mating GA implementation over a wide range of population sizes...
Critical slowing down of cluster algorithms for Ising models coupled to 2-d gravity
Bowick, Mark; Falcioni, Marco; Harris, Geoffrey; Marinari, Enzo
1994-02-01
We simulate single and multiple Ising models coupled to 2-d gravity using both the Swendsen-Wang and Wolff algorithms to update the spins. We study the integrated autocorrelation time and find that there is considerable critical slowing down, particularly in the magnetization. We argue that this is primarily due to the local nature of the dynamical triangulation algorithm and to the generation of a distribution of baby universes which inhibits cluster growth.
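For readers unfamiliar with the cluster updates mentioned above, here is a minimal sketch of a single Wolff update. It is written for a fixed periodic square lattice, which is an assumption made for brevity: the paper couples the spins to dynamically triangulated surfaces, and that part is not reproduced. Same-spin neighbours join the cluster with probability 1 - exp(-2 * beta), taking the coupling constant as 1.

```python
import math
import random


def wolff_update(spins, L, beta, rng):
    """Grow one Wolff cluster on an L x L periodic lattice and flip it.

    `spins` is a flat list of +1/-1 values, modified in place.
    Returns the size of the flipped cluster.
    """
    p_add = 1.0 - math.exp(-2.0 * beta)
    seed = rng.randrange(L * L)
    s0 = spins[seed]
    cluster = {seed}
    stack = [seed]
    while stack:
        site = stack.pop()
        x, y = site % L, site // L
        neighbours = [
            ((x + 1) % L, y), ((x - 1) % L, y),
            (x, (y + 1) % L), (x, (y - 1) % L),
        ]
        for nx, ny in neighbours:
            nb = ny * L + nx
            if nb not in cluster and spins[nb] == s0 and rng.random() < p_add:
                cluster.add(nb)
                stack.append(nb)
    for site in cluster:
        spins[site] = -s0
    return len(cluster)


rng = random.Random(0)
L = 8
spins = [rng.choice((-1, 1)) for _ in range(L * L)]
sizes = [wolff_update(spins, L, beta=0.44, rng=rng) for _ in range(100)]
print(sum(sizes) / len(sizes), sum(spins) / (L * L))
```

The average cluster size is the quantity to watch: when cluster growth is inhibited, as the abstract argues happens on these surfaces, each update changes few spins and autocorrelation times grow.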
Critical Slowing Down of Cluster Algorithms for Ising Models Coupled to 2-d Gravity
Bowick, M; Harris, G; Marinari, E
1994-01-01
We simulate single and multiple Ising models coupled to 2-d gravity using both the Swendsen-Wang and Wolff algorithms to update the spins. We study the integrated autocorrelation time and find that there is considerable critical slowing down, particularly in the magnetization. We argue that this is primarily due to the local nature of the dynamical triangulation algorithm and to the generation of a distribution of baby universes which inhibits cluster growth.
Computing OpenSURF on OpenCL and General Purpose GPU
Directory of Open Access Journals (Sweden)
Wanglong Yan
2013-10-01
The Speeded-Up Robust Feature (SURF) algorithm is widely used for image feature detection and matching in computer vision. Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, and other processors. This paper describes how to implement an open-source SURF program, namely OpenSURF, on general-purpose GPUs with OpenCL, and discusses in detail the optimizations in terms of thread architectures and memory models. Our final OpenCL implementation of OpenSURF is on average 37% and 64% faster than the OpenCV SURF v2.4.5 CUDA implementation on NVidia's GTX660 and GTX460SE GPUs, respectively. Our OpenCL program achieved real-time performance (>25 frames per second) for almost all input images with sizes from 320*240 to 1024*768 on NVidia's GTX660 GPU, NVidia's GTX460SE GPU and AMD's Radeon HD 6850 GPU. On average, our OpenCL approach on NVidia's GTX660 GPU is more than 22.8 times faster than the original CPU version on Intel's Dual-Core E5400 2.7G.
Strong scaling of general-purpose molecular dynamics simulations on GPUs
Glaser, Jens; Anderson, Joshua A; Lui, Pak; Spiga, Filippo; Millan, Jaime A; Morse, David C; Glotzer, Sharon C
2014-01-01
We describe a highly optimized implementation of MPI domain decomposition in a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented within a code that was designed for execution on GPUs from the start (Anderson et al., J. Comp. Phys. 227, 2008). The software supports short-ranged pair force and bond force fields and achieves optimal GPU performance using an autotuning algorithm. We are able to demonstrate equivalent or superior scaling on up to 3,375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD) simulations of up to 108 million particles. GPUDirect RDMA capabilities in recent GPU generations provide better performance in full double precision calculations. For a representative polymer physics application, HOOMD-blue 1.0 provides an effective GPU vs. CPU node speed-up of 12.5x.
Directory of Open Access Journals (Sweden)
Lejiang Guo
2011-05-01
Wireless Sensor Networks (WSN) represent a new dimension in the field of network research. The cluster algorithm can significantly reduce the energy consumption of wireless sensor networks and prolong the network lifetime. This paper uses neurons to describe WSN nodes and constructs a neural network model for WSN. The neural network model includes three aspects: the WSN node neuron model, the WSN node control model and the WSN node connection model. Building on the framework of cluster algorithms for wireless sensor networks, this paper presents a weighted-average cluster-head selection algorithm based on an improved genetic optimization, which makes the node weights directly related to the decision-making predictions. The algorithm consists of two stages: single-parent evolution and population evolution. The initial population is formed in the single-parent evolution stage using a gene pool; the algorithm then continues to the further evolution process, and finally the best solution is generated and saved in the population. The simulation results illustrate that the new algorithm has high convergence speed and good global searching capacity. It effectively balances the network energy consumption, improves the network life-cycle, ensures the communication quality and provides a certain theoretical foundation for the applications of neural networks.
A MODIFIED ANT-BASED TEXT CLUSTERING ALGORITHM WITH SEMANTIC SIMILARITY MEASURE
Institute of Scientific and Technical Information of China (English)
Haoxiang XIA; Shuguang WANG; Taketoshi YOSHIDA
2006-01-01
Ant-based text clustering is a promising technique that has attracted great research attention. This paper attempts to improve the standard ant-based text-clustering algorithm in two dimensions. On the one hand, an ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to provide a more accurate assessment of the similarity between documents. On the other, the ant behavior model is modified to pursue better algorithmic performance. In particular, the ant movement rule is adjusted so as to direct a laden ant toward a dense area of items of the same type as the item it carries, and to direct an unladen ant toward an area that contains an item dissimilar to the surrounding items within its Moore neighborhood. Using WordNet as the base ontology for assessing the semantic similarity between documents, the proposed algorithm is tested with a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that the proposed algorithm performs better than the standard ant-based text-clustering algorithm and the k-means algorithm.
A Hybrid LBFGS-DE Algorithm for Global Optimization of the Lennard-Jones Cluster Problem
Directory of Open Access Journals (Sweden)
Ernesto Padernal Adorio
2004-12-01
The Lennard-Jones cluster conformation problem is to determine a configuration of n atoms in three-dimensional space where the sum of the nonlinear pairwise potential function is at a minimum; in that potential, r_ij is the distance between atoms i and j. This optimization problem is a severe test for global optimization algorithms due to its computational complexity: the number of local minima grows exponentially as the number of atoms in the cluster is increased. As a specific test case, a better cluster configuration than the previously published putative minimum for the 38-atom case was found in the mid-1990s.
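The formula referred to in this abstract did not survive in the record. For orientation only, the textbook form of the Lennard-Jones cluster energy is given below; whether the paper uses exactly this scaling (for example reduced units with \epsilon = \sigma = 1) is an assumption.

```latex
E(x_1,\dots,x_n) \;=\; 4\epsilon \sum_{1 \le i < j \le n}
  \left[ \left(\frac{\sigma}{r_{i,j}}\right)^{12}
       - \left(\frac{\sigma}{r_{i,j}}\right)^{6} \right],
\qquad r_{i,j} = \lVert x_i - x_j \rVert .
```

In this form each pair contributes a minimum energy of -\epsilon at the separation r_{i,j} = 2^{1/6}\sigma, and the global optimization problem is to choose the 3n coordinates that minimize E.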
A harmony search algorithm for clustering with feature selection
Directory of Open Access Journals (Sweden)
Carlos Cobos
2010-01-01
This article presents a new clustering algorithm called IHSK, which is able to select features with linear-order complexity. The algorithm is inspired by the combination of the harmony search and K-means algorithms. Feature selection uses the concept of variability together with a heuristic method that penalizes the presence of dimensions with a low probability of contributing to the current solution. The algorithm was tested on synthetic and real data sets, with promising results.
Gaur, Pallavi; Chaturvedi, Anoop
2017-07-22
The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.
Structures of Adatom Clusters on Ag(111) Surface by Genetic Algorithm
Institute of Scientific and Technical Information of China (English)
SUN Zhi-Hua; LIU Qing-Wei; LI Yu-Fen; ZHUANG Jun
2004-01-01
We study the structures of Ag adatom clusters supported on the metal Ag(111) surface using the genetic algorithm (GA). The atomic interactions are modelled by the surface-embedded-atom method. The lowest-energy structures of adatom clusters with sizes n = 3-20 are obtained, in which n = 7, 10, 12, 14, 16, 19 are the magic numbers. Furthermore, we give a series of structures with energies close to the lowest energy (the lower-energy isomers), and the structure features are studied in detail. Except for some magic clusters and small clusters, every configuration of adatom clusters generally has two distinct adsorption ways, so the isomers always appear in pairs.
Distributed Clustering Algorithm to Explore Selection Diversity in Wireless Sensor Networks
Kong, Hyung-Yun; Asaduzzaman, Hyung-Yun
This paper presents a novel cross-layer approach to exploit selection diversity in distributed clustering-based wireless sensor networks (WSNs) by selecting a proper cluster-head. We develop and analyze a cluster-head selection algorithm based on instantaneous channel state information (CSI) for a distributed, dynamic and randomized clustering-based WSN. The proposed cluster-head selection scheme is also random and is able to distribute energy use among the nodes in the network. We present an analytical approach to evaluate the energy efficiency and system lifetime of our proposal. Analysis shows that, in a Rayleigh fading environment, the proposed scheme outperforms the performance obtained on an additive white Gaussian noise (AWGN) channel. This proposal also outperforms the existing cooperative diversity protocols in terms of system lifetime and implementation complexity.
An improved K-means clustering algorithm in agricultural image segmentation
Cheng, Huifeng; Peng, Hui; Liu, Shanmei
Image segmentation is the first important step in image analysis and image processing. In this paper, based on the characteristics of color crop images, we first transform the color space of the image from RGB to HIS. We then select suitable initial cluster centers and the number of clusters using a mean-variance approach and rough set theory, and run the clustering calculation so as to segment the color components automatically and rapidly and to extract target objects from the background accurately. This provides a reliable basis for the identification, analysis, and follow-up calculation and processing of crop images. Experimental results demonstrate that the improved k-means clustering algorithm reduces the amount of computation and enhances the precision and accuracy of clustering.
Directory of Open Access Journals (Sweden)
MOSTAFA BAGHOURI
2014-06-01
Improving the lifetime of heterogeneous wireless sensor networks is an important task because sensor nodes have limited energy resources. The best way to improve WSN lifetime is to use clustering-based algorithms, in which each cluster is managed by a leader called the Cluster Head (CH). Every other node must communicate with this CH to send its sensed data. Nodes nearest to the base station must also send their data to their leaders, which causes a loss of energy. In this paper, we propose a new approach that improves the threshold distributed energy efficient clustering (TDEEC) protocol for heterogeneous wireless sensor networks by excluding the nodes closest to the base station from the clustering process. We show by simulation in MATLAB that the proposed approach clearly increases the number of received packets and prolongs the lifetime of the network compared with the TDEEC protocol.
An Effective Tri-Clustering Algorithm Combining Expression Data with Gene Regulation Information
Directory of Open Access Journals (Sweden)
Ao Li
2009-04-01
Motivation: Bi-clustering algorithms aim to identify sets of genes sharing similar expression patterns across a subset of conditions. However, direct interpretation or prediction of gene regulatory mechanisms may be difficult as only gene expression data is used. Information about gene regulators may also be available, most commonly about which transcription factors may bind to the promoter region and thus control the expression level of a gene. Thus a method to integrate gene expression and gene regulation information is desirable for clustering and analyzing. Methods: By incorporating gene regulatory information with gene expression data, we define regulated expression values (REV) as indicators of how a gene is regulated by a specific factor. Existing bi-clustering methods are extended to a three-dimensional data space by developing a heuristic TRI-Clustering algorithm. An additional approach named Automatic Boundary Searching algorithm (ABS) is introduced to automatically determine the boundary threshold. Results: Results based on incorporating ChIP-chip data representing transcription factor-gene interactions show that the algorithms are efficient and robust for detecting tri-clusters. Detailed analysis of the tri-cluster extracted from yeast sporulation REV data shows that genes in this cluster exhibited significant differences during the middle and late stages. The implicated regulatory network was then reconstructed for further study of defined regulatory mechanisms. Topological and statistical analysis of this network demonstrated evidence of significant changes of TF activities during the different stages of yeast sporulation, and suggests this approach might be a general way to study regulatory networks undergoing transformations.
Chaos control of ferroresonance system based on RBF-maximum entropy clustering algorithm
Energy Technology Data Exchange (ETDEWEB)
Liu Fan [Key Lab of High Voltage and Electrical New Technology of Ministry of Education, Chongqing University, Chongqing 400044 (China)]. E-mail: liufan2003@yahoo.com.cn; Sun Caixin [Key Lab of High Voltage and Electrical New Technology of Ministry of Education, Chongqing University, Chongqing 400044 (China); Sima Wenxia [Key Lab of High Voltage and Electrical New Technology of Ministry of Education, Chongqing University, Chongqing 400044 (China); Liao Ruijin [Key Lab of High Voltage and Electrical New Technology of Ministry of Education, Chongqing University, Chongqing 400044 (China); Guo Fei [Key Lab of High Voltage and Electrical New Technology of Ministry of Education, Chongqing University, Chongqing 400044 (China)
2006-09-11
With regard to ferroresonance overvoltage in neutral-grounded power systems, a maximum-entropy learning algorithm based on radial basis function neural networks is used to control the chaotic system. The algorithm optimizes the objective function to derive the learning rule for the central vectors, and uses the clustering function of the network's hidden layers. It improves the regression and learning ability of neural networks. A numerical experiment on a ferroresonance system demonstrates the effectiveness and feasibility of using the algorithm to control chaos in a neutral-grounded system.
Image Segmentation Algorithm Based on Spectral Clustering Algorithm
Institute of Scientific and Technical Information of China (English)
张权; 胡玉兰
2012-01-01
Spectral clustering often gives unsatisfactory results when applied to image segmentation. This paper studies an improved Nyström algorithm for spectral-clustering image segmentation, which improves the segmentation quality. The algorithm first preprocesses the image and transforms the data space of the image distribution. It then computes the distance matrices among the data in the selected sample space and between the samples and the remaining data, and converts them into similarity matrices. Next, the similarity matrix is orthogonalized and eigendecomposed, and K-means clustering is performed. Finally, the clustering results are post-processed. Experiments verify the effectiveness of the algorithm.
Lee, Chongdeuk; Jeong, Taegwon
2011-01-01
Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as the node distribution found in dynamic properties due to mobility and flat structures and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than that of the other existing mechanisms.
Cluster algorithm for two-dimensional U(1) lattice gauge theory
Sinclair, R.
1992-03-01
We use gauge fixing to rewrite the two-dimensional U(1) pure gauge model with Wilson action and periodic boundary conditions as a nonfrustrated XY model on a closed chain. The Wolff single-cluster algorithm is then applied, eliminating critical slowing down of topological modes and Polyakov loops.
The Geometric Cluster Algorithm: Rejection-Free Monte Carlo Simulation of Complex Fluids
Luijten, Erik
2005-03-01
The study of complex fluids is an area of intense research activity, in which exciting and counter-intuitive behavior continues to be uncovered. Ironically, one of the very factors responsible for such interesting properties, namely the presence of multiple relevant time and length scales, often greatly complicates accurate theoretical calculations and computer simulations that could explain the observations. We have recently developed a new Monte Carlo simulation method (J. Liu and E. Luijten, Phys. Rev. Lett. 92, 035504 (2004); see also Physics Today, March 2004, pp. 25-27) that overcomes this problem for several classes of complex fluids. Our approach can accelerate simulations by orders of magnitude by introducing nonlocal, collective moves of the constituents. Strikingly, these cluster Monte Carlo moves are proposed in such a manner that the algorithm is rejection-free. The identification of the clusters is based upon geometric symmetries and can be considered as the off-lattice generalization of the widely used Swendsen-Wang and Wolff algorithms for lattice spin models. While phrased originally for complex fluids that are governed by the Boltzmann distribution, the geometric cluster algorithm can be used to efficiently sample configurations from an arbitrary underlying distribution function and may thus be applied in a variety of other areas. In addition, I will briefly discuss various extensions of the original algorithm, including methods to influence the size of the clusters that are generated and ways to introduce density fluctuations.
New and Old Jet Clustering Algorithms for Electron-Positron Events
Moretti, S; Sjöstrand, Torbjörn; Moretti, Stefano; Lönnblad, Leif; Sjöstrand, Torbjörn
1998-01-01
Over the years, many jet clustering algorithms have been proposed for the analysis of hadronic final states in $e^+e^-$ annihilations. These have somewhat different emphasis and are therefore more or less suited for various applications. We here review some of the most used and compare them from a theoretical and experimental point of view.
A genetic algorithm using hyper-quadtrees for low-dimensional K-means clustering.
Laszlo, Michael; Mukherjee, Sumitra
2006-04-01
The k-means algorithm is widely used for clustering because of its computational efficiency. Given n points in d-dimensional space and the number of desired clusters k, k-means seeks a set of k cluster centers so as to minimize the sum of the squared Euclidean distance between each point and its nearest cluster center. However, the algorithm is very sensitive to the initial selection of centers and is likely to converge to partitions that are significantly inferior to the global optimum. We present a genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k. The set of centers is represented using a hyper-quadtree constructed on the data. This representation is exploited in our GA to generate an initial population of good centers and to support a novel crossover operation that selectively passes good subsets of neighboring centers from parents to offspring by swapping subtrees. Experimental results indicate that our GA finds the global optimum for data sets with known optima and finds good solutions for large simulated data sets.
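The objective described in this abstract, and the sensitivity to initial centers, can be illustrated with a small sketch. It shows only the k-means cost and one Lloyd iteration; it is not the hyper-quadtree GA of the paper, and the helper names `sq_dist`, `sse` and `lloyd_step` are made up for the example.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def sse(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return float(sum(min(sq_dist(p, c) for c in centers) for p in points))


def lloyd_step(points, centers):
    """One Lloyd iteration: assign points to nearest center, then re-average."""
    groups = [[] for _ in centers]
    for p in points:
        dists = [sq_dist(p, c) for c in centers]
        groups[dists.index(min(dists))].append(p)
    new_centers = []
    for c, g in zip(centers, groups):
        if g:
            new_centers.append(
                tuple(sum(p[k] for p in g) / len(g) for k in range(len(c)))
            )
        else:
            new_centers.append(c)  # keep an empty cluster's center unchanged
    return new_centers


# Four points forming two well-separated vertical pairs.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = [(0, 1), (10, 1)]                   # one center per pair
stuck = lloyd_step(points, [(0, 0), (0, 2)])  # poor initial centers
print(sse(points, good), sse(points, stuck))
```

With the poor initial centers the iteration settles on centers that split the data horizontally and does not move again, so its cost stays far above that of the `good` partition. This is the kind of inferior local optimum that a global search over centers, such as the GA in the abstract, is meant to avoid.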
An effective trust-based recommendation method using a novel graph clustering algorithm
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
A cluster finding algorithm based on the multi-band identification of red-sequence galaxies
Oguri, Masamune
2014-01-01
We present a new algorithm, CAMIRA, to identify clusters of galaxies in wide-field imaging survey data. We base our algorithm on the stellar population synthesis model to predict colours of red-sequence galaxies at a given redshift for an arbitrary set of bandpass filters, with additional calibration using a sample of spectroscopic galaxies to improve the accuracy of the model prediction. We run the algorithm on ~11960 deg^2 of imaging data from the Sloan Digital Sky Survey (SDSS) Data Release 8 to construct a catalogue of 71743 clusters in the redshift range 0.1
Gkaitatzis, Stamatios; The ATLAS collaboration
2016-01-01
In this paper the performance of the 2D pixel clustering algorithm developed for the Input Mezzanine card of the ATLAS Fast TracKer system is presented. Fast TracKer is an approved ATLAS upgrade whose goal is to provide a complete list of tracks to the ATLAS High Level Trigger for each level-1 accepted event, at up to 100 kHz event rate with a very small latency, on the order of 100 µs. The Input Mezzanine card is the input stage of the Fast TracKer system. Its role is to receive data from the silicon detector and perform real-time clustering, thereby reducing the amount of data propagated to the subsequent processing levels with minimal information loss. We focus on the most challenging component on the Input Mezzanine card, the 2D clustering algorithm executed on the pixel data. We compare two different implementations of the algorithm. The first, called the ideal one, searches for clusters of pixels in the whole silicon module at once and calculates the cluster centroids exploiting the whole avai...
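The basic idea of 2D pixel clustering, grouping adjacent hit pixels and reducing each group to a centroid, can be sketched in software as follows. This is only an illustration under stated assumptions: 8-connectivity, unweighted centroids, and the function name `cluster_pixels` are choices made for the example, not details taken from the Fast TracKer firmware.

```python
def cluster_pixels(hits):
    """Group hit pixels into clusters of touching pixels (8-connectivity).

    `hits` is an iterable of (column, row) pairs. Returns a sorted list of
    (centroid, members) tuples, where centroid is the unweighted mean
    position and members is the sorted list of pixels in the cluster.
    """
    remaining = set(hits)
    clusters = []
    while remaining:
        seed = remaining.pop()
        members = [seed]
        stack = [seed]
        while stack:
            col, row = stack.pop()
            # Visit the 3 x 3 neighbourhood around the current pixel.
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    nb = (col + dc, row + dr)
                    if nb in remaining:
                        remaining.remove(nb)
                        members.append(nb)
                        stack.append(nb)
        n = len(members)
        centroid = (
            sum(c for c, _ in members) / n,
            sum(r for _, r in members) / n,
        )
        clusters.append((centroid, sorted(members)))
    return sorted(clusters)
```

Sending one centroid per cluster instead of every hit pixel is what reduces the data volume passed to later processing stages.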
Gkaitatzis, Stamatios; The ATLAS collaboration; Annovi, Alberto; Kordas, Kostantinos
2016-01-01
In this paper the performance of the 2D pixel clustering algorithm developed for the Input Mezzanine card of the ATLAS Fast TracKer system is presented. Fast TracKer is an approved ATLAS upgrade whose goal is to provide a complete list of tracks to the ATLAS High Level Trigger for each level-1 accepted event, at up to 100 kHz event rate with a very small latency, on the order of 100 µs. The Input Mezzanine card is the input stage of the Fast TracKer system. Its role is to receive data from the silicon detector and perform real-time clustering, thereby reducing the amount of data propagated to the subsequent processing levels with minimal information loss. We focus on the most challenging component on the Input Mezzanine card, the 2D clustering algorithm executed on the pixel data. We compare two different implementations of the algorithm. The first, called the ideal one, searches for clusters of pixels in the whole silicon module at once and calculates the cluster centroids exploiting the whole avail...
BMI optimization by using parallel UNDX real-coded genetic algorithm with Beowulf cluster
Handa, Masaya; Kawanishi, Michihiro; Kanki, Hiroshi
2007-12-01
This paper deals with a global optimization algorithm for Bilinear Matrix Inequalities (BMIs) based on the Unimodal Normal Distribution Crossover (UNDX) GA. First, by analyzing the structure of the BMIs, the existence of typical difficult structures is confirmed. Then, to improve the performance of the algorithm, and based on the results of the problem structure analysis and the characteristic properties of BMIs, we propose an algorithm that uses a primary search direction with relaxed Linear Matrix Inequality (LMI) convex estimation. Moreover, we propose two types of evaluation methods for GA individuals based on LMI calculation that take further account of BMI characteristic properties. In addition, to reduce computational time, we propose a parallelization of the RCGA algorithm using the Master-Worker paradigm with cluster computing.
Performance evaluation of simple linear iterative clustering algorithm on medical image processing.
Cong, Jinyu; Wei, Benzheng; Yin, Yilong; Xi, Xiaoming; Zheng, Yuanjie
2014-01-01
The Simple Linear Iterative Clustering (SLIC) algorithm is increasingly applied to different kinds of image processing because it produces perceptually meaningful superpixels. To better meet the needs of medical image processing and to provide a technical reference for applying SLIC to medical image segmentation, two indicators, boundary accuracy and superpixel uniformity, are introduced alongside other indicators to systematically analyze the performance of the SLIC algorithm in comparison with the Normalized cuts and Turbopixels algorithms. The extensive experimental results show that SLIC is faster and less sensitive to the image type and to the chosen number of superpixels than similar algorithms such as Turbopixels and Normalized cuts. It also performs well in terms of boundary recall, robustness to fuzzy boundaries, the chosen superpixel size and segmentation performance on medical images.
Unsupervised unstained cell detection by SIFT keypoint clustering and self-labeling algorithm.
Muallal, Firas; Schöll, Simon; Sommerfeldt, Björn; Maier, Andreas; Steidl, Stefan; Buchholz, Rainer; Hornegger, Joachim
2014-01-01
We propose a novel unstained cell detection algorithm based on unsupervised learning. The algorithm utilizes the scale invariant feature transform (SIFT), a self-labeling algorithm, and two clustering steps in order to achieve high performance in terms of time and detection accuracy. Unstained cell imaging is dominated by phase contrast and bright field microscopy. Therefore, the algorithm was assessed on images acquired using these two modalities. Five cell lines having in total 37 images and 7250 cells were considered for the evaluation: CHO, L929, Sf21, HeLa, and Bovine cells. The obtained F-measures were between 85.1 and 89.5. Compared to the state-of-the-art, the algorithm achieves very close F-measure to the supervised approaches in much less time.
Tramacere, A; Dubath, P; Kneib, J -P; Courbin, F
2016-01-01
We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes, and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of patterns of local maxima, through an iterative algorithm which associates each pixel with the closest local maximum. Our main classification goal is to separate elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional ones related to segments of spiral arms. We use two different supervised ensemble classification algorithms, Random Forest,...
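The DBSCAN step mentioned in this abstract can be illustrated with a minimal, brute-force sketch on 2D pixel coordinates. It assumes the pixels passed in have already been selected as having significant flux; the parameter values used in the paper are not given in the abstract, so `eps` and `min_pts` here are placeholders.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN on 2D points.

    Returns one label per point: a cluster index (0, 1, ...) or -1 for noise.
    A point's neighbourhood includes the point itself.
    """
    def neighbours(i):
        xi, yi = points[i]
        return [
            j for j, (xj, yj) in enumerate(points)
            if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps * eps
        ]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:
                queue.extend(nj)  # j is a core point, so keep expanding
    return labels
```

Each resulting label group corresponds to one candidate source, that is, a connected patch of bright pixels. Isolated bright pixels are labelled as noise instead of forming their own source.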
Institute of Scientific and Technical Information of China (English)
Xiang Gao; Yintang Yang; Duan Zhou
2010-01-01
An effective algorithm based on the signal coverage of effective communication and a local energy-saving strategy is proposed for application in wireless sensor networks. This algorithm consists of two sub-algorithms. One is the multi-hop partition subspaces clustering algorithm, which ensures locally energy-balanced consumption given the deployment produced by the other sub-algorithm, distributed locating deployment based on efficient communication coverage probability (DLD-ECCP). DLD-ECCP makes use of the characteristics of Markov chains and probabilistic optimization to obtain the optimum topology and number of sensor nodes. Simulation results demonstrate the advantages of the proposed approaches in saving hardware resources and network energy consumption.
Kim, R S J; Postman, M; Strauss, M A; Bahcall, Neta A; Gunn, J E; Lupton, R H; Annis, J; Nichol, R C; Castander, F J; Brinkmann, J; Brunner, R J; Connolly, A; Csabai, I; Hindsley, R B; Ivezic, Z; Vogeley, M S; York, D G; Kim, Rita S. J.; Kepner, Jeremy V.; Postman, Marc; Strauss, Michael A.; Bahcall, Neta A.; Gunn, James E.; Lupton, Robert H.; Annis, James; Nichol, Robert C.; Castander, Francisco J.; Brunner, Robert J.; Connolly, Andrew; Csabai, Istvan; Hindsley, Robert B.; Ivezic, Zeljko; Vogeley, Michael S.; York, Donald G.
2002-01-01
We present a comparison of three cluster finding algorithms from imaging data using Monte Carlo simulations of clusters embedded in a 25 deg^2 region of Sloan Digital Sky Survey (SDSS) imaging data: the Matched Filter (MF; Postman et al. 1996), the Adaptive Matched Filter (AMF; Kepner et al. 1999) and a color-magnitude filtered Voronoi Tessellation Technique (VTT). Of the two matched filters, we find that the MF is more efficient in detecting faint clusters, whereas the AMF evaluates the redshifts and richnesses more accurately, therefore suggesting a hybrid method (HMF) that combines the two. The HMF outperforms the VTT when using a background that is uniform, but it is more sensitive to the presence of a non-uniform galaxy background than is the VTT; this is due to the assumption of a uniform background in the HMF model. We thus find that for the detection thresholds we determine to be appropriate for the SDSS data, the performance of both algorithms is similar; we present the selection function for eac...
Minimum mutual information based level set clustering algorithm for fast MRI tissue segmentation.
Dai, Shuanglu; Man, Hong; Zhan, Shu
2015-01-01
Accurate and accelerated MRI tissue recognition is a crucial preprocessing step for real-time 3D tissue modeling and medical diagnosis. This paper proposes an information de-correlated clustering algorithm, implemented by a variational level set method, for fast tissue segmentation. The key idea is to introduce into the variational framework a local correlation term between the original image and a piecewise-constant approximation. Minimizing the correlation then leads to de-correlated piecewise regions. First, by introducing a continuous bounded variational domain describing the image, a probabilistic image restoration model is assumed to correct the distortion. Second, regional mutual information is introduced to measure the correlation between the piecewise regions and the original image. As a de-correlated description of the image, the piecewise constants are finally solved by numerical approximation and level set evolution. The converged piecewise constants automatically cluster the image domain into discriminative regions. The segmentation results show that the algorithm performs well in terms of computation time, accuracy, convergence and clustering capability.
MixSim : An R Package for Simulating Data to Study Performance of Clustering Algorithms
Directory of Open Access Journals (Sweden)
Volodymyr Melnykov
2012-11-01
The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Other capabilities of MixSim include computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.
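The pairwise overlap described in this abstract can be worked through for the simplest case. The following is a minimal Python sketch, not the MixSim R API: it assumes two univariate Gaussian components with equal variance and equal mixing proportions, so that the decision boundary is the midpoint between the means. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pairwise_overlap(mu1, mu2, sigma):
    """Overlap of two equal-weight, equal-variance 1-D Gaussians.

    Assumption of this sketch: with equal weights and variances the
    decision boundary is the midpoint between the means. Each
    misclassification probability is the mass of one component that
    falls on the wrong side of that midpoint; the overlap is their sum.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    half_gap = abs(mu1 - mu2) / (2.0 * sigma)
    miss_1_as_2 = normal_cdf(-half_gap)  # component 1 assigned to 2
    miss_2_as_1 = normal_cdf(-half_gap)  # component 2 assigned to 1
    return miss_1_as_2 + miss_2_as_1

if __name__ == "__main__":
    # Larger separation between the means gives a smaller overlap.
    for gap in (0.5, 2.0, 6.0):
        print(gap, round(pairwise_overlap(0.0, gap, 1.0), 4))
```

In this restricted setting the overlap depends only on the separation of the means measured in standard deviations, which is why it can serve as a single knob for clustering difficulty.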
Using genetic algorithm based fuzzy adaptive resonance theory for clustering analysis
Institute of Scientific and Technical Information of China (English)
LIU Bo; WANG Yong; WANG Hong-jian
2006-01-01
Fuzzy adaptive resonance theory systems have been widely applied in clustering. However, three parameters of fuzzy adaptive resonance theory must be adjusted manually to obtain better clustering, which takes much testing time and does not guarantee the best result. The genetic algorithm is an optimization search technique based on the principles of natural selection and genetic recombination. To automate the choice of fuzzy adaptive resonance theory parameters, an approach combining a genetic algorithm with a fuzzy adaptive resonance theory neural network is applied, so that the best clustering result can be obtained. Experiments show that the most appropriate parameters of fuzzy adaptive resonance theory can be obtained effectively by this approach.
A REAL-TIME C-V CLUSTERING ALGORITHM FOR WEB-MINING
Institute of Scientific and Technical Information of China (English)
LiHaiying; ZuangZhenquan; et al.
2002-01-01
In this letter, a real-time C-V (Characteristic-Vector) clustering algorithm is put forth to handle the vast action data dynamically collected from a web site. The algorithm uses the concept of the C-V to denote characteristics, and it adopts two-valued [0,1] input and a self-defined vigilance parameter to design the clustering architecture. The Vector Degree of Matching (VDM) plays a key role in the clustering algorithm, as it determines the magnitude of the typical characteristic. Using stability analysis, the classifications are confirmed to have a reliably hierarchical structure when the vigilance parameter shifts from 0.1 to 0.99. This non-linear relation between the vigilance parameter and the classification upper limit helps mine representative classifications of net users according to the actual web resources; the administering system can then map them to the web resource space to implement intelligent configuration effectively and rapidly.
You, Tao; Cheng, Hui-Min; Ning, Yi-Zi; Shia, Ben-Chang; Zhang, Zhong-Yuan
2016-12-01
Like clustering analysis, community detection aims at assigning nodes in a network to different communities. Fdp is a recently proposed density-based clustering algorithm which does not need the number of clusters as prior input and whose result is insensitive to its parameter. However, Fdp cannot be directly applied to community detection due to its inability to recognize the community centers in the network. To solve the problem, a new community detection method (named IsoFdp) is proposed in this paper. First, we use the IsoMap technique to map the network data into a low-dimensional manifold which can reveal diverse pairwise similarity. Then Fdp is applied to detect the communities in the network. An improved partition density function is proposed to select the proper number of communities automatically. We test our method on both synthetic and real-world networks, and the results demonstrate the effectiveness of our algorithm over the state-of-the-art methods.
Institute of Scientific and Technical Information of China (English)
Mohammed A.M. Ibrahim; Lu Xinda; M. SaifMokbel
2005-01-01
The rapid growth of interconnected high-performance workstations has produced a new computing paradigm called cluster-of-workstations computing. In these systems the load balance problem is a serious impediment to achieving good performance. The main concern of this paper is the implementation of a dynamic load balancing algorithm, asynchronous Round Robin (ARR), for balancing the workload of a parallel depth-first-search tree computation on a Cluster of Heterogeneous Workstations (COW). Many algorithms in artificial intelligence and other areas of computer science are based on depth-first search in implicitly defined trees. For these algorithms a load-balancing scheme is required that can evenly distribute parts of an irregularly shaped tree over the workstations with minimal interprocessor communication and without prior knowledge of the tree's shape. The ARR algorithm requires only minimal interprocessor communication, and only when necessary, and it runs under MPI (Message Passing Interface), which allows parallel execution on a heterogeneous SUN cluster-of-workstations platform. The program code is written in C and executed under the UNIX operating system (Solaris version).
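The abstract does not spell out how ARR picks a processor to ask for work. The sketch below follows the usual textbook description of asynchronous round robin, in which every processor keeps its own private counter and advances it after each request; that description, the class and method names, and the rule that a processor skips itself are all assumptions of this sketch. It is written in Python for brevity and does not reproduce the paper's C/MPI code.

```python
class ArrProcessor:
    """One processor's view of asynchronous round robin donor selection."""

    def __init__(self, rank, num_procs):
        if num_procs < 2:
            raise ValueError("need at least two processors")
        self.rank = rank
        self.num_procs = num_procs
        # Each processor keeps its own counter, starting at its right neighbour.
        self.target = (rank + 1) % num_procs

    def next_donor(self):
        """Return the processor to ask for work, then advance the local counter."""
        donor = self.target
        self.target = (self.target + 1) % self.num_procs
        if self.target == self.rank:
            # Assumption of this sketch: a processor never polls itself.
            self.target = (self.target + 1) % self.num_procs
        return donor

if __name__ == "__main__":
    idle = ArrProcessor(rank=1, num_procs=4)
    print([idle.next_donor() for _ in range(6)])
```

Because the counter is local, choosing a donor needs no communication at all; messages are exchanged only when work is actually requested, which is consistent with the abstract's remark that communication occurs only when necessary.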
Directory of Open Access Journals (Sweden)
Tcha Hong
2008-01-01
Background: Previous studies of genome-wide expression patterns show that a certain percentage of genes are cell cycle regulated. The expression data have been analyzed in a number of different ways to identify cell cycle dependent genes. In this study, we pose the hypothesis that cell cycle dependent genes can be considered as oscillating systems with a rhythm, i.e. systems producing response signals with a period and frequency. Therefore, we are motivated to apply the theory of multivariate phase synchronization to clustering cell cycle specific genome-wide expression data. Results: We propose a strategy to find groups of genes according to the specific biological process by analyzing cell cycle specific gene expression data. To evaluate the proposed method, we use the modified Kuramoto model, which is a phase governing equation that provides the long-term dynamics of globally coupled oscillators. With this equation, we simulate two groups of expression signals, and the simulated signals from each group share their own common rhythm. Then, the simulated expression data are mixed with randomly generated expression data to be used as the input data set for the algorithm. Using these simulated expression data, it is shown that the algorithm is able to identify expression signals that are involved in the same oscillating process. We also evaluate the method with yeast cell cycle expression data. It is shown that the output clusters of the proposed algorithm include genes which are closely associated with each other by sharing significant Gene Ontology terms of biological process and/or having relatively many known biological interactions. Therefore, the evaluation analysis indicates that the method is able to identify expression signals according to the specific biological process. Our evaluation analysis also indicates that some portion of the output of the proposed algorithm is not obtainable by the traditional clustering algorithm with
AN EFFICIENT UE CLUSTER HEAD SELECTION ALGORITHM IN WIRELESS SENSOR NETWORKS AND CELLULAR NETWORKS
Institute of Scientific and Technical Information of China (English)
Shan Lianhai; Ouyang Yuling; Yuan Zhi; Fang Weidong; Hu Honglin
2013-01-01
Wireless Sensor Networks (WSNs) have been applied in many different areas. Energy-efficient algorithms and protocols have become one of the most challenging issues for WSNs. Many researchers have focused on developing energy-efficient clustering algorithms for WSNs, but less research has addressed the mobile User Equipment (UE) acting as a Cluster Head (CH) for data transmission between cellular networks and WSNs. In this paper, we propose a cellular-assisted UE CH selection algorithm for the WSN, which considers several parameters to choose the optimal UE gateway CH. We analyze the energy cost of data transmission from a sensor node to the next node or gateway and calculate the whole-system energy cost for a WSN. Simulation results show that better system performance, in terms of system energy cost and WSN lifetime, can be achieved by using interactive optimization with cellular networks.
A Novel Image Fusion Algorithm for Visible and PMMW Images based on Clustering and NSCT
Directory of Open Access Journals (Sweden)
Xiong Jintao
2016-01-01
Aiming at the fusion of visible and Passive Millimeter Wave (PMMW) images, a novel algorithm based on clustering and the NSCT (Nonsubsampled Contourlet Transform) is proposed. It takes advantage of the particular ability of PMMW images to reveal metal targets and applies a clustering algorithm to the PMMW image to extract the potential target regions. In the fusion process, the NSCT is applied to both input images, and the decomposition coefficients at different scales are then combined using different rules. Finally, the fused image is obtained by taking the inverse NSCT of the fused coefficients. Several methods are used to evaluate the fusion results. Experiments demonstrate the superiority of the proposed algorithm for metal target detection compared with the wavelet transform and the Laplace transform.
Risk analysis of dam based on artificial bee colony algorithm with fuzzy c-means clustering
Energy Technology Data Exchange (ETDEWEB)
Li, Haojin; Li, Junjie; Kang, Fei
2011-05-15
Risk analysis is a method which has been incorporated into infrastructure engineering. Fuzzy c-means clustering (FCM) is a simple, fast and widely used method, but it can induce errors because it is sensitive to initialization. The aim of this paper was to propose a new method for risk analysis using an artificial bee colony algorithm (ABC) with FCM. The new technique is first explained and then applied in three experiments. The results demonstrated that the combined artificial bee colony fuzzy c-means clustering algorithm (ABCFCM) overcomes this FCM issue, since it is not sensitive to initialization, and the experiments showed that it is more accurate than FCM. This paper provides a new tool for risk analysis which can be used for prioritizing risks and reinforcing dangerous dams in a more scientific way.
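For readers unfamiliar with FCM, the following is a minimal Python sketch of the standard fuzzy c-means iteration on one-dimensional data. It is not the paper's ABCFCM: the initial centres are passed in explicitly because they are the quantity FCM is said to be sensitive to, and how the bee colony search would choose them is not described in the abstract and is not shown. Function names and the toy values in the usage block are hypothetical.

```python
def fcm_memberships(points, centers, m=2.0):
    """Membership u[i][j] of point j in cluster i, for fuzzifier m > 1."""
    power = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for j, x in enumerate(points):
        dists = [abs(x - c) for c in centers]
        zeros = [i for i, d in enumerate(dists) if d == 0.0]
        if zeros:
            # The point coincides with a centre: full membership there.
            for i in zeros:
                u[i][j] = 1.0 / len(zeros)
            continue
        for i, d_i in enumerate(dists):
            u[i][j] = 1.0 / sum((d_i / d_k) ** power for d_k in dists)
    return u

def fcm_centers(points, u, m=2.0):
    """Each centre is the mean of the points weighted by membership ** m."""
    centers = []
    for row in u:
        weights = [w ** m for w in row]
        centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
    return centers

def fuzzy_c_means(points, init_centers, m=2.0, iterations=50):
    """Alternate the membership and centre updates a fixed number of times."""
    centers = list(init_centers)
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)

if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # toy values, not from the paper
    centers, u = fuzzy_c_means(data, init_centers=[1.0, 4.0])
    print([round(c, 3) for c in centers])
```

Running the same data from several different `init_centers` is a simple way to observe the initialization sensitivity that motivates pairing FCM with a global search.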
Usage of Clustering Algorithm to Segment Image into Simply Connected Domains
Directory of Open Access Journals (Sweden)
S. V. Belim
2015-01-01
The article proposes a color-based method for segmenting an image into simply connected domains. Pixels of the original image are represented as points in a five-dimensional space comprising three color coordinates and two spatial coordinates. The points are normalized so that no single characteristic is privileged. The set of points is mapped to a weighted complete graph whose vertices are the points of the five-dimensional space and whose edge weights are the Euclidean distances between them. To solve the clustering task, a minimum spanning tree of the graph is built and then separated into sub-trees by removing some edges. Each sub-tree is a simply connected domain in the original image. To improve speed and reduce memory usage, a greedy algorithm is used to build the minimum spanning tree. The edges to be removed are found from the plot of the length of each added edge against the sequence number at which the greedy algorithm added it to the tree; the desired edges appear as maxima on this plot. This search is based on the assumption that a transition to an adjacent cluster adds a longer edge than the edges within a cluster. Segmentation into clusters is iterative: at each step the bigger clusters are divided into smaller ones, so a hierarchy of clusters can be built. A computer experiment was carried out using different images. The suggested method lacks the disadvantages of the most common method, k-means, and can separate domains with different colors but the same intensity. There is also no need to specify the number of clusters; instead, a segmentation depth is chosen and the number of clusters is then determined automatically. The suggested method also lacks the disadvantages of edge-detection methods: it is sufficient to find one point of an image edge to separate two domains. A distinctive feature of
Energy Technology Data Exchange (ETDEWEB)
Dong, Feng; Pierpaoli, Elena; Gunn, James E.; Wechsler, Risa H.
2007-10-29
We present a modified adaptive matched filter algorithm designed to identify clusters of galaxies in wide-field imaging surveys such as the Sloan Digital Sky Survey. The cluster-finding technique is fully adaptive to imaging surveys with spectroscopic coverage, multicolor photometric redshifts, no redshift information at all, and any combination of these within one survey. It works with high efficiency in multi-band imaging surveys where photometric redshifts can be estimated with well-understood error distributions. Tests of the algorithm on realistic mock SDSS catalogs suggest that the detected sample is approximately 85% complete and over 90% pure for clusters with masses above $1.0 \times 10^{14}\,h^{-1}\,M_{\odot}$ and redshifts up to z = 0.45. The errors of the cluster redshifts estimated by the maximum likelihood method are shown to be small (typically less than 0.01) over the whole redshift range, with photometric redshift errors typical of those found in the Sloan survey. Inside the spherical radius corresponding to a galaxy overdensity of $\Delta = 200$, we find the derived cluster richness $\Lambda_{200}$ to be a roughly linear indicator of the virial mass $M_{200}$, which recovers well the relation between total luminosity and cluster mass of the input simulation.
Directory of Open Access Journals (Sweden)
Татьяна Борисовна Шатовская
2015-03-01
This work discusses results obtained with a modified Chameleon algorithm. Hierarchical multilevel algorithms consist of several stages: building the graph, coarsening, partitioning, and recovering. The main aim of the article is to explore clustering quality on different data sets with different combinations of algorithms at the different stages. A further aim is to improve the construction phase by optimizing the choice of k when building the k-nearest-neighbor graph.
A General-Purpose Optimization Engine for Multi-Disciplinary Design Applications
Patnaik, Surya N.; Hopkins, Dale A.; Berke, Laszlo
1996-01-01
A general purpose optimization tool for multidisciplinary applications, which in the literature is known as COMETBOARDS, is being developed at NASA Lewis Research Center. The modular organization of COMETBOARDS includes several analyzers and state-of-the-art optimization algorithms along with their cascading strategy. The code structure allows quick integration of new analyzers and optimizers. The COMETBOARDS code reads input information from a number of data files, formulates a design as a set of multidisciplinary nonlinear programming problems, and then solves the resulting problems. COMETBOARDS can be used to solve a large problem which can be defined through multiple disciplines, each of which can be further broken down into several subproblems. Alternatively, a small portion of a large problem can be optimized in an effort to improve an existing system. Some of the other unique features of COMETBOARDS include design variable formulation, constraint formulation, subproblem coupling strategy, global scaling technique, analysis approximation, use of either sequential or parallel computational modes, and so forth. The special features and unique strengths of COMETBOARDS assist convergence and reduce the amount of CPU time used to solve the difficult optimization problems of aerospace industries. COMETBOARDS has been successfully used to solve a number of problems, including structural design of space station components, design of nozzle components of an air-breathing engine, configuration design of subsonic and supersonic aircraft, mixed flow turbofan engines, wave rotor topped engines, and so forth. This paper introduces the COMETBOARDS design tool and its versatility, which is illustrated by citing examples from structures, aircraft design, and air-breathing propulsion engine design.
A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data
Li, Z.; Hodgson, M.; Li, W.
2016-12-01
Light detection and ranging (LiDAR) technologies have proven efficient at quickly obtaining very detailed Earth surface data over a large spatial extent. Such data is important for scientific discovery in areas such as Earth and ecological sciences, and for natural disaster and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies achieved notable success in addressing these challenges through parallel processing of LiDAR data. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for some specific algorithms. We developed a general-purpose scalable framework coupled with a sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerant Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework are evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides valuable references on developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
A priori data-driven multi-clustered reservoir generation algorithm for echo state network.
Directory of Open Access Journals (Sweden)
Xiumin Li
Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when ESN is used in environments with sufficient priori data available. Accordingly, a priori data-driven multi-cluster reservoir generation algorithm is proposed. The priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when ESN gains the greatest precision.
Scalable fault tolerant algorithms for linear-scaling coupled-cluster electronic structure methods.
Energy Technology Data Exchange (ETDEWEB)
Leininger, Matthew L.; Nielsen, Ida Marie B.; Janssen, Curtis L.
2004-10-01
By means of coupled-cluster theory, molecular properties can be computed with an accuracy often exceeding that of experiment. The high-degree polynomial scaling of the coupled-cluster method, however, remains a major obstacle in the accurate theoretical treatment of mainstream chemical problems, despite tremendous progress in computer architectures. Although it has long been recognized that this super-linear scaling is non-physical, the development of efficient reduced-scaling algorithms for massively parallel computers has not been realized. We here present a locally correlated, reduced-scaling, massively parallel coupled-cluster algorithm. A sparse data representation for handling distributed, sparse multidimensional arrays has been implemented along with a set of generalized contraction routines capable of handling such arrays. The parallel implementation entails a coarse-grained parallelization, reducing interprocessor communication and distributing the largest data arrays but replicating as many arrays as possible without introducing memory bottlenecks. The performance of the algorithm is illustrated by several series of runs for glycine chains using a Linux cluster with an InfiniBand interconnect.
Fuzzy-Logic Based Distributed Energy-Efficient Clustering Algorithm for Wireless Sensor Networks
Zhang, Ying; Wang, Jun; Han, Dezhi; Wu, Huafeng; Zhou, Rundong
2017-01-01
Owing to its high energy efficiency and scalability, the clustering routing algorithm has been widely used in wireless sensor networks (WSNs). In order to gather information more efficiently, each sensor node transmits data by multi-hop communication to the Cluster Head (CH) of the cluster to which it belongs. However, multi-hop communication within the cluster causes excessive energy consumption in the relay nodes which are closer to the CH. These nodes’ energy is consumed more quickly than that of the farther nodes, which harms load balance across the whole network. Therefore, we propose an energy-efficient distributed clustering algorithm based on a fuzzy approach with non-uniform distribution (EEDCF). During the CHs’ election, we take nodes’ energies, nodes’ degrees and neighbor nodes’ residual energies into consideration as the input parameters. In addition, we use the Takagi, Sugeno and Kang (TSK) fuzzy model instead of the traditional method as our inference system to make the quantitative analysis more reasonable. In our scheme, each sensor node calculates its probability of becoming a CH with the help of the fuzzy inference system in a distributed way. The experimental results indicate that the EEDCF algorithm is better than some current representative methods in terms of data transmission, energy consumption and network lifetime. PMID:28671641
Fuzzy-Logic Based Distributed Energy-Efficient Clustering Algorithm for Wireless Sensor Networks.
Zhang, Ying; Wang, Jun; Han, Dezhi; Wu, Huafeng; Zhou, Rundong
2017-07-03
Owing to its high energy efficiency and scalability, the clustering routing algorithm has been widely used in wireless sensor networks (WSNs). In order to gather information more efficiently, each sensor node transmits data by multi-hop communication to the Cluster Head (CH) of the cluster to which it belongs. However, multi-hop communication within the cluster causes excessive energy consumption in the relay nodes which are closer to the CH. These nodes' energy is consumed more quickly than that of the farther nodes, which harms load balance across the whole network. Therefore, we propose an energy-efficient distributed clustering algorithm based on a fuzzy approach with non-uniform distribution (EEDCF). During the CHs' election, we take nodes' energies, nodes' degrees and neighbor nodes' residual energies into consideration as the input parameters. In addition, we use the Takagi, Sugeno and Kang (TSK) fuzzy model instead of the traditional method as our inference system to make the quantitative analysis more reasonable. In our scheme, each sensor node calculates its probability of becoming a CH with the help of the fuzzy inference system in a distributed way. The experimental results indicate that the EEDCF algorithm is better than some current representative methods in terms of data transmission, energy consumption and network lifetime.
Directory of Open Access Journals (Sweden)
Liling Sun
2015-01-01
An improved multiobjective ABC algorithm based on K-means clustering, called CMOABC, is proposed. To speed up the convergence of the canonical MOABC, the way information is communicated in the employed bees’ phase is modified. To maintain population diversity, a multiswarm technique based on K-means clustering is employed to decompose the population into many clusters. Because each subcomponent evolves separately, the population is reclustered after every specified number of iterations to facilitate information exchange among different clusters. Application of the new CMOABC to several multiobjective benchmark functions shows a marked improvement in performance over the fast nondominated sorting genetic algorithm (NSGA-II), the multiobjective particle swarm optimizer (MOPSO), and the multiobjective ABC (MOABC). Finally, the CMOABC is applied to the real-world optimal power flow (OPF) problem, which considers the cost, loss, and emission impacts as the objective functions. The 30-bus IEEE test system is presented to illustrate the application of the proposed algorithm. The simulation results demonstrate that, compared to NSGA-II, MOPSO, and MOABC, the proposed CMOABC is superior for solving the OPF problem in terms of optimization accuracy.
A clustering algorithm for sample data based on environmental pollution characteristics
Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun
2015-04-01
Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.
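The three steps named in this abstract (open a cluster at the first unlabelled point, assign points by a user-defined threshold, then refine in a k-means-like way) can be sketched as follows. This is an illustration, not the authors' EPC code: the abstract does not give the similarity function or the outlier rule, so the sketch assumes Euclidean distance as the (dis)similarity and treats clusters smaller than a hypothetical `min_size` as outliers.

```python
import math

def threshold_clustering(points, threshold, min_size=2, refine_iters=10):
    """Leader-style clustering followed by a k-means-like refinement.

    Returns (centres, labels); label -1 marks a point flagged as an outlier.
    """
    labels = [None] * len(points)
    centres = []
    # Step 1 and 2: the first unlabelled point opens a new cluster, and
    # every still-unlabelled point within the threshold of it joins.
    for i, p in enumerate(points):
        if labels[i] is not None:
            continue
        centres.append(tuple(p))
        cid = len(centres) - 1
        for j in range(i, len(points)):
            if labels[j] is None and math.dist(points[j], p) <= threshold:
                labels[j] = cid
    # Step 3: recompute centres as means and reassign to the nearest centre.
    for _ in range(refine_iters):
        for cid in range(len(centres)):
            members = [q for q, lab in zip(points, labels) if lab == cid]
            if members:  # keep the old centre if a cluster empties
                centres[cid] = tuple(sum(c) / len(members) for c in zip(*members))
        new_labels = [
            min(range(len(centres)), key=lambda c: math.dist(q, centres[c]))
            for q in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
    # Assumed outlier rule: clusters with too few members are outliers.
    sizes = [labels.count(c) for c in range(len(centres))]
    labels = [lab if sizes[lab] >= min_size else -1 for lab in labels]
    return centres, labels

if __name__ == "__main__":
    samples = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (9, 0)]  # toy data
    print(threshold_clustering(samples, threshold=1.0)[1])
```

Unlike plain k-means, the number of clusters here falls out of the threshold rather than being fixed in advance, which is the property the abstract emphasizes together with outlier detection.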
Application of K-Means Algorithm for Cluster Analysis on Poverty of Provinces in Indonesia
Directory of Open Access Journals (Sweden)
Albert Verasius Dian Sano
2016-06-01
The objective of this study is to apply cluster analysis, also known as clustering, to poverty data of provinces all over Indonesia. The problem is that decision makers involved in poverty problems, such as central government, local government and non-government organizations, need a tool to support the decision-making process related to social welfare problems. The method used in the cluster analysis is the k-means algorithm. The data used in this study were drawn from Badan Pusat Statistik (BPS, or Central Bureau of Statistics) in 2014. The cluster analysis took the following characteristics of the data for each province in Indonesia: absolute poverty, the relative number or percentage of poverty, and the poverty depth index. The results of the cluster analysis were presented visually as groupings of cluster members. The cluster analysis could be used to identify the poverty picture of all provinces in Indonesia more quickly and efficiently. The results of such identification can be used by policy makers with an interest in eradicating the problems associated with poverty and welfare distribution in Indonesia, ranging from government organizations and non-governmental organizations to private organizations.
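A minimal Python sketch of k-means on three poverty-style features is given below. The rows are made-up toy numbers, not BPS data, and the z-score standardization step is an assumption of this sketch (the abstract does not say how the features were scaled); it is included because an absolute count, a percentage and an index live on very different scales. Taking the first k rows as initial centres is a simplification that keeps the run deterministic.

```python
import math

def standardize(rows):
    """Z-score each column so that no feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]

def k_means(rows, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    centres = [tuple(r) for r in rows[:k]]  # first k rows as initial centres
    labels = []
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(r, centres[c])) for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centres, labels

if __name__ == "__main__":
    # Hypothetical provinces: (poor population in thousands, percent poor, depth index)
    provinces = [
        (4000, 12.0, 2.1), (300, 4.0, 0.5), (900, 27.0, 6.0),
        (3900, 11.5, 2.0), (350, 4.5, 0.6), (850, 26.0, 5.8),
    ]
    _, labels = k_means(standardize(provinces), k=3)
    print(labels)
```

In practice the initial centres would be chosen at random or with a seeding scheme, and several restarts compared, since k-means converges only to a local optimum.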
Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs
Choi, Woo-Yong; Chatterjee, Mainak
2015-03-01
With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that end, we propose a real-time cluster-based multipolling sequencing algorithm that eliminates more than 90% of the polling overhead, particularly when the dynamic search algorithm fails to derive the multipolling sequence in real time.
Density-based cluster algorithms for the identification of core sets
Lemke, Oliver; Keller, Bettina G.
2016-10-01
The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
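As a rough illustration of how a density-based clustering separates dense regions from sparsely populated ones, here is a minimal DBSCAN sketch. It is not the authors' implementation: it uses plain Euclidean distance on toy 2-D points, whereas a molecular-dynamics application would use a conformational metric, and the parameter values are arbitrary.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, -1 meaning noise."""
    n = len(points)
    labels = [None] * n          # None = not yet visited

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1       # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster      # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:   # j is itself a core point: keep expanding
                queue.extend(j_nbrs)
    return labels

# Invented toy data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Points labelled -1 belong to no dense region; in a core-set setting they would correspond to frames lying between cores rather than inside one.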
An adaptive enhancement algorithm for infrared video based on modified k-means clustering
Zhang, Linze; Wang, Jingqi; Wu, Wen
2016-09-01
In this paper, we propose a video enhancement algorithm to improve the output video of an infrared camera. The video obtained by an infrared camera is sometimes very dark when there is no clear target. In this case, the infrared video is divided into frame images by frame extraction so that image enhancement can be carried out. The first frame image is divided into k sub-images by K-means clustering according to the gray interval each occupies, and each sub-image is then histogram-equalized according to the amount of information it contains; we also use a method to handle the cases in which the final cluster centers lie close to each other. For the other frame images, the initial cluster centers are taken from the final cluster centers of the previous frame, and the histogram equalization of each sub-image is carried out after image segmentation based on K-means clustering. Histogram equalization spreads the gray values of the image over the whole gray range, and the gray interval of each sub-image is determined by the ratio of its pixels to those of the frame image. Experimental results show that this algorithm can improve the contrast of infrared video in which an indistinct night target leads to a dim scene, and can adaptively reduce, within a certain range, the negative effect of overexposed pixels.
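The histogram-equalization step can be sketched as below. This is a generic textbook equalization, not the paper's code: the `out_lo` and `out_hi` parameters are a hypothetical way of expressing "equalize this sub-image into its own gray interval", and the pixel values are invented.

```python
from collections import Counter

def equalize(pixels, out_lo=0, out_hi=255):
    """Histogram-equalize integer gray values into the range [out_lo, out_hi]."""
    if not pixels:
        return []
    hist = Counter(pixels)
    # Cumulative count of pixels at or below each gray value.
    cdf, running = {}, 0
    for g in sorted(hist):
        running += hist[g]
        cdf[g] = running
    cdf_min = cdf[min(hist)]
    total = len(pixels)
    if total == cdf_min:
        return [out_lo] * total   # constant region: nothing to stretch
    scale = (out_hi - out_lo) / (total - cdf_min)
    return [out_lo + round((cdf[p] - cdf_min) * scale) for p in pixels]

# A dark, low-contrast sub-image stretched into a hypothetical interval 100..150.
stretched = equalize([10, 10, 11, 12], out_lo=100, out_hi=150)
```

Applying the function once per K-means segment, each with an output interval proportional to that segment's pixel share, would approximate the per-sub-image scheme the abstract outlines.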
A Clustering Algorithm for Planning the Integration Process of a Large Number of Conceptual Schemas
Institute of Scientific and Technical Information of China (English)
Carlo Batini; Paola Bonizzoni; Marco Comerio; Riccardo Dondi; Yuri Pirola; Francesco Salandra
2015-01-01
When tens and even hundreds of schemas are involved in the integration process, criteria are needed for choosing clusters of schemas to be integrated, so as to deal with the integration problem through an efficient iterative process. Schemas in clusters should be chosen according to cohesion and coupling criteria that are based on similarities and dissimilarities among schemas. In this paper, we propose an algorithm for a novel variant of the correlation clustering approach that addresses the problem of assisting a designer in integrating a large number of conceptual schemas. The novel variant introduces upper and lower bounds to the number of schemas in each cluster, in order to avoid too complex and too simple integration contexts respectively. We give a heuristic for solving the problem, which is an NP-hard combinatorial problem. An experimental activity demonstrates an appreciable increment in the effectiveness of the schema integration process when clusters are computed by means of the proposed algorithm w.r.t. the ones manually defined by an expert.
Lelu, Alain; Cuxac, Pascal
2008-01-01
We address here two major challenges presented by dynamic data mining: 1) the stability challenge: we have implemented a rigorous incremental density-based clustering algorithm, independent from any initial conditions and ordering of the data-vectors stream, 2) the cognitive challenge: we have implemented a stringent selection process of association rules between clusters at time t-1 and time t for directly generating the main conclusions about the dynamics of a data-stream. We illustrate these points with an application to a scientific information database covering two years and 2600 documents.
Gusev, Alexander; Chuluunbaatar, Ochbadrakh; Rostovtsev, Vitaly; Hai, Luong Le; Derbov, Vladimir; Gozdz, Andrzej; Klimov, Evgenii
2013-01-01
The quantum model of a cluster, consisting of A identical particles, coupled by the internal pair interactions and affected by the external field of a target, is considered. A symbolic-numerical algorithm for generating A-1-dimensional oscillator eigenfunctions, symmetric or antisymmetric with respect to permutations of A identical particles in the new symmetrized coordinates, is formulated and implemented using the MAPLE computer algebra system. Examples of generating the symmetrized coordinate representation for A-1 dimensional oscillator functions in one-dimensional Euclidean space are analyzed. The approach is aimed at solving the problem of tunnelling the clusters, consisting of several identical particles, through repulsive potential barriers of a target.
K-Means Re-Clustering-Algorithmic Options with Quantifiable Performance Comparisons
Energy Technology Data Exchange (ETDEWEB)
Meyer, A W; Paglieroni, D; Asteneh, C
2002-12-17
This paper presents various architectural options for implementing a K-Means Re-Clustering algorithm suitable for unsupervised segmentation of hyperspectral images. Performance metrics are developed based upon quantitative comparisons of convergence rates and segmentation quality. A methodology for making these comparisons is developed and used to establish K values that produce the best segmentations with minimal processing requirements. Convergence rates depend on the initial choice of cluster centers. Consequently, this same methodology may be used to evaluate the effectiveness of different initialization techniques.
An Airborne Radar Clutter Tracking Algorithm Based on Multifractal and Fuzzy C-Mean Cluster
Institute of Scientific and Technical Information of China (English)
Wei Zhang; Sheng-Lin Yu; Gong Zhang
2007-01-01
For an airborne lookdown radar, clutter power often changes dynamically by about 80 dB with wide distributions as the platform moves. Therefore, clutter tracking techniques are required to guide the selection of constant false alarm rate (CFAR) schemes. In this work, clutter tracking is done in the image domain and an algorithm combining multifractal analysis and fuzzy C-mean (FCM) clustering is proposed. The clutter with large dynamic distributions in power density is converted to steady distributions of multifractal exponents by the multifractal transformation with the optimum moment. Then the main lobe and side lobe are tracked from the multifractal exponents by the FCM clustering method.
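A minimal sketch of the standard fuzzy C-means iteration may help here. It shows only the generic FCM updates with fuzzifier m, not the multifractal transform or the authors' tracking logic; the 1-D sample values stand in for multifractal exponents and are invented.

```python
import math

def fcm(points, centres, m=2.0, iters=50):
    """Fuzzy c-means with fuzzifier m; returns (centres, memberships)."""
    centres = [list(c) for c in centres]
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: closer centres receive larger membership.
        u = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centres]
            u.append([1.0 / sum((d[i] / d[k]) ** power for k in range(len(d)))
                      for i in range(len(d))])
        # Centre update: mean of the points weighted by membership ** m.
        for i in range(len(centres)):
            w = [row[i] ** m for row in u]
            centres[i] = [sum(wj * p[dim] for wj, p in zip(w, points)) / sum(w)
                          for dim in range(len(points[0]))]
    return centres, u

# Invented 1-D values forming two groups, with rough initial centres.
points = [(0.0,), (0.2,), (5.0,), (5.2,)]
centres, u = fcm(points, centres=[(0.0,), (5.0,)])
```

Unlike hard k-means, every sample keeps a membership in every cluster, so samples between two lobes are not forced into one of them.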
Pluchino, Alessandro; Latora, Vito
2008-01-01
We have recently introduced an efficient method for the detection and identification of modules in complex networks, based on the de-synchronization properties (dynamical clustering) of phase oscillators. In this paper we apply the dynamical clustering technique to the identification of communities of marine organisms living in the Chesapeake Bay food web. We show that our algorithm is able to perform a very reliable classification of the real communities existing in this ecosystem by using different kinds of dynamical oscillators. We also compare our results with those of other methods for the detection of community structures in complex networks.
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
Singh, Sudhakar; Garg, Rakhi; Mishra, P. K.
2015-10-01
Mining frequent itemsets from massive datasets has always been one of the most important problems in data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed that address the design of efficient data structures, the minimization of database scans, and parallel and distributed processing. MapReduce is an emerging parallel and distributed technology for processing big datasets on a Hadoop cluster. To mine big datasets it is essential to re-design data mining algorithms for this new paradigm. In this paper, we implement three variations of the Apriori algorithm on the MapReduce paradigm using the data structures hash tree, trie and hash table trie, i.e. a trie with a hashing technique. We emphasize and investigate the significance of these three data structures for the Apriori algorithm on a Hadoop cluster, which has not yet been given attention. Experiments are carried out on both real-life and synthetic datasets, and they show that the hash table trie performs far better than the trie and the hash tree in terms of execution time. Moreover, the hash tree performs worst.
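For orientation, the level-wise structure of Apriori can be sketched in a few lines. This is a single-machine illustration that stores candidates in plain Python sets; it deliberately does not use the hash tree, trie or hash table trie that the paper compares, nor MapReduce, and the transactions are invented.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Invented toy transactions.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = apriori(baskets, min_support=3)
```

The count step is where the choice of candidate data structure matters: the naive subset test above scans every candidate for every transaction, which is the cost that tree-based structures are meant to reduce.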
Hopfield-K-Means clustering algorithm: A proposal for the segmentation of electricity customers
Energy Technology Data Exchange (ETDEWEB)
Lopez, Jose J.; Aguado, Jose A.; Martin, F.; Munoz, F.; Rodriguez, A.; Ruiz, Jose E. [Department of Electrical Engineering, University of Malaga, C/ Dr. Ortiz Ramos, sn., Escuela de Ingenierias, 29071 Malaga (Spain)
2011-02-15
Customer classification aims at providing electric utilities with a volume of information to enable them to establish different types of tariffs. Several methods have been used to segment electricity customers, including, among others, the hierarchical clustering, Modified Follow the Leader and K-Means methods. These, however, entail problems with the pre-allocation of the number of clusters (Follow the Leader), randomness of the solution (K-Means) and improvement of the solution obtained (hierarchical algorithm). Another segmentation method used is Hopfield's autonomous recurrent neural network, although the solution obtained is only guaranteed to be a local minimum. In this paper, we present the Hopfield-K-Means algorithm in order to overcome these limitations. This approach eliminates the randomness of the initial solution provided by K-Means based algorithms and moves closer to the global optimum. The proposed algorithm is also compared against other customer segmentation and characterization techniques, on the basis of relative validation indexes. Finally, the results obtained by this algorithm with a set of 230 electricity customers (residential, industrial and administrative) are presented.
Study of cluster reconstruction and track fitting algorithms for CGEM-IT at BESIII
Guo, Yue; Ju, Xu-Dong; Wu, Ling-Hui; Xiu, Qing-Lei; Wang, Hai-Xia; Dong, Ming-Yi; Hu, Jing-Ran; Li, Wei-Dong; Li, Wei-Guo; Liu, Huai-Min; Ou-Yang, Qun; Shen, Xiao-Yan; Yuan, Ye; Zhang, Yao
2015-01-01
Considering the aging effects of the existing Inner Drift Chamber (IDC) of BESIII, a GEM based inner tracker is proposed to be designed and constructed as an upgrade candidate for the IDC. This paper introduces a full simulation package of CGEM-IT with a simplified digitization model, and describes the development of the software for cluster reconstruction and for a track fitting algorithm based on the Kalman filter method for CGEM-IT. Preliminary results from the reconstruction algorithms are obtained using a Monte Carlo sample of single muon events in CGEM-IT.
Clustering-boundary-detection algorithm based on center-of-gravity of neighborhood
Directory of Open Access Journals (Sweden)
Wang Gui Zhi
2013-07-01
The cluster boundary is a useful model. To identify the boundary effectively, this paper proposes a boundary detection algorithm, S-BOUND, based on the uneven distribution of data points in the epsilon neighborhood of boundary objects. First, all the points in the epsilon neighborhood of a data object are projected onto the boundary of the convex hull of the neighborhood, and the center of gravity of the neighborhood is then calculated. Finally, boundary objects are detected according to the degree of deviation of the center of gravity of the neighborhood from the object. The experimental results show that the S-BOUND algorithm can accurately detect a variety of cluster boundaries and remove noise, and its running time is also better.
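The underlying idea, that a boundary object sees its neighbours mostly on one side, can be illustrated with a simplified score. This sketch is not S-BOUND: as an assumption it projects each neighbour onto a unit circle around the object instead of onto the convex hull of the neighbourhood, and the function name and grid data are hypothetical.

```python
import math

def boundary_scores(points, eps):
    """Score in [0, 1] per 2-D point: offset of the centre of gravity of its
    eps-neighbours after projecting them onto a unit circle around the point."""
    scores = []
    for i, p in enumerate(points):
        sx = sy = 0.0
        count = 0
        for j, q in enumerate(points):
            d = math.dist(p, q)
            if j != i and 0.0 < d <= eps:
                # Keep only the direction towards the neighbour.
                sx += (q[0] - p[0]) / d
                sy += (q[1] - p[1]) / d
                count += 1
        # Interior points: directions cancel out. Boundary points: they add up.
        scores.append(math.hypot(sx, sy) / count if count else 0.0)
    return scores

# Invented 3x3 grid: index 4 is the interior point (1, 1), index 0 a corner.
grid = [(x, y) for x in range(3) for y in range(3)]
scores = boundary_scores(grid, eps=1.5)
```

A threshold on the score would then separate boundary candidates from interior points; isolated points with no neighbours score 0 here and would need a separate density check to be treated as noise.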
Heuristic file sorted assignment algorithm of parallel I/O on cluster computing system
Institute of Scientific and Technical Information of China (English)
CHEN Zhi-gang; ZENG Bi-qing; XIONG Ce; DENG Xiao-heng; ZENG Zhi-wen; LIU An-feng
2005-01-01
A new file assignment strategy for parallel I/O, named the heuristic file sorted assignment algorithm, was proposed for cluster computing systems. Based on load balancing, it assigns files with similar service times to the same disk. First, the files were sorted in descending order of service time and stored in the set I; then one disk of a cluster node was selected randomly when files were to be assigned; and finally consecutive files were taken in order from the set I and placed on the disk until the disk reached its load maximum. The experimental results show that the new strategy improves the performance by 20.2% when the load of the system is light and by 31.6% when the load is heavy. The higher the data access rate, the more evident the performance improvement obtained by the heuristic file sorted assignment algorithm.
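The three steps above can be sketched as follows. This is a reading of the abstract under stated assumptions, not the authors' code: a disk's load is taken to be the sum of the service times of its files, each disk is filled only once, and the function name, the `max_load` parameter and the sample service times are hypothetical.

```python
import random

def sorted_assignment(service_times, n_disks, max_load, seed=0):
    """Assign files to disks in sorted runs; returns (plan, unassigned)."""
    rng = random.Random(seed)
    # Step 1: file indices in descending order of service time (the set I).
    order = sorted(range(len(service_times)),
                   key=lambda i: service_times[i], reverse=True)
    free = list(range(n_disks))
    plan = {d: [] for d in range(n_disks)}
    pos = 0
    while pos < len(order) and free:
        # Step 2: pick one of the still-empty disks at random.
        disk = free.pop(rng.randrange(len(free)))
        load = 0.0
        # Step 3: take consecutive files until the disk would exceed max_load.
        while pos < len(order) and load + service_times[order[pos]] <= max_load:
            plan[disk].append(order[pos])
            load += service_times[order[pos]]
            pos += 1
    return plan, order[pos:]

# Invented service times for five files.
times = [5, 1, 4, 2, 3]
plan, unassigned = sorted_assignment(times, n_disks=3, max_load=6)
```

Because the list is sorted before it is cut into runs, each disk ends up holding files with similar service times, which is the property the strategy relies on.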
Realization of R-tree for GIS on hybrid clustering algorithm
Institute of Scientific and Technical Information of China (English)
HUANG Ji-xian; BAO Guang-shu; LI Qing-song
2005-01-01
The characteristic of geographic information system(GIS) spatial data operation is that query is much more frequent than insertion and deletion, and a new hybrid spatial clustering method used to build R-tree for GIS spatial data was proposed in this paper. According to the aggregation of clustering method, R-tree was used to construct rules and specialty of spatial data. HCR-tree was the R-tree built with HCR algorithm. To test the efficiency of HCR algorithm, it was applied not only to the data organization of static R-tree but also to the nodes splitting of dynamic R-tree. The results show that R-tree with HCR has some advantages such as higher searching efficiency, less disk accesses and so on.
Di, Nur Faraidah Muhammad; Satari, Siti Zanariah
2017-05-01
Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose a method for detecting multiple outliers in circular regression models based on a clustering algorithm. Clustering techniques basically use a distance measure to define the distance between data points. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster groups that exceed the stopping rule as potential outliers. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods perform well and are applicable to circular regression models.
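A small sketch may clarify why circular data need their own distance. The code below assumes the shortest arc length as the distance between two angles (the abstract's own similarity distance is not given) and cuts a single-linkage tree at a fixed height rather than applying the proposed stopping rule; all names and values are hypothetical.

```python
import math

def circular_distance(a, b):
    """Shortest arc length between two angles in radians (an assumed metric)."""
    diff = abs(a - b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)

def single_linkage_cut(angles, height):
    """Clusters obtained by cutting a single-linkage tree at `height`.

    Cutting at height h is equivalent to joining every pair of points whose
    distance is at most h and taking connected components (union-find).
    """
    parent = list(range(len(angles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(angles)):
        for j in range(i + 1, len(angles)):
            if circular_distance(angles[i], angles[j]) <= height:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(len(angles))]
    ids = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]

# Invented angles: 6.2 rad is close to 0.05 rad across the 0/2*pi wrap-around.
angles = [0.05, 6.2, 0.1, 3.0]
groups = single_linkage_cut(angles, height=0.3)
```

With an ordinary linear distance the value 6.2 would look far from 0.05 and could be flagged wrongly; on the circle it sits in the main group, and only 3.0 is left in a group of its own.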
Directory of Open Access Journals (Sweden)
Jibing Wu
2017-01-01
Clustering analysis is a basic and essential method for mining heterogeneous information networks, which consist of multiple types of objects and rich semantic relations among different object types. Heterogeneous information networks are ubiquitous in real-world applications, such as bibliographic networks and social media networks. Unfortunately, most existing approaches, such as spectral clustering, are designed to analyze homogeneous information networks, which are composed of only one type of objects and links. Some recent studies have focused on heterogeneous information networks and produced methods such as RankClus and NetClus. However, they often assume that heterogeneous information networks follow some simple schema, such as a bityped network schema or a star network schema. To overcome the above limitations, we model the heterogeneous information network as a tensor without any restriction on the network schema. Then, a tensor CP decomposition method is adapted to formulate the clustering problem in heterogeneous information networks. Further, we develop two stochastic gradient descent algorithms, namely, SGDClus and SOSClus, which cluster multityped objects simultaneously and effectively. The experimental results on both synthetic datasets and a real-world dataset have demonstrated that our proposed clustering framework can model heterogeneous information networks efficiently and outperform state-of-the-art clustering methods.
Techniques for Mapping Synthetic Aperture Radar Processing Algorithms to Multi-GPU Clusters
2012-12-01
are suited for threaded (parallel) execution, by labeling them as kernels using syntax specified by the GPU programming language (e.g., CUDA for an...Techniques for Mapping Synthetic Aperture Radar Processing Algorithms to Multi-GPU Clusters Eric Hayden, Mark Schmalz, William Chapman, Sanjay...Abstract - This paper presents a design for parallel processing of synthetic aperture radar (SAR) data using multiple Graphics Processing Units (GPUs). Our
Clustering With Side Information: From a Probabilistic Model to a Deterministic Algorithm
Khashabi, Daniel; Wieting, John; Liu, Jeffrey Yufei; Liang, Feng
2015-01-01
In this paper, we propose a model-based clustering method (TVClust) that robustly incorporates noisy side information as soft-constraints and aims to seek a consensus between side information and the observed data. Our method is based on a nonparametric Bayesian hierarchical model that combines the probabilistic model for the data instance and the one for the side-information. An efficient Gibbs sampling algorithm is proposed for posterior inference. Using the small-variance asymptotics of ou...
User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.
Bourobou, Serge Thomas Mickala; Yoo, Younghwan
2015-05-21
This paper discusses the possibility of recognizing and predicting user activities in an IoT (Internet of Things) based smart environment. Activity recognition is usually done in two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, their performance was limited because they focused on only one of the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify the varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the smart environment is trained to recognize and predict user activities inside the user's personal space by means of an artificial neural network based on Allen's temporal relations. The experimental results show that our combined method provides higher recognition accuracy for various activities than other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.
Arimbi, Mentari Dian; Bustamam, Alhadi; Lestari, Dian
2017-03-01
Data clustering can be carried out by partitioning or hierarchical methods for many types of data, including DNA sequences. The two kinds of method can be combined by running a partitioning algorithm at the first level and a hierarchical algorithm at the second level, which is called hybrid clustering. In the partition phase, popular methods such as PAM, K-means, or Fuzzy c-means can be applied. In this study we selected partitioning around medoids (PAM) for the partition stage. Following the partition algorithm, in the hierarchical stage we applied the divisive analysis algorithm (DIANA) in order to obtain more specific cluster and sub-cluster structures. The number of main clusters is determined using the Davies Bouldin Index (DBI): we choose the number of clusters that minimizes the DBI value. In this work, we cluster 1252 HPV DNA sequences from GenBank. Characteristic extraction is performed first, followed by normalization and genetic distance calculation using the Euclidean distance. We implemented the hybrid PAM and DIANA method using the R open source programming tool. Using PAM in the first stage, we obtained 3 main clusters with an average DBI value of 0.979. After executing DIANA in the second stage, we obtained 4 sub-clusters for Cluster-1, 9 sub-clusters for Cluster-2 and 2 sub-clusters for Cluster-3, with DBI values of 0.972, 0.771, and 0.768 for the respective main clusters. Since the second stage produces lower DBI values than the first stage, we conclude that this hybrid approach can improve the accuracy of our clustering results.
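Since the choice of cluster number rests on the Davies Bouldin Index, a minimal sketch of the index is given here. It uses the usual definition (mean distance to the centroid as within-cluster scatter, Euclidean distance between centroids) and is written in Python for illustration, whereas the study itself used R; the points are invented.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower means tighter, better separated clusters.

    Needs at least two clusters with distinct centroids.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids, scatter = {}, {}
    for lab, members in clusters.items():
        c = [sum(col) / len(members) for col in zip(*members)]
        centroids[lab] = c
        # Scatter: mean distance of the members to their centroid.
        scatter[lab] = sum(math.dist(p, c) for p in members) / len(members)
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j])
                     / math.dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)

# Invented toy points with a sensible and a poor labelling.
pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = davies_bouldin(pts, [0, 0, 1, 1])
poor = davies_bouldin(pts, [0, 1, 0, 1])
```

Evaluating the index for several candidate cluster counts and keeping the count with the smallest value mirrors the selection rule described in the abstract.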
Crowded Cluster Cores: Algorithms for Deblending in Dark Energy Survey Images
Zhang, Yuanyuan; Bertin, Emmanuel; Jeltema, Tesla; Miller, Christopher J; Rykoff, Eli; Song, Jeeseon
2014-01-01
Deep optical images are often crowded with overlapping objects. This is especially true in the cores of galaxy clusters, where images of dozens of galaxies may lie atop one another. Accurate measurements of cluster properties require deblending algorithms designed to automatically extract a list of individual objects and decide what fraction of the light in each pixel comes from each object. We present new software called the Gradient And INterpolation based deblender (GAIN) as a secondary deblender to improve the deblending of images of cluster cores. This software relies on the image intensity gradient and on an image interpolation technique usually used to correct flawed terrestrial digital images. We test this software on Dark Energy Survey coadd images. GAIN helps to extract unbiased photometry measurements for blended sources. It also helps to improve detection completeness while introducing only a modest number of spurious detections. For example, when applied to deep images simulated with high level o...
Dong, Feng; Gunn, James E; Wechsler, Risa H
2007-01-01
We present a modified adaptive matched filter algorithm designed to identify clusters of galaxies in wide-field imaging surveys such as the Sloan Digital Sky Survey. The cluster-finding technique is fully adaptive to imaging surveys with spectroscopic coverage, multicolor photometric redshifts, no redshift information at all, and any combination of these within one survey. It works with high efficiency in multi-band imaging surveys where photometric redshifts can be estimated with well-understood error distributions. Tests of the algorithm on realistic mock SDSS catalogs suggest that the detected sample is ~85% complete and over 90% pure for clusters with masses above 1.0*10^{14} h^{-1} M_solar and redshifts up to z=0.45. The errors of the cluster redshifts estimated by the maximum likelihood method are shown to be small (typically less than 0.01) over the whole redshift range, with photometric redshift errors typical of those found in the Sloan survey. Inside the spherical radius corresponding to a galaxy overdensi...
A Survey of Text Clustering Algorithms
Institute of Scientific and Technical Information of China (English)
史梦洁
2014-01-01
As an unsupervised machine learning method, clustering algorithms discover the inherent structure and distribution of data and are widely used in various fields. With the rapid development of the Internet and the substantial increase in online documents, text clustering has become an important task. This paper discusses the basic concepts and application scenarios of text clustering, and reviews the algorithms and evaluation methods of text clustering.
Sun, Jiajia; Li, Yaoguo
2017-02-01
Joint inversion that simultaneously inverts multiple geophysical data sets to recover a common Earth model is increasingly being applied to exploration problems. Petrophysical data can serve as an effective constraint to link different physical property models in such inversions. There are two challenges, among others, associated with the petrophysical approach to joint inversion. One is related to the multimodality of petrophysical data because there often exist more than one relationship between different physical properties in a region of study. The other challenge arises from the fact that petrophysical relationships have different characteristics and can exhibit point, linear, quadratic, or exponential forms in a crossplot. The fuzzy c-means (FCM) clustering technique is effective in tackling the first challenge and has been applied successfully. We focus on the second challenge in this paper and develop a joint inversion method based on variations of the FCM clustering technique. To account for the specific shapes of petrophysical relationships, we introduce several different fuzzy clustering algorithms that are capable of handling different shapes of petrophysical relationships. We present two synthetic and one field data examples and demonstrate that, by choosing appropriate distance measures for the clustering component in the joint inversion algorithm, the proposed joint inversion method provides an effective means of handling common petrophysical situations we encounter in practice. The jointly inverted models have both enhanced structural similarity and increased petrophysical correlation, and better represent the subsurface in the spatial domain and the parameter domain of physical properties.
Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms
Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel
2016-04-01
Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, being independent of one another, can be performed in parallel, reducing processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using CUDA C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are: fuzzy k-means clustering with expert knowledge [7] in assigning the overall number of clusters; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, CUDA C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufmann Publishers, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and
Ortiz, Juan F; Rokas, Antonis
2017-01-01
Closely spaced clusters of tandemly duplicated genes (CTDGs) contribute to the diversity of many phenotypes, including chemosensation, snake venom, and animal body plans. CTDGs have traditionally been identified subjectively as genomic neighborhoods containing several gene duplicates in close proximity; however, CTDGs are often highly variable with respect to gene number, intergenic distance, and synteny. This lack of formal definition hampers the study of CTDG evolutionary dynamics and the discovery of novel CTDGs in the exponentially growing body of genomic data. To address this gap, we developed a novel homology-based algorithm, CTDGFinder, which formalizes and automates the identification of CTDGs by examining the physical distribution of individual members of families of duplicated genes across chromosomes. Application of CTDGFinder accurately identified CTDGs for many well-known gene clusters (e.g., Hox and beta-globin gene clusters) in the human, mouse and 20 other mammalian genomes. Differences between previously annotated gene clusters and our inferred CTDGs were due to the exclusion of nonhomologs that have historically been considered parts of specific gene clusters, the inclusion or absence of genes between the CTDGs and their corresponding gene clusters, and the splitting of certain gene clusters into distinct CTDGs. Examination of human genes showing tissue-specific enhancement of their expression by CTDGFinder identified members of several well-known gene clusters (e.g., cytochrome P450s and olfactory receptors) and revealed that they were unequally distributed across tissues. By formalizing and automating CTDG identification, CTDGFinder will facilitate understanding of CTDG evolutionary dynamics, their functional implications, and how they are associated with phenotypic diversity.
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has quickly introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of analysis algorithms is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up the computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts select parallel hyperspectral algorithms for specific applications.
Parallel OSEM Reconstruction Algorithm for Fully 3-D SPECT on a Beowulf Cluster.
Rong, Zhou; Tianyu, Ma; Yongjie, Jin
2005-01-01
In order to improve the computation speed of the ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, an experimental Beowulf-type cluster was built and several parallel reconstruction schemes were described. We implemented a single-program-multiple-data (SPMD) parallel 3-D OSEM reconstruction algorithm based on the Message Passing Interface (MPI) and tested it with combinations of different numbers of processors and different voxel grid sizes in reconstruction (64×64×64 and 128×128×128). Performance of the parallelization was evaluated in terms of the speedup factor and parallel efficiency. This parallel implementation methodology is expected to help make fully 3-D OSEM algorithms more feasible in clinical SPECT studies.
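The two evaluation metrics named above have standard definitions: the speedup factor is the single-processor run time divided by the run time on p processors, and parallel efficiency is the speedup divided by p. A minimal sketch follows; the function names are hypothetical and any timings passed in are the reader's own, since the abstract reports no figures.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup factor S = T_1 / T_p."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Parallel efficiency E = S / p, where p is the number of processors."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return speedup(t_serial, t_parallel) / n_procs
```

An efficiency near 1 means the processors are almost fully used; communication overhead between nodes typically pushes it lower as p grows.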
Incremental Density-Based Link Clustering Algorithm for Community Detection in Dynamic Networks
Directory of Open Access Journals (Sweden)
Fanrong Meng
2016-01-01
Community detection in complex networks has become a research hotspot in recent years. However, most existing community detection algorithms are designed for static networks, in which the connections between nodes do not change. In this paper, we propose an incremental density-based link clustering algorithm for community detection in dynamic networks, iDBLINK. This algorithm is an extended version of DBLINK, which was proposed in our previous work. It updates the local link community structure at the current moment according to the change in similarity between edges at adjacent moments, covering the creation, growth, merging, deletion, contraction, and division of link communities. Extensive experimental results demonstrate that iDBLINK is not only highly time-efficient but also maintains high-quality community detection performance while the network topology is changing.
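The notion of tracking how edge similarity changes between adjacent moments can be sketched as follows. The abstract does not state which similarity measure iDBLINK uses, so this sketch assumes a Jaccard similarity over the inclusive neighbourhoods of the non-shared endpoints of two edges that share a node, a measure commonly used in link clustering. All function names are hypothetical, and the graph is assumed to be simple (undirected, no self-loops) with sortable node labels.

```python
from itertools import combinations


def neighbors(edges):
    """Build an adjacency map {node: set of neighbours} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_similarities(edges):
    """Jaccard similarity for every pair of edges sharing exactly one node.

    Assumed measure: compare the inclusive neighbourhoods (node plus its
    neighbours) of the two endpoints that the edges do NOT share.
    """
    adj = neighbors(edges)
    norm = sorted({tuple(sorted(e)) for e in edges})  # canonical undirected edges
    sims = {}
    for e1, e2 in combinations(norm, 2):
        shared = set(e1) & set(e2)
        if len(shared) != 1:
            continue  # edges are not adjacent
        (i,) = set(e1) - shared
        (j,) = set(e2) - shared
        ni, nj = adj[i] | {i}, adj[j] | {j}
        sims[(e1, e2)] = len(ni & nj) / len(ni | nj)
    return sims


def changed_pairs(edges_before, edges_after, tol=1e-12):
    """Edge pairs whose similarity appeared, vanished, or changed between snapshots."""
    before = edge_similarities(edges_before)
    after = edge_similarities(edges_after)
    changed = {}
    for pair in before.keys() | after.keys():
        old, new = before.get(pair), after.get(pair)
        if old is None or new is None or abs(old - new) > tol:
            changed[pair] = (old, new)
    return changed
```

Only the pairs returned by `changed_pairs` would need to be re-examined by an incremental method; everything else in the link community structure can be left untouched, which is where the time savings over re-clustering from scratch would come from.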