WorldWideScience

Sample records for hierarchical clustering algorithm

  1. A Novel Divisive Hierarchical Clustering Algorithm for Geospatial Analysis

    Directory of Open Access Journals (Sweden)

    Shaoning Li

    2017-01-01

    Full Text Available In the fields of geographic information systems (GIS and remote sensing (RS, the clustering algorithm has been widely used for image segmentation, pattern recognition, and cartographic generalization. Although clustering analysis plays a key role in geospatial modelling, traditional clustering methods are limited due to computational complexity, noise resistant ability and robustness. Furthermore, traditional methods are more focused on the adjacent spatial context, which makes it hard for the clustering methods to be applied to multi-density discrete objects. In this paper, a new method, cell-dividing hierarchical clustering (CDHC, is proposed based on convex hull retraction. The main steps are as follows. First, a convex hull structure is constructed to describe the global spatial context of geospatial objects. Then, the retracting structure of each borderline is established in sequence by setting the initial parameter. The objects are split into two clusters (i.e., “sub-clusters” if the retracting structure intersects with the borderlines. Finally, clusters are repeatedly split and the initial parameter is updated until the terminate condition is satisfied. The experimental results show that CDHC separates the multi-density objects from noise sufficiently and also reduces complexity compared to the traditional agglomerative hierarchical clustering algorithm.

  2. Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering

    Science.gov (United States)

    Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang

    2012-01-01

    Background Gravitation field algorithm (GFA) is a new optimization algorithm which is based on an imitation of natural phenomena. GFA can do well both for searching global minimum and multi-minima in computational biology. But GFA needs to be improved for increasing efficiency, and modified for applying to some discrete data problems in system biology. Method An improved GFA called IGFA was proposed in this paper. Two parts were improved in IGFA. The first one is the rule of random division, which is a reasonable strategy and makes running time shorter. The other one is rotation factor, which can improve the accuracy of IGFA. And to apply IGFA to the hierarchical clustering, the initial part and the movement operator were modified. Results Two kinds of experiments were used to test IGFA. And IGFA was applied to hierarchical clustering. The global minimum experiment was used with IGFA, GFA, GA (genetic algorithm) and SA (simulated annealing). Multi-minima experiment was used with IGFA and GFA. The two experiments results were compared with each other and proved the efficiency of IGFA. IGFA is better than GFA both in accuracy and running time. For the hierarchical clustering, IGFA is used to optimize the smallest distance of genes pairs, and the results were compared with GA and SA, singular-linkage clustering, UPGMA. The efficiency of IGFA is proved. PMID:23173043

  3. Evaluation of Hierarchical Clustering Algorithms for Document Datasets

    National Research Council Canada - National Science Library

    Zhao, Ying; Karypis, George

    2002-01-01

    Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters...

  4. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space

    OpenAIRE

    Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

    2008-01-01

    Motivation: UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. Application: We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without ex...

  5. Neutrosophic Hierarchical Clustering Algoritms

    Directory of Open Access Journals (Sweden)

    Rıdvan Şahin

    2014-03-01

    Full Text Available Interval neutrosophic set (INS is a generalization of interval valued intuitionistic fuzzy set (IVIFS, whose the membership and non-membership values of elements consist of fuzzy range, while single valued neutrosophic set (SVNS is regarded as extension of intuitionistic fuzzy set (IFS. In this paper, we extend the hierarchical clustering techniques proposed for IFSs and IVIFSs to SVNSs and INSs respectively. Based on the traditional hierarchical clustering procedure, the single valued neutrosophic aggregation operator, and the basic distance measures between SVNSs, we define a single valued neutrosophic hierarchical clustering algorithm for clustering SVNSs. Then we extend the algorithm to classify an interval neutrosophic data. Finally, we present some numerical examples in order to show the effectiveness and availability of the developed clustering algorithms.

  6. Novel density-based and hierarchical density-based clustering algorithms for uncertain data.

    Science.gov (United States)

    Zhang, Xianchao; Liu, Han; Zhang, Xiaotong

    2017-09-01

    Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing

  7. The efficiency of average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling in identifying homogeneous precipitation catchments

    Science.gov (United States)

    Chuan, Zun Liang; Ismail, Noriszura; Shinyie, Wendy Ling; Lit Ken, Tan; Fam, Soo-Fen; Senawi, Azlyna; Yusoff, Wan Nur Syahidah Wan

    2018-04-01

    Due to the limited of historical precipitation records, agglomerative hierarchical clustering algorithms widely used to extrapolate information from gauged to ungauged precipitation catchments in yielding a more reliable projection of extreme hydro-meteorological events such as extreme precipitation events. However, identifying the optimum number of homogeneous precipitation catchments accurately based on the dendrogram resulted using agglomerative hierarchical algorithms are very subjective. The main objective of this study is to propose an efficient regionalized algorithm to identify the homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling, while uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation is consolidated using K-sample Anderson Darling non-parametric test. The analysis result shows the proposed regionalized algorithm performed more better compared to the proposed agglomerative hierarchical clustering algorithm in previous studies.

  8. The Hierarchical Spectral Merger Algorithm: A New Time Series Clustering Procedure

    KAUST Repository

    Euán, Carolina

    2018-04-12

    We present a new method for time series clustering which we call the Hierarchical Spectral Merger (HSM) method. This procedure is based on the spectral theory of time series and identifies series that share similar oscillations or waveforms. The extent of similarity between a pair of time series is measured using the total variation distance between their estimated spectral densities. At each step of the algorithm, every time two clusters merge, a new spectral density is estimated using the whole information present in both clusters, which is representative of all the series in the new cluster. The method is implemented in an R package HSMClust. We present two applications of the HSM method, one to data coming from wave-height measurements in oceanography and the other to electroencefalogram (EEG) data.

  9. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space.

    Science.gov (United States)

    Loewenstein, Yaniv; Portugaly, Elon; Fromer, Menachem; Linial, Michal

    2008-07-01

    UPGMA (average linking) is probably the most popular algorithm for hierarchical data clustering, especially in computational biology. However, UPGMA requires the entire dissimilarity matrix in memory. Due to this prohibitive requirement, UPGMA is not scalable to very large datasets. We present a novel class of memory-constrained UPGMA (MC-UPGMA) algorithms. Given any practical memory size constraint, this framework guarantees the correct clustering solution without explicitly requiring all dissimilarities in memory. The algorithms are general and are applicable to any dataset. We present a data-dependent characterization of hardness and clustering efficiency. The presented concepts are applicable to any agglomerative clustering formulation. We apply our algorithm to the entire collection of protein sequences, to automatically build a comprehensive evolutionary-driven hierarchy of proteins from sequence alone. The newly created tree captures protein families better than state-of-the-art large-scale methods such as CluSTr, ProtoNet4 or single-linkage clustering. We demonstrate that leveraging the entire mass embodied in all sequence similarities allows to significantly improve on current protein family clusterings which are unable to directly tackle the sheer mass of this data. Furthermore, we argue that non-metric constraints are an inherent complexity of the sequence space and should not be overlooked. The robustness of UPGMA allows significant improvement, especially for multidomain proteins, and for large or divergent families. A comprehensive tree built from all UniProt sequence similarities, together with navigation and classification tools will be made available as part of the ProtoNet service. A C++ implementation of the algorithm is available on request.

  10. Using Hierarchical Time Series Clustering Algorithm and Wavelet Classifier for Biometric Voice Classification

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2012-01-01

    Full Text Available Voice biometrics has a long history in biosecurity applications such as verification and identification based on characteristics of the human voice. The other application called voice classification which has its important role in grouping unlabelled voice samples, however, has not been widely studied in research. Lately voice classification is found useful in phone monitoring, classifying speakers’ gender, ethnicity and emotion states, and so forth. In this paper, a collection of computational algorithms are proposed to support voice classification; the algorithms are a combination of hierarchical clustering, dynamic time wrap transform, discrete wavelet transform, and decision tree. The proposed algorithms are relatively more transparent and interpretable than the existing ones, though many techniques such as Artificial Neural Networks, Support Vector Machine, and Hidden Markov Model (which inherently function like a black box have been applied for voice verification and voice identification. Two datasets, one that is generated synthetically and the other one empirically collected from past voice recognition experiment, are used to verify and demonstrate the effectiveness of our proposed voice classification algorithm.

  11. The Hierarchical Spectral Merger Algorithm: A New Time Series Clustering Procedure

    KAUST Repository

    Euá n, Carolina; Ombao, Hernando; Ortega, Joaquí n

    2018-01-01

    We present a new method for time series clustering which we call the Hierarchical Spectral Merger (HSM) method. This procedure is based on the spectral theory of time series and identifies series that share similar oscillations or waveforms

  12. SDN‐Based Hierarchical Agglomerative Clustering Algorithm for Interference Mitigation in Ultra‐Dense Small Cell Networks

    Directory of Open Access Journals (Sweden)

    Guang Yang

    2018-04-01

    Full Text Available Ultra‐dense small cell networks (UD‐SCNs have been identified as a promising scheme for next‐generation wireless networks capable of meeting the ever‐increasing demand for higher transmission rates and better quality of service. However, UD‐SCNs will inevitably suffer from severe interference among the small cell base stations, which will lower their spectral efficiency. In this paper, we propose a software‐defined networking (SDN‐based hierarchical agglomerative clustering (SDN‐HAC framework, which leverages SDN to centrally control all sub‐channels in the network, and decides on cluster merging using a similarity criterion based on a suitability function. We evaluate the proposed algorithm through simulation. The obtained results show that the proposed algorithm performs well and improves system payoff by 18.19% and 436.34% when compared with the traditional network architecture algorithms and non‐cooperative scenarios, respectively.

  13. An Algorithm for Inspecting Self Check-in Airline Luggage Based on Hierarchical Clustering and Cube-fitting

    Directory of Open Access Journals (Sweden)

    Gao Qingji

    2014-04-01

    Full Text Available Airport passengers are required to put only one baggage each time in the check-in self-service so that the baggage can be detected and identified successfully. In order to automatically get the number of baggage that had been put on the conveyor belt, dual laser rangefinders are used to scan the outer contour of luggage in this paper. The algorithm based on hierarchical clustering and cube-fitting is proposed to inspect the number and dimension of airline luggage. Firstly, the point cloud is projected to vertical direction. By the analysis of one-dimensional clustering, the number and height of luggage will be quickly computed. Secondly, the method of nearest hierarchical clustering is applied to divide the point cloud if the above cannot be distinguished. It can preferably solve the difficult issue like crossing or overlapping pieces of baggage. Finally, the point cloud is projected to the horizontal plane. By rotating point cloud based on the centre, its minimum bounding rectangle (MBR is obtained. The length and width of luggage are got form MBR. Many experiments in different cases have been done to verify the effectiveness of the algorithm.

  14. A Negative Selection Algorithm Based on Hierarchical Clustering of Self Set and its Application in Anomaly Detection

    Directory of Open Access Journals (Sweden)

    Wen Chen

    2011-08-01

    Full Text Available A negative selection algorithm based on the hierarchical clustering of self set HC-RNSA is introduced in this paper. Several strategies are applied to improve the algorithm performance. First, the self data set is replaced by the self cluster centers to compare with the detector candidates in each cluster level. As the number of self clusters is much less than the self set size, the detector generation efficiency is improved. Second, during the detector generation process, the detector candidates are restricted to the lower coverage space to reduce detector redundancy. In the article, the problem that the distances between antigens coverage to a constant value in the high dimensional space is analyzed, accordingly the Principle Component Analysis (PCA method is used to reduce the data dimension, and the fractional distance function is employed to enhance the distinctiveness between the self and non-self antigens. The detector generation procedure is terminated when the expected non-self coverage is reached. The theory analysis and experimental results demonstrate that the detection rate of HC-RNSA is higher than that of the traditional negative selection algorithms while the false alarm rate and time cost are reduced.

  15. Convex Clustering: An Attractive Alternative to Hierarchical Clustering

    Science.gov (United States)

    Chen, Gary K.; Chi, Eric C.; Ranola, John Michael O.; Lange, Kenneth

    2015-01-01

    The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical clustering visually appealing and in tune with their evolutionary perspective. Hierarchical clustering operates on multiple scales simultaneously. This is essential, for instance, in transcriptome data, where one may be interested in making qualitative inferences about how lower-order relationships like gene modules lead to higher-order relationships like pathways or biological processes. The recently developed method of convex clustering preserves the visual appeal of hierarchical clustering while ameliorating its propensity to make false inferences in the presence of outliers and noise. The solution paths generated by convex clustering reveal relationships between clusters that are hidden by static methods such as k-means clustering. The current paper derives and tests a novel proximal distance algorithm for minimizing the objective function of convex clustering. The algorithm separates parameters, accommodates missing data, and supports prior information on relationships. Our program CONVEXCLUSTER incorporating the algorithm is implemented on ATI and nVidia graphics processing units (GPUs) for maximal speed. Several biological examples illustrate the strengths of convex clustering and the ability of the proximal distance algorithm to handle high-dimensional problems. CONVEXCLUSTER can be freely downloaded from the UCLA Human Genetics web site at http://www.genetics.ucla.edu/software/ PMID:25965340

  16. Robust Pseudo-Hierarchical Support Vector Clustering

    DEFF Research Database (Denmark)

    Hansen, Michael Sass; Sjöstrand, Karl; Olafsdóttir, Hildur

    2007-01-01

    Support vector clustering (SVC) has proven an efficient algorithm for clustering of noisy and high-dimensional data sets, with applications within many fields of research. An inherent problem, however, has been setting the parameters of the SVC algorithm. Using the recent emergence of a method...... for calculating the entire regularization path of the support vector domain description, we propose a fast method for robust pseudo-hierarchical support vector clustering (HSVC). The method is demonstrated to work well on generated data, as well as for detecting ischemic segments from multidimensional myocardial...

  17. Statistical Significance for Hierarchical Clustering

    Science.gov (United States)

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990

  18. Hierarchical matrices algorithms and analysis

    CERN Document Server

    Hackbusch, Wolfgang

    2015-01-01

    This self-contained monograph presents matrix algorithms and their analysis. The new technique enables not only the solution of linear systems but also the approximation of matrix functions, e.g., the matrix exponential. Other applications include the solution of matrix equations, e.g., the Lyapunov or Riccati equation. The required mathematical background can be found in the appendix. The numerical treatment of fully populated large-scale matrices is usually rather costly. However, the technique of hierarchical matrices makes it possible to store matrices and to perform matrix operations approximately with almost linear cost and a controllable degree of approximation error. For important classes of matrices, the computational cost increases only logarithmically with the approximation error. The operations provided include the matrix inversion and LU decomposition. Since large-scale linear algebra problems are standard in scientific computing, the subject of hierarchical matrices is of interest to scientists ...

  19. Partitional clustering algorithms

    CERN Document Server

    2015-01-01

    This book summarizes the state-of-the-art in partitional clustering. Clustering, the unsupervised classification of patterns into groups, is one of the most important tasks in exploratory data analysis. Primary goals of clustering include gaining insight into, classifying, and compressing data. Clustering has a long and rich history that spans a variety of scientific disciplines including anthropology, biology, medicine, psychology, statistics, mathematics, engineering, and computer science. As a result, numerous clustering algorithms have been proposed since the early 1950s. Among these algorithms, partitional (nonhierarchical) ones have found many applications, especially in engineering and computer science. This book provides coverage of consensus clustering, constrained clustering, large scale and/or high dimensional clustering, cluster validity, cluster visualization, and applications of clustering. Examines clustering as it applies to large and/or high-dimensional data sets commonly encountered in reali...

  20. Fermion cluster algorithms

    International Nuclear Information System (INIS)

    Chandrasekharan, Shailesh

    2000-01-01

    Cluster algorithms have been recently used to eliminate sign problems that plague Monte-Carlo methods in a variety of systems. In particular such algorithms can also be used to solve sign problems associated with the permutation of fermion world lines. This solution leads to the possibility of designing fermion cluster algorithms in certain cases. Using the example of free non-relativistic fermions we discuss the ideas underlying the algorithm

  1. Hierarchical Aligned Cluster Analysis for Temporal Clustering of Human Motion.

    Science.gov (United States)

    Zhou, Feng; De la Torre, Fernando; Hodgins, Jessica K

    2013-03-01

    Temporal segmentation of human motion into plausible motion primitives is central to understanding and building computational models of human motion. Several issues contribute to the challenge of discovering motion primitives: the exponential nature of all possible movement combinations, the variability in the temporal scale of human actions, and the complexity of representing articulated motion. We pose the problem of learning motion primitives as one of temporal clustering, and derive an unsupervised hierarchical bottom-up framework called hierarchical aligned cluster analysis (HACA). HACA finds a partition of a given multidimensional time series into m disjoint segments such that each segment belongs to one of k clusters. HACA combines kernel k-means with the generalized dynamic time alignment kernel to cluster time series data. Moreover, it provides a natural framework to find a low-dimensional embedding for time series. HACA is efficiently optimized with a coordinate descent strategy and dynamic programming. Experimental results on motion capture and video data demonstrate the effectiveness of HACA for segmenting complex motions and as a visualization tool. We also compare the performance of HACA to state-of-the-art algorithms for temporal clustering on data of a honey bee dance. The HACA code is available online.

  2. Inferring hierarchical clustering structures by deterministic annealing

    International Nuclear Information System (INIS)

    Hofmann, T.; Buhmann, J.M.

    1996-01-01

    The unsupervised detection of hierarchical structures is a major topic in unsupervised learning and one of the key questions in data analysis and representation. We propose a novel algorithm for the problem of learning decision trees for data clustering and related problems. In contrast to many other methods based on successive tree growing and pruning, we propose an objective function for tree evaluation and we derive a non-greedy technique for tree growing. Applying the principles of maximum entropy and minimum cross entropy, a deterministic annealing algorithm is derived in a meanfield approximation. This technique allows us to canonically superimpose tree structures and to fit parameters to averaged or open-quote fuzzified close-quote trees

  3. Hierarchical video summarization based on context clustering

    Science.gov (United States)

    Tseng, Belle L.; Smith, John R.

    2003-11-01

    A personalized video summary is dynamically generated in our video personalization and summarization system based on user preference and usage environment. The three-tier personalization system adopts the server-middleware-client architecture in order to maintain, select, adapt, and deliver rich media content to the user. The server stores the content sources along with their corresponding MPEG-7 metadata descriptions. In this paper, the metadata includes visual semantic annotations and automatic speech transcriptions. Our personalization and summarization engine in the middleware selects the optimal set of desired video segments by matching shot annotations and sentence transcripts with user preferences. Besides finding the desired contents, the objective is to present a coherent summary. There are diverse methods for creating summaries, and we focus on the challenges of generating a hierarchical video summary based on context information. In our summarization algorithm, three inputs are used to generate the hierarchical video summary output. These inputs are (1) MPEG-7 metadata descriptions of the contents in the server, (2) user preference and usage environment declarations from the user client, and (3) context information including MPEG-7 controlled term list and classification scheme. In a video sequence, descriptions and relevance scores are assigned to each shot. Based on these shot descriptions, context clustering is performed to collect consecutively similar shots to correspond to hierarchical scene representations. The context clustering is based on the available context information, and may be derived from domain knowledge or rules engines. Finally, the selection of structured video segments to generate the hierarchical summary efficiently balances between scene representation and shot selection.

  4. Robustness of Multiple Clustering Algorithms on Hyperspectral Images

    National Research Council Canada - National Science Library

    Williams, Jason P

    2007-01-01

    .... Various clustering algorithms were employed, including a hierarchical method, ISODATA, K-means, and X-means, and were used on a simple two dimensional dataset in order to discover potential problems with the algorithms...

  5. Data clustering theory, algorithms, and applications

    CERN Document Server

    Gan, Guojun; Wu, Jianhong

    2007-01-01

    Cluster analysis is an unsupervised process that divides a set of objects into homogeneous groups. This book starts with basic information on cluster analysis, including the classification of data and the corresponding similarity measures, followed by the presentation of over 50 clustering algorithms in groups according to some specific baseline methodologies such as hierarchical, center-based, and search-based methods. As a result, readers and users can easily identify an appropriate algorithm for their applications and compare novel ideas with existing results. The book also provides examples of clustering applications to illustrate the advantages and shortcomings of different clustering architectures and algorithms. Application areas include pattern recognition, artificial intelligence, information technology, image processing, biology, psychology, and marketing. Readers also learn how to perform cluster analysis with the C/C++ and MATLAB® programming languages.

  6. Hierarchical Control for Multiple DC Microgrids Clusters

    DEFF Research Database (Denmark)

    Shafiee, Qobad; Dragicevic, Tomislav; Vasquez, Juan Carlos

    2014-01-01

    This paper presents a distributed hierarchical control framework to ensure reliable operation of dc Microgrid (MG) clusters. In this hierarchy, primary control is used to regulate the common bus voltage inside each MG locally. An adaptive droop method is proposed for this level which determines...

  7. A cluster algorithm for graphs

    NARCIS (Netherlands)

    S. van Dongen

    2000-01-01

    textabstractA cluster algorithm for graphs called the emph{Markov Cluster algorithm (MCL~algorithm) is introduced. The algorithm provides basically an interface to an algebraic process defined on stochastic matrices, called the MCL~process. The graphs may be both weighted (with nonnegative weight)

  8. A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

    Science.gov (United States)

    Chahine, Firas Safwan

    2012-01-01

    Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…

  9. Energy Aware Clustering Algorithms for Wireless Sensor Networks

    Science.gov (United States)

    Rakhshan, Noushin; Rafsanjani, Marjan Kuchaki; Liu, Chenglian

    2011-09-01

    The sensor nodes deployed in wireless sensor networks (WSNs) are extremely power constrained, so maximizing the lifetime of the entire networks is mainly considered in the design. In wireless sensor networks, hierarchical network structures have the advantage of providing scalable and energy efficient solutions. In this paper, we investigate different clustering algorithms for WSNs and also compare these clustering algorithms based on metrics such as clustering distribution, cluster's load balancing, Cluster Head's (CH) selection strategy, CH's role rotation, node mobility, clusters overlapping, intra-cluster communications, reliability, security and location awareness.

  10. Constructing storyboards based on hierarchical clustering analysis

    Science.gov (United States)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There are growing needs for quick preview of video contents for the purpose of improving accessibility of video archives as well as reducing network traffics. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoid a repetition of computationally-intensive parsing of the same video sequence. Experimental results suggest that a significant reduction in computational time is gained by this strategy.

  11. Technique for fast and efficient hierarchical clustering

    Science.gov (United States)

    Stork, Christopher

    2013-10-08

    A fast and efficient technique for hierarchical clustering of samples in a dataset includes compressing the dataset to reduce a number of variables within each of the samples of the dataset. A nearest neighbor matrix is generated to identify nearest neighbor pairs between the samples based on differences between the variables of the samples. The samples are arranged into a hierarchy that groups the samples based on the nearest neighbor matrix. The hierarchy is rendered to a display to graphically illustrate similarities or differences between the samples.

  12. Communication Base Station Log Analysis Based on Hierarchical Clustering

    Directory of Open Access Journals (Sweden)

    Zhang Shao-Hua

    2017-01-01

    Full Text Available Communication base stations generate massive data every day, these base station logs play an important value in mining of the business circles. This paper use data mining technology and hierarchical clustering algorithm to group the scope of business circle for the base station by recording the data of these base stations.Through analyzing the data of different business circle based on feature extraction and comparing different business circle category characteristics, which can choose a suitable area for operators of commercial marketing.

  13. Star Cluster Structure from Hierarchical Star Formation

    Science.gov (United States)

    Grudic, Michael; Hopkins, Philip; Murray, Norman; Lamberts, Astrid; Guszejnov, David; Schmitz, Denise; Boylan-Kolchin, Michael

    2018-01-01

    Young massive star clusters (YMCs) spanning 104-108 M⊙ in mass generally have similar radial surface density profiles, with an outer power-law index typically between -2 and -3. This similarity suggests that they are shaped by scale-free physics at formation. Recent multi-physics MHD simulations of YMC formation have also produced populations of YMCs with this type of surface density profile, allowing us to narrow down the physics necessary to form a YMC with properties as observed. We show that the shallow density profiles of YMCs are a natural result of phase-space mixing that occurs as they assemble from the clumpy, hierarchically-clustered configuration imprinted by the star formation process. We develop physical intuition for this process via analytic arguments and collisionless N-body experiments, elucidating the connection between star formation physics and star cluster structure. This has implications for the early-time structure and evolution of proto-globular clusters, and prospects for simulating their formation in the FIRE cosmological zoom-in simulations.

  14. Data clustering algorithms and applications

    CERN Document Server

    Aggarwal, Charu C

    2013-01-01

    Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains.The book focuses on three primary aspects of data clustering: Methods, describing key techniques commonly used for clustering, such as fea

  15. Unsupervised active learning based on hierarchical graph-theoretic clustering.

    Science.gov (United States)

    Hu, Weiming; Hu, Wei; Xie, Nianhua; Maybank, Steve

    2009-10-01

    Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.

  16. A Hierarchical Clustering Methodology for the Estimation of Toxicity

    Science.gov (United States)

    A Quantitative Structure Activity Relationship (QSAR) methodology based on hierarchical clustering was developed to predict toxicological endpoints. This methodology utilizes Ward's method to divide a training set into a series of structurally similar clusters. The structural sim...

  17. Cluster Based Hierarchical Routing Protocol for Wireless Sensor Network

    OpenAIRE

    Rashed, Md. Golam; Kabir, M. Hasnat; Rahim, Muhammad Sajjadur; Ullah, Shaikh Enayet

    2012-01-01

    The efficient use of energy source in a sensor node is most desirable criteria for prolong the life time of wireless sensor network. In this paper, we propose a two layer hierarchical routing protocol called Cluster Based Hierarchical Routing Protocol (CBHRP). We introduce a new concept called head-set, consists of one active cluster head and some other associate cluster heads within a cluster. The head-set members are responsible for control and management of the network. Results show that t...

  18. Hierarchical clustering using correlation metric and spatial continuity constraint

    Science.gov (United States)

    Stork, Christopher L.; Brewer, Luke N.

    2012-10-02

    Large data sets are analyzed by hierarchical clustering using correlation as a similarity measure. This provides results that are superior to those obtained using a Euclidean distance similarity measure. A spatial continuity constraint may be applied in hierarchical clustering analysis of images.

  19. Clinical fracture risk evaluated by hierarchical agglomerative clustering

    DEFF Research Database (Denmark)

    Kruse, C; Eiken, P; Vestergaard, P

    2017-01-01

    reimbursement, primary healthcare sector use and comorbidity of female subjects were combined. Standardized variable means, Euclidean distances and Ward's D2 method of hierarchical agglomerative clustering (HAC), were used to form the clustering object. K number of clusters was selected with the lowest cluster...

  20. Efficient Record Linkage Algorithms Using Complete Linkage Clustering.

    Science.gov (United States)

    Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar

    2016-01-01

    Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times.

  1. K-means Clustering: Lloyd's algorithm

    Indian Academy of Sciences (India)

    First page Back Continue Last page Overview Graphics. K-means Clustering: Lloyd's algorithm. Refines clusters iteratively. Cluster points using Voronoi partitioning of the centers; Centroids of the clusters determine the new centers. Bad example k = 3, n =4.

  2. A generic algorithm for constructing hierarchical representations of geometric objects

    International Nuclear Information System (INIS)

    Xavier, P.G.

    1995-01-01

    For a number of years, robotics researchers have exploited hierarchical representations of geometrical objects and scenes in motion-planning, collision-avoidance, and simulation. However, few general techniques exist for automatically constructing them. We present a generic, bottom-up algorithm that uses a heuristic clustering technique to produced balanced, coherent hierarchies. Its worst-case running time is O(N 2 logN), but for non-pathological cases it is O(NlogN), where N is the number of input primitives. We have completed a preliminary C++ implementation for input collections of 3D convex polygons and 3D convex polyhedra and conducted simple experiments with scenes of up to 12,000 polygons, which take only a few minutes to process. We present examples using spheres and convex hulls as hierarchy primitives

  3. Prediction of Solvent Physical Properties using the Hierarchical Clustering Method

    Science.gov (United States)

    Recently a QSAR (Quantitative Structure Activity Relationship) method, the hierarchical clustering method, was developed to estimate acute toxicity values for large, diverse datasets. This methodology has now been applied to the estimate solvent physical properties including sur...

  4. Normalization based K means Clustering Algorithm

    OpenAIRE

    Virmani, Deepali; Taneja, Shweta; Malhotra, Geetika

    2015-01-01

    K-means is an effective clustering technique used to separate similar data into groups based on initial centroids of clusters. In this paper, Normalization based K-means clustering algorithm(N-K means) is proposed. Proposed N-K means clustering algorithm applies normalization prior to clustering on the available data as well as the proposed approach calculates initial centroids based on weights. Experimental results prove the betterment of proposed N-K means clustering algorithm over existing...

  5. Parallel algorithms and cluster computing

    CERN Document Server

    Hoffmann, Karl Heinz

    2007-01-01

    This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.

  6. Scalable Hierarchical Algorithms for stochastic PDEs and UQ

    KAUST Repository

    Litvinenko, Alexander; Chá vez, Gustavo; Keyes,David; Ltaief, Hatem; Yokota, Rio

    2015-01-01

    number of degrees of freedom in the discretization. The storage is reduced to the log-linear as well. This hierarchical structure is a good starting point for parallel algorithms. Parallelization on shared and distributed memory systems was pioneered

  7. A Novel Cluster Head Selection Algorithm Based on Fuzzy Clustering and Particle Swarm Optimization.

    Science.gov (United States)

    Ni, Qingjian; Pan, Qianqian; Du, Huimin; Cao, Cen; Zhai, Yuqing

    2017-01-01

    An important objective of wireless sensor network is to prolong the network life cycle, and topology control is of great significance for extending the network life cycle. Based on previous work, for cluster head selection in hierarchical topology control, we propose a solution based on fuzzy clustering preprocessing and particle swarm optimization. More specifically, first, fuzzy clustering algorithm is used to initial clustering for sensor nodes according to geographical locations, where a sensor node belongs to a cluster with a determined probability, and the number of initial clusters is analyzed and discussed. Furthermore, the fitness function is designed considering both the energy consumption and distance factors of wireless sensor network. Finally, the cluster head nodes in hierarchical topology are determined based on the improved particle swarm optimization. Experimental results show that, compared with traditional methods, the proposed method achieved the purpose of reducing the mortality rate of nodes and extending the network life cycle.

  8. Determination of atomic cluster structure with cluster fusion algorithm

    DEFF Research Database (Denmark)

    Obolensky, Oleg I.; Solov'yov, Ilia; Solov'yov, Andrey V.

    2005-01-01

    We report an efficient scheme of global optimization, called cluster fusion algorithm, which has proved its reliability and high efficiency in determination of the structure of various atomic clusters.......We report an efficient scheme of global optimization, called cluster fusion algorithm, which has proved its reliability and high efficiency in determination of the structure of various atomic clusters....

  9. Parallel Implementation of the Recursive Approximation of an Unsupervised Hierarchical Segmentation Algorithm. Chapter 5

    Science.gov (United States)

    Tilton, James C.; Plaza, Antonio J. (Editor); Chang, Chein-I. (Editor)

    2008-01-01

    The hierarchical image segmentation algorithm (referred to as HSEG) is a hybrid of hierarchical step-wise optimization (HSWO) and constrained spectral clustering that produces a hierarchical set of image segmentations. HSWO is an iterative approach to region grooving segmentation in which the optimal image segmentation is found at N(sub R) regions, given a segmentation at N(sub R+1) regions. HSEG's addition of constrained spectral clustering makes it a computationally intensive algorithm, for all but, the smallest of images. To counteract this, a computationally efficient recursive approximation of HSEG (called RHSEG) has been devised. Further improvements in processing speed are obtained through a parallel implementation of RHSEG. This chapter describes this parallel implementation and demonstrates its computational efficiency on a Landsat Thematic Mapper test scene.

  10. Hierarchical modeling of cluster size in wildlife surveys

    Science.gov (United States)

    Royle, J. Andrew

    2008-01-01

    Clusters or groups of individuals are the fundamental unit of observation in many wildlife sampling problems, including aerial surveys of waterfowl, marine mammals, and ungulates. Explicit accounting of cluster size in models for estimating abundance is necessary because detection of individuals within clusters is not independent and detectability of clusters is likely to increase with cluster size. This induces a cluster size bias in which the average cluster size in the sample is larger than in the population at large. Thus, failure to account for the relationship between delectability and cluster size will tend to yield a positive bias in estimates of abundance or density. I describe a hierarchical modeling framework for accounting for cluster-size bias in animal sampling. The hierarchical model consists of models for the observation process conditional on the cluster size distribution and the cluster size distribution conditional on the total number of clusters. Optionally, a spatial model can be specified that describes variation in the total number of clusters per sample unit. Parameter estimation, model selection, and criticism may be carried out using conventional likelihood-based methods. An extension of the model is described for the situation where measurable covariates at the level of the sample unit are available. Several candidate models within the proposed class are evaluated for aerial survey data on mallard ducks (Anas platyrhynchos).

  11. A new cluster algorithm for graphs

    NARCIS (Netherlands)

    S. van Dongen

    1998-01-01

    textabstractA new cluster algorithm for graphs called the emph{Markov Cluster algorithm ($MCL$ algorithm) is introduced. The graphs may be both weighted (with nonnegative weight) and directed. Let~$G$~be such a graph. The $MCL$ algorithm simulates flow in $G$ by first identifying $G$ in a

  12. Text Clustering Algorithm Based on Random Cluster Core

    Directory of Open Access Journals (Sweden)

    Huang Long-Jun

    2016-01-01

    Full Text Available Nowadays clustering has become a popular text mining algorithm, but the huge data can put forward higher requirements for the accuracy and performance of text mining. In view of the performance bottleneck of traditional text clustering algorithm, this paper proposes a text clustering algorithm with random features. This is a kind of clustering algorithm based on text density, at the same time using the neighboring heuristic rules, the concept of random cluster is introduced, which effectively reduces the complexity of the distance calculation.

  13. Analysis of genetic association using hierarchical clustering and cluster validation indices.

    Science.gov (United States)

    Pagnuco, Inti A; Pastore, Juan I; Abras, Guillermo; Brun, Marcel; Ballarin, Virginia L

    2017-10-01

    It is usually assumed that co-expressed genes suggest co-regulation in the underlying regulatory network. Determining sets of co-expressed genes is an important task, based on some criteria of similarity. This task is usually performed by clustering algorithms, where the genes are clustered into meaningful groups based on their expression values in a set of experiment. In this work, we propose a method to find sets of co-expressed genes, based on cluster validation indices as a measure of similarity for individual gene groups, and a combination of variants of hierarchical clustering to generate the candidate groups. We evaluated its ability to retrieve significant sets on simulated correlated and real genomics data, where the performance is measured based on its detection ability of co-regulated sets against a full search. Additionally, we analyzed the quality of the best ranked groups using an online bioinformatics tool that provides network information for the selected genes. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Which, When, and How: Hierarchical Clustering with Human–Machine Cooperation

    Directory of Open Access Journals (Sweden)

    Huanyang Zheng

    2016-12-01

    Full Text Available Human–Machine Cooperations (HMCs can balance the advantages and disadvantages of human computation (accurate but costly and machine computation (cheap but inaccurate. This paper studies HMCs in agglomerative hierarchical clusterings, where the machine can ask the human some questions. The human will return the answers to the machine, and the machine will use these answers to correct errors in its current clustering results. We are interested in the machine’s strategy on handling the question operations, in terms of three problems: (1 Which question should the machine ask? (2 When should the machine ask the question (early or late? (3 How does the machine adjust the clustering result, if the machine’s mistake is found by the human? Based on the insights of these problems, an efficient algorithm is proposed with five implementation variations. Experiments on image clusterings show that the proposed algorithm can improve the clustering accuracy with few question operations.

  15. Analysis of precipitation data in Bangladesh through hierarchical clustering and multidimensional scaling

    Science.gov (United States)

    Rahman, Md. Habibur; Matin, M. A.; Salma, Umma

    2017-12-01

    The precipitation patterns of seventeen locations in Bangladesh from 1961 to 2014 were studied using a cluster analysis and metric multidimensional scaling. In doing so, the current research applies four major hierarchical clustering methods to precipitation in conjunction with different dissimilarity measures and metric multidimensional scaling. A variety of clustering algorithms were used to provide multiple clustering dendrograms for a mixture of distance measures. The dendrogram of pre-monsoon rainfall for the seventeen locations formed five clusters. The pre-monsoon precipitation data for the areas of Srimangal and Sylhet were located in two clusters across the combination of five dissimilarity measures and four hierarchical clustering algorithms. The single linkage algorithm with Euclidian and Manhattan distances, the average linkage algorithm with the Minkowski distance, and Ward's linkage algorithm provided similar results with regard to monsoon precipitation. The results of the post-monsoon and winter precipitation data are shown in different types of dendrograms with disparate combinations of sub-clusters. The schematic geometrical representations of the precipitation data using metric multidimensional scaling showed that the post-monsoon rainfall of Cox's Bazar was located far from those of the other locations. The results of a box-and-whisker plot, different clustering techniques, and metric multidimensional scaling indicated that the precipitation behaviour of Srimangal and Sylhet during the pre-monsoon season, Cox's Bazar and Sylhet during the monsoon season, Maijdi Court and Cox's Bazar during the post-monsoon season, and Cox's Bazar and Khulna during the winter differed from those at other locations in Bangladesh.

  16. Hierarchical clusters of phytoplankton variables in dammed water bodies

    Science.gov (United States)

    Silva, Eliana Costa e.; Lopes, Isabel Cristina; Correia, Aldina; Gonçalves, A. Manuela

    2017-06-01

    In this paper a dataset containing biological variables of the water column of several Portuguese reservoirs is analyzed. Hierarchical cluster analysis is used to obtain clusters of phytoplankton variables of the phylum Cyanophyta, with the objective of validating the classification of Portuguese reservoirs previewly presented in [1] which were divided into three clusters: (1) Interior Tagus and Aguieira; (2) Douro; and (3) Other rivers. Now three new clusters of Cyanophyta variables were found. Kruskal-Wallis and Mann-Whitney tests are used to compare the now obtained Cyanophyta clusters and the previous Reservoirs clusters, in order to validate the classification of the water quality of reservoirs. The amount of Cyanophyta algae present in the reservoirs from the three clusters is significantly different, which validates the previous classification.

  17. Frequent Pattern Mining Algorithms for Data Clustering

    DEFF Research Database (Denmark)

    Zimek, Arthur; Assent, Ira; Vreeken, Jilles

    2014-01-01

    that frequent pattern mining was at the cradle of subspace clustering—yet, it quickly developed into an independent research field. In this chapter, we discuss how frequent pattern mining algorithms have been extended and generalized towards the discovery of local clusters in high-dimensional data......Discovering clusters in subspaces, or subspace clustering and related clustering paradigms, is a research field where we find many frequent pattern mining related influences. In fact, as the first algorithms for subspace clustering were based on frequent pattern mining algorithms, it is fair to say....... In particular, we discuss several example algorithms for subspace clustering or projected clustering as well as point out recent research questions and open topics in this area relevant to researchers in either clustering or pattern mining...

  18. Scalable Hierarchical Algorithms for stochastic PDEs and Uncertainty Quantification

    KAUST Repository

    Litvinenko, Alexander; Chavez, Gustavo; Keyes, David E.; Ltaief, Hatem; Yokota, Rio

    2015-01-01

    number of degrees of freedom in the discretization. The storage is reduced to the log-linear as well. This hierarchical structure is a good starting point for parallel algorithms. Parallelization on shared and distributed memory systems was pioneered by R

  19. Hierarchical Clustering of Large Databases and Classification of Antibiotics at High Noise Levels

    Directory of Open Access Journals (Sweden)

    Alexander V. Yarkov

    2008-12-01

    Full Text Available A new algorithm for divisive hierarchical clustering of chemical compounds based on 2D structural fragments is suggested. The algorithm is deterministic, and given a random ordering of the input, will always give the same clustering and can process a database up to 2 million records on a standard PC. The algorithm was used for classification of 1,183 antibiotics mixed with 999,994 random chemical structures. Similarity threshold, at which best separation of active and non active compounds took place, was estimated as 0.6. 85.7% of the antibiotics were successfully classified at this threshold with 0.4% of inaccurate compounds. A .sdf file was created with the probe molecules for clustering of external databases.

  20. A supplier selection using a hybrid grey based hierarchical clustering and artificial bee colony

    Directory of Open Access Journals (Sweden)

    Farshad Faezy Razi

    2014-06-01

    Full Text Available Selection of one or a combination of the most suitable potential providers and outsourcing problem is the most important strategies in logistics and supply chain management. In this paper, selection of an optimal combination of suppliers in inventory and supply chain management are studied and analyzed via multiple attribute decision making approach, data mining and evolutionary optimization algorithms. For supplier selection in supply chain, hierarchical clustering according to the studied indexes first clusters suppliers. Then, according to its cluster, each supplier is evaluated through Grey Relational Analysis. Then the combination of suppliers’ Pareto optimal rank and costs are obtained using Artificial Bee Colony meta-heuristic algorithm. A case study is conducted for a better description of a new algorithm to select a multiple source of suppliers.

  1. Study on distributed re-clustering algorithm for moblie wireless sensor networks

    Directory of Open Access Journals (Sweden)

    XU Chaojie

    2016-04-01

    Full Text Available In mobile wireless sensor networks,node mobility influences the topology of the hierarchically clustered network,thus affects packet delivery ratio and energy consumption of communications in clusters.To reduce the influence of node mobility,a distributed re-clustering algorithm is proposed in this paper.In this algorithm,basing on the clustered network,nodes estimate their current locations with particle algorithm and predict the most possible locations of next time basing on the mobility model.Each boundary node of a cluster periodically estimates the need for re-clustering and re-cluster itself to the optimal cluster through communicating with the cluster headers when needed.The simulation results indicate that,with small re-clustering periods,the proposed algorithm can be effective to keep appropriate communication distance and outperforms existing schemes on packet delivery ratio and energy consumption.

  2. Algorithm for Spatial Clustering with Obstacles

    OpenAIRE

    El-Sharkawi, Mohamed E.; El-Zawawy, Mohamed A.

    2009-01-01

    In this paper, we propose an efficient clustering technique to solve the problem of clustering in the presence of obstacles. The proposed algorithm divides the spatial area into rectangular cells. Each cell is associated with statistical information that enables us to label the cell as dense or non-dense. We also label each cell as obstructed (i.e. intersects any obstacle) or non-obstructed. Then the algorithm finds the regions (clusters) of connected, dense, non-obstructed cells. Finally, th...

  3. A roadmap of clustering algorithms: finding a match for a biomedical application.

    Science.gov (United States)

    Andreopoulos, Bill; An, Aijun; Wang, Xiaogang; Schroeder, Michael

    2009-05-01

    Clustering is ubiquitously applied in bioinformatics with hierarchical clustering and k-means partitioning being the most popular methods. Numerous improvements of these two clustering methods have been introduced, as well as completely different approaches such as grid-based, density-based and model-based clustering. For improved bioinformatics analysis of data, it is important to match clusterings to the requirements of a biomedical application. In this article, we present a set of desirable clustering features that are used as evaluation criteria for clustering algorithms. We review 40 different clustering algorithms of all approaches and datatypes. We compare algorithms on the basis of desirable clustering features, and outline algorithms' benefits and drawbacks as a basis for matching them to biomedical applications.

  4. Semantic based cluster content discovery in description first clustering algorithm

    International Nuclear Information System (INIS)

    Khan, M.W.; Asif, H.M.S.

    2017-01-01

    In the field of data analytics grouping of like documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have purposed. One of them is a category of algorithms which firstly group the documents on the basis of similarity and then assign the meaningful labels to those groups. Description first clustering algorithm belong to the category in which the meaningful description is deduced first and then relevant documents are assigned to that description. LINGO (Label Induction Grouping Algorithm) is the algorithm of description first clustering category which is used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing); an IR (Information Retrieval) technique for induction of meaningful labels for clusters and VSM (Vector Space Model) for cluster content discovery. In this paper we present the LINGO while it is using LSI during cluster label induction and cluster content discovery phase. Finally, we compare results obtained from the said algorithm while it uses VSM and Latent semantic analysis during cluster content discovery phase. (author)

  5. Algorithm of parallel: hierarchical transformation and its implementation on FPGA

    Science.gov (United States)

    Timchenko, Leonid I.; Petrovskiy, Mykola S.; Kokryatskay, Natalia I.; Barylo, Alexander S.; Dembitska, Sofia V.; Stepanikuk, Dmytro S.; Suleimenov, Batyrbek; Zyska, Tomasz; Uvaysova, Svetlana; Shedreyeva, Indira

    2017-08-01

    In this paper considers the algorithm of laser beam spots image classification in atmospheric-optical transmission systems. It discusses the need for images filtering using adaptive methods, using, for example, parallel-hierarchical networks. The article also highlights the need to create high-speed memory devices for such networks. Implementation and simulation results of the developed method based on the PLD are demonstrated, which shows that the presented method gives 15-20% better prediction results than similar methods.

  6. The structure of nearby clusters of galaxies Hierarchical clustering and an application to the Leo region

    CERN Document Server

    Materne, J

    1978-01-01

    A new method of classifying groups of galaxies, called hierarchical clustering, is presented as a tool for the investigation of nearby groups of galaxies. The method is free from model assumptions about the groups. The scaling of the different coordinates is necessary, and the level from which one accepts the groups as real has to be determined. Hierarchical clustering is applied to an unbiased sample of galaxies in the Leo region. Five distinct groups result which have reasonable physical properties, such as low crossing times and conservative mass-to-light ratios, and which follow a radial velocity- luminosity relation. Only 4 out of 39 galaxies were adopted as field galaxies. (27 refs).

  7. Non-convex polygons clustering algorithm

    Directory of Open Access Journals (Sweden)

    Kruglikov Alexey

    2016-01-01

    Full Text Available A clustering algorithm is proposed, to be used as a preliminary step in motion planning. It is tightly coupled to the applied problem statement, i.e. uses parameters meaningful only with respect to it. Use of geometrical properties for polygons clustering allows for a better calculation time as opposed to general-purpose algorithms. A special form of map optimized for quick motion planning is constructed as a result.

  8. Clustering-based classification of road traffic accidents using hierarchical clustering and artificial neural networks.

    Science.gov (United States)

    Taamneh, Madhar; Taamneh, Salah; Alkheder, Sharaf

    2017-09-01

    Artificial neural networks (ANNs) have been widely used in predicting the severity of road traffic crashes. All available information about previously occurred accidents is typically used for building a single prediction model (i.e., classifier). Too little attention has been paid to the differences between these accidents, leading, in most cases, to build less accurate predictors. Hierarchical clustering is a well-known clustering method that seeks to group data by creating a hierarchy of clusters. Using hierarchical clustering and ANNs, a clustering-based classification approach for predicting the injury severity of road traffic accidents was proposed. About 6000 road accidents occurred over a six-year period from 2008 to 2013 in Abu Dhabi were used throughout this study. In order to reduce the amount of variation in data, hierarchical clustering was applied on the data set to organize it into six different forms, each with different number of clusters (i.e., clusters from 1 to 6). Two ANN models were subsequently built for each cluster of accidents in each generated form. The first model was built and validated using all accidents (training set), whereas only 66% of the accidents were used to build the second model, and the remaining 34% were used to test it (percentage split). Finally, the weighted average accuracy was computed for each type of models in each from of data. The results show that when testing the models using the training set, clustering prior to classification achieves (11%-16%) more accuracy than without using clustering, while the percentage split achieves (2%-5%) more accuracy. The results also suggest that partitioning the accidents into six clusters achieves the best accuracy if both types of models are taken into account.

  9. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time.

    Science.gov (United States)

    Cai, Yunpeng; Sun, Yijun

    2011-08-01

    Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.

  10. Performance quantification of clustering algorithms for false positive removal in fMRI by ROC curves

    Directory of Open Access Journals (Sweden)

    André Salles Cunha Peres

    Full Text Available Abstract Introduction Functional magnetic resonance imaging (fMRI is a non-invasive technique that allows the detection of specific cerebral functions in humans based on hemodynamic changes. The contrast changes are about 5%, making visual inspection impossible. Thus, statistic strategies are applied to infer which brain region is engaged in a task. However, the traditional methods like general linear model and cross-correlation utilize voxel-wise calculation, introducing a lot of false-positive data. So, in this work we tested post-processing cluster algorithms to diminish the false-positives. Methods In this study, three clustering algorithms (the hierarchical cluster, k-means and self-organizing maps were tested and compared for false-positive removal in the post-processing of cross-correlation analyses. Results Our results showed that the hierarchical cluster presented the best performance to remove the false positives in fMRI, being 2.3 times more accurate than k-means, and 1.9 times more accurate than self-organizing maps. Conclusion The hierarchical cluster presented the best performance in false-positive removal because it uses the inconsistency coefficient threshold, while k-means and self-organizing maps utilize a priori cluster number (centroids and neurons number; thus, the hierarchical cluster avoids clustering scattered voxels, as the inconsistency coefficient threshold allows only the voxels to be clustered that are at a minimum distance to some cluster.

  11. A local adaptive algorithm for emerging scale-free hierarchical networks

    International Nuclear Information System (INIS)

    Gomez Portillo, I J; Gleiser, P M

    2010-01-01

    In this work we study a growing network model with chaotic dynamical units that evolves using a local adaptive rewiring algorithm. Using numerical simulations we show that the model allows for the emergence of hierarchical networks. First, we show that the networks that emerge with the algorithm present a wide degree distribution that can be fitted by a power law function, and thus are scale-free networks. Using the LaNet-vi visualization tool we present a graphical representation that reveals a central core formed only by hubs, and also show the presence of a preferential attachment mechanism. In order to present a quantitative analysis of the hierarchical structure we analyze the clustering coefficient. In particular, we show that as the network grows the clustering becomes independent of system size, and also presents a power law decay as a function of the degree. Finally, we compare our results with a similar version of the model that has continuous non-linear phase oscillators as dynamical units. The results show that local interactions play a fundamental role in the emergence of hierarchical networks.

  12. Hierarchical Adaptive Means (HAM) clustering for hardware-efficient, unsupervised and real-time spike sorting.

    Science.gov (United States)

    Paraskevopoulou, Sivylla E; Wu, Di; Eftekhar, Amir; Constandinou, Timothy G

    2014-09-30

    This work presents a novel unsupervised algorithm for real-time adaptive clustering of neural spike data (spike sorting). The proposed Hierarchical Adaptive Means (HAM) clustering method combines centroid-based clustering with hierarchical cluster connectivity to classify incoming spikes using groups of clusters. It is described how the proposed method can adaptively track the incoming spike data without requiring any past history, iteration or training and autonomously determines the number of spike classes. Its performance (classification accuracy) has been tested using multiple datasets (both simulated and recorded) achieving a near-identical accuracy compared to k-means (using 10-iterations and provided with the number of spike classes). Also, its robustness in applying to different feature extraction methods has been demonstrated by achieving classification accuracies above 80% across multiple datasets. Last but crucially, its low complexity, that has been quantified through both memory and computation requirements makes this method hugely attractive for future hardware implementation. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Robust MST-Based Clustering Algorithm.

    Science.gov (United States)

    Liu, Qidong; Zhang, Ruisheng; Zhao, Zhili; Wang, Zhenghai; Jiao, Mengyao; Wang, Guangjing

    2018-06-01

    Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. The grouping principle yields superior clustering results when mining arbitrarily-shaped clusters in data. However, it is not robust against noises and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. In order to solve such problems, we propose robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which the element denotes the supernode by combining a set of nodes. Then a greedy method is presented to partition those supernodes through working on the low-rank matrix. Instead of removing the longest edges from MST, our algorithm groups the data set based on the minimax similarity. Finally, the assignment of all data points can be achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms compared clustering algorithms.

  14. A hierarchical approach to reducing communication in parallel graph algorithms

    KAUST Repository

    Harshvardhan,

    2015-01-01

    Large-scale graph computing has become critical due to the ever-increasing size of data. However, distributed graph computations are limited in their scalability and performance due to the heavy communication inherent in such computations. This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each of the neighbors of a vertex. Our proposed approach recognizes this, and reduces communication performed by the algorithm without change to user-code, through a hierarchical machine model imposed upon the input graph. The hierarchical model takes advantage of locale information of the neighboring vertices to reduce communication, both in message volume and total number of bytes sent. It is also able to better exploit the machine hierarchy to further reduce the communication costs, by aggregating traffic between different levels of the machine hierarchy. Results of an implementation in the STAPL GL shows improved scalability and performance over the traditional level-synchronous approach, with 2.5 × - 8× improvement for a variety of graph algorithms at 12, 000+ cores.

  15. Intensity-based hierarchical clustering in CT-scans: application to interactive segmentation in cardiology

    Science.gov (United States)

    Hadida, Jonathan; Desrosiers, Christian; Duong, Luc

    2011-03-01

    The segmentation of anatomical structures in Computed Tomography Angiography (CTA) is a pre-operative task useful in image guided surgery. Even though very robust and precise methods have been developed to help achieving a reliable segmentation (level sets, active contours, etc), it remains very time consuming both in terms of manual interactions and in terms of computation time. The goal of this study is to present a fast method to find coarse anatomical structures in CTA with few parameters, based on hierarchical clustering. The algorithm is organized as follows: first, a fast non-parametric histogram clustering method is proposed to compute a piecewise constant mask. A second step then indexes all the space-connected regions in the piecewise constant mask. Finally, a hierarchical clustering is achieved to build a graph representing the connections between the various regions in the piecewise constant mask. This step builds up a structural knowledge about the image. Several interactive features for segmentation are presented, for instance association or disassociation of anatomical structures. A comparison with the Mean-Shift algorithm is presented.

  16. A data-driven approach to estimating the number of clusters in hierarchical clustering [version 1; referees: 2 approved, 1 approved with reservations

    Directory of Open Access Journals (Sweden)

    Antoine E. Zambelli

    2016-12-01

    Full Text Available DNA microarray and gene expression problems often require a researcher to perform clustering on their data in a bid to better understand its structure. In cases where the number of clusters is not known, one can resort to hierarchical clustering methods. However, there currently exist very few automated algorithms for determining the true number of clusters in the data. We propose two new methods (mode and maximum difference for estimating the number of clusters in a hierarchical clustering framework to create a fully automated process with no human intervention. These methods are compared to the established elbow and gap statistic algorithms using simulated datasets and the Biobase Gene ExpressionSet. We also explore a data mixing procedure inspired by cross validation techniques. We find that the overall performance of the maximum difference method is comparable or greater to that of the gap statistic in multi-cluster scenarios, and achieves that performance at a fraction of the computational cost. This method also responds well to our mixing procedure, which opens the door to future research. We conclude that both the mode and maximum difference methods warrant further study related to their mixing and cross-validation potential. We particularly recommend the use of the maximum difference method in multi-cluster scenarios given its accuracy and execution times, and present it as an alternative to existing algorithms.

  17. Assessment of surface water quality using hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Dheeraj Kumar Dabgerwal

    2016-02-01

    Full Text Available This study was carried out to assess the physicochemical quality river Varuna inVaranasi,India. Water samples were collected from 10 sites during January-June 2015. Pearson correlation analysis was used to assess the direction and strength of relationship between physicochemical parameters. Hierarchical Cluster analysis was also performed to determine the sources of pollution in the river Varuna. The result showed quite high value of DO, Nitrate, BOD, COD and Total Alkalinity, above the BIS permissible limit. The results of correlation analysis identified key water parameters as pH, electrical conductivity, total alkalinity and nitrate, which influence the concentration of other water parameters. Cluster analysis identified three major clusters of sampling sites out of total 10 sites, according to the similarity in water quality. This study illustrated the usefulness of correlation and cluster analysis for getting better information about the river water quality.International Journal of Environment Vol. 5 (1 2016,  pp: 32-44

  18. The Hierarchical Clustering of Tax Burden in the EU27

    Directory of Open Access Journals (Sweden)

    Simkova Nikola

    2015-09-01

    Full Text Available The issue of taxation has become more important due to a significant share of the government revenue. There are several ways of expressing the tax burden of countries. This paper describes the traditional approach as a share of tax revenue to GDP which is applied to the total taxation and the capital taxation as a part of tax systems affecting investment decisions. The implicit tax rate on capital created by Eurostat also offers a possible explanation of the tax burden on capital, so its components are analysed in detail. This study uses one of the econometric methods called the hierarchical clustering. The data on which the clustering is based comprises countries in the EU27 for the period of 1995 – 2012. The aim of this paper is to reveal clusters of countries in the EU27 with similar tax burden or tax changes. The findings suggest that mainly newly acceding countries (2004 and 2007 are in a group of countries with a low tax burden which tried to encourage investors by favourable tax rates. On the other hand, there are mostly countries from the original EU15. Some clusters may be explained by similar historical development, geographic and demographic characteristics.

  19. Hierarchical Bayesian modelling of gene expression time series across irregularly sampled replicates and clusters.

    Science.gov (United States)

    Hensman, James; Lawrence, Neil D; Rattray, Magnus

    2013-08-20

    Time course data from microarrays and high-throughput sequencing experiments require simple, computationally efficient and powerful statistical models to extract meaningful biological signal, and for tasks such as data fusion and clustering. Existing methodologies fail to capture either the temporal or replicated nature of the experiments, and often impose constraints on the data collection process, such as regularly spaced samples, or similar sampling schema across replications. We propose hierarchical Gaussian processes as a general model of gene expression time-series, with application to a variety of problems. In particular, we illustrate the method's capacity for missing data imputation, data fusion and clustering.The method can impute data which is missing both systematically and at random: in a hold-out test on real data, performance is significantly better than commonly used imputation methods. The method's ability to model inter- and intra-cluster variance leads to more biologically meaningful clusters. The approach removes the necessity for evenly spaced samples, an advantage illustrated on a developmental Drosophila dataset with irregular replications. The hierarchical Gaussian process model provides an excellent statistical basis for several gene-expression time-series tasks. It has only a few additional parameters over a regular GP, has negligible additional complexity, is easily implemented and can be integrated into several existing algorithms. Our experiments were implemented in python, and are available from the authors' website: http://staffwww.dcs.shef.ac.uk/people/J.Hensman/.

  20. Maximum-entropy clustering algorithm and its global convergence analysis

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    Constructing a batch of differentiable entropy functions touniformly approximate an objective function by means of the maximum-entropy principle, a new clustering algorithm, called maximum-entropy clustering algorithm, is proposed based on optimization theory. This algorithm is a soft generalization of the hard C-means algorithm and possesses global convergence. Its relations with other clustering algorithms are discussed.

  1. Synchronous Firefly Algorithm for Cluster Head Selection in WSN

    Directory of Open Access Journals (Sweden)

    Madhusudhanan Baskaran

    2015-01-01

    Full Text Available Wireless Sensor Network (WSN consists of small low-cost, low-power multifunctional nodes interconnected to efficiently aggregate and transmit data to sink. Cluster-based approaches use some nodes as Cluster Heads (CHs and organize WSNs efficiently for aggregation of data and energy saving. A CH conveys information gathered by cluster nodes and aggregates/compresses data before transmitting it to a sink. However, this additional responsibility of the node results in a higher energy drain leading to uneven network degradation. Low Energy Adaptive Clustering Hierarchy (LEACH offsets this by probabilistically rotating cluster heads role among nodes with energy above a set threshold. CH selection in WSN is NP-Hard as optimal data aggregation with efficient energy savings cannot be solved in polynomial time. In this work, a modified firefly heuristic, synchronous firefly algorithm, is proposed to improve the network performance. Extensive simulation shows the proposed technique to perform well compared to LEACH and energy-efficient hierarchical clustering. Simulations show the effectiveness of the proposed method in decreasing the packet loss ratio by an average of 9.63% and improving the energy efficiency of the network when compared to LEACH and EEHC.

  2. CHIMERA: Top-down model for hierarchical, overlapping and directed cluster structures in directed and weighted complex networks

    Science.gov (United States)

    Franke, R.

    2016-11-01

    In many networks discovered in biology, medicine, neuroscience and other disciplines special properties like a certain degree distribution and hierarchical cluster structure (also called communities) can be observed as general organizing principles. Detecting the cluster structure of an unknown network promises to identify functional subdivisions, hierarchy and interactions on a mesoscale. It is not trivial choosing an appropriate detection algorithm because there are multiple network, cluster and algorithmic properties to be considered. Edges can be weighted and/or directed, clusters overlap or build a hierarchy in several ways. Algorithms differ not only in runtime, memory requirements but also in allowed network and cluster properties. They are based on a specific definition of what a cluster is, too. On the one hand, a comprehensive network creation model is needed to build a large variety of benchmark networks with different reasonable structures to compare algorithms. On the other hand, if a cluster structure is already known, it is desirable to separate effects of this structure from other network properties. This can be done with null model networks that mimic an observed cluster structure to improve statistics on other network features. A third important application is the general study of properties in networks with different cluster structures, possibly evolving over time. Currently there are good benchmark and creation models available. But what is left is a precise sandbox model to build hierarchical, overlapping and directed clusters for undirected or directed, binary or weighted complex random networks on basis of a sophisticated blueprint. This gap shall be closed by the model CHIMERA (Cluster Hierarchy Interconnection Model for Evaluation, Research and Analysis) which will be introduced and described here for the first time.

  3. Constructing a graph of connections in clustering algorithm of complex objects

    Directory of Open Access Journals (Sweden)

    Татьяна Шатовская

    2015-05-01

    Full Text Available The article describes the results of modifying the algorithm Chameleon. Hierarchical multi-level algorithm consists of several phases: the construction of the count, coarsening, the separation and recovery. Each phase can be used various approaches and algorithms. The main aim of the work is to study the quality of the clustering of different sets of data using a set of algorithms combinations at different stages of the algorithm and improve the stage of construction by the optimization algorithm of k choice in the graph construction of k of nearest neighbors

  4. An Improved Hierarchical Genetic Algorithm for Sheet Cutting Scheduling with Process Constraints

    OpenAIRE

    Yunqing Rao; Dezhong Qi; Jinling Li

    2013-01-01

    For the first time, an improved hierarchical genetic algorithm for sheet cutting problem which involves n cutting patterns for m non-identical parallel machines with process constraints has been proposed in the integrated cutting stock model. The objective of the cutting scheduling problem is minimizing the weighted completed time. A mathematical model for this problem is presented, an improved hierarchical genetic algorithm (ant colony—hierarchical genetic algorithm) is developed for better ...

  5. clusterMaker: a multi-algorithm clustering plugin for Cytoscape

    Directory of Open Access Journals (Sweden)

    Morris John H

    2011-11-01

    Full Text Available Abstract Background In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view, k-means, k-medoid, SCPS, AutoSOME, and native (Java MCL. Results Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section. Conclusions The Cytoscape plugin cluster

  6. A Hybrid Fuzzy Multi-hop Unequal Clustering Algorithm for Dense Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Shawkat K. Guirguis

    2017-01-01

    Full Text Available Clustering is carried out to explore and solve power dissipation problem in wireless sensor network (WSN. Hierarchical network architecture, based on clustering, can reduce energy consumption, balance traffic load, improve scalability, and prolong network lifetime. However, clustering faces two main challenges: hotspot problem and searching for effective techniques to perform clustering. This paper introduces a fuzzy unequal clustering technique for heterogeneous dense WSNs to determine both final cluster heads and their radii. Proposed fuzzy system blends three effective parameters together which are: the distance to the base station, the density of the cluster, and the deviation of the noders residual energy from the average network energy. Our objectives are achieving gain for network lifetime, energy distribution, and energy consumption. To evaluate the proposed algorithm, WSN clustering based routing algorithms are analyzed, simulated, and compared with obtained results. These protocols are LEACH, SEP, HEED, EEUC, and MOFCA.

  7. Hierarchical Artificial Bee Colony Algorithm for RFID Network Planning Optimization

    Directory of Open Access Journals (Sweden)

    Lianbo Ma

    2014-01-01

    Full Text Available This paper presents a novel optimization algorithm, namely, hierarchical artificial bee colony optimization, called HABC, to tackle the radio frequency identification network planning (RNP problem. In the proposed multilevel model, the higher-level species can be aggregated by the subpopulations from lower level. In the bottom level, each subpopulation employing the canonical ABC method searches the part-dimensional optimum in parallel, which can be constructed into a complete solution for the upper level. At the same time, the comprehensive learning method with crossover and mutation operators is applied to enhance the global search ability between species. Experiments are conducted on a set of 10 benchmark optimization problems. The results demonstrate that the proposed HABC obtains remarkable performance on most chosen benchmark functions when compared to several successful swarm intelligence and evolutionary algorithms. Then HABC is used for solving the real-world RNP problem on two instances with different scales. Simulation results show that the proposed algorithm is superior for solving RNP, in terms of optimization accuracy and computation robustness.

  8. Hierarchical artificial bee colony algorithm for RFID network planning optimization.

    Science.gov (United States)

    Ma, Lianbo; Chen, Hanning; Hu, Kunyuan; Zhu, Yunlong

    2014-01-01

    This paper presents a novel optimization algorithm, namely, hierarchical artificial bee colony optimization, called HABC, to tackle the radio frequency identification network planning (RNP) problem. In the proposed multilevel model, the higher-level species can be aggregated by the subpopulations from lower level. In the bottom level, each subpopulation employing the canonical ABC method searches the part-dimensional optimum in parallel, which can be constructed into a complete solution for the upper level. At the same time, the comprehensive learning method with crossover and mutation operators is applied to enhance the global search ability between species. Experiments are conducted on a set of 10 benchmark optimization problems. The results demonstrate that the proposed HABC obtains remarkable performance on most chosen benchmark functions when compared to several successful swarm intelligence and evolutionary algorithms. Then HABC is used for solving the real-world RNP problem on two instances with different scales. Simulation results show that the proposed algorithm is superior for solving RNP, in terms of optimization accuracy and computation robustness.

  9. Fuzzy Rules for Ant Based Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Amira Hamdi

    2016-01-01

    Full Text Available This paper provides a new intelligent technique for semisupervised data clustering problem that combines the Ant System (AS algorithm with the fuzzy c-means (FCM clustering algorithm. Our proposed approach, called F-ASClass algorithm, is a distributed algorithm inspired by foraging behavior observed in ant colonyT. The ability of ants to find the shortest path forms the basis of our proposed approach. In the first step, several colonies of cooperating entities, called artificial ants, are used to find shortest paths in a complete graph that we called graph-data. The number of colonies used in F-ASClass is equal to the number of clusters in dataset. Hence, the partition matrix of dataset founded by artificial ants is given in the second step, to the fuzzy c-means technique in order to assign unclassified objects generated in the first step. The proposed approach is tested on artificial and real datasets, and its performance is compared with those of K-means, K-medoid, and FCM algorithms. Experimental section shows that F-ASClass performs better according to the error rate classification, accuracy, and separation index.

  10. Scalable Hierarchical Algorithms for stochastic PDEs and UQ

    KAUST Repository

    Litvinenko, Alexander

    2015-01-07

    H-matrices and Fast Multipole (FMM) are powerful methods to approximate linear operators coming from partial differential and integral equations as well as speed up computational cost from quadratic or cubic to log-linear (O(n log n)), where n number of degrees of freedom in the discretization. The storage is reduced to the log-linear as well. This hierarchical structure is a good starting point for parallel algorithms. Parallelization on shared and distributed memory systems was pioneered by Kriemann [1,2]. Since 2005, the area of parallel architectures and software is developing very fast. Progress in GPUs and Many-Core Systems (e.g. XeonPhi with 64 cores) motivated us to extend work started in [1,2,7,8].

  11. Scalable Hierarchical Algorithms for stochastic PDEs and Uncertainty Quantification

    KAUST Repository

    Litvinenko, Alexander

    2015-01-05

    H-matrices and Fast Multipole (FMM) are powerful methods to approximate linear operators coming from partial differential and integral equations as well as speed up computational cost from quadratic or cubic to log-linear (O(n log n)), where n number of degrees of freedom in the discretization. The storage is reduced to the log-linear as well. This hierarchical structure is a good starting point for parallel algorithms. Parallelization on shared and distributed memory systems was pioneered by R. Kriemann, 2005. Since 2005, the area of parallel architectures and software is developing very fast. Progress in GPUs and Many-Core Systems (e.g. XeonPhi with 64 cores) motivated us to extend work started in [1,2,7,8].

  12. Genetic algorithm optimization of atomic clusters

    International Nuclear Information System (INIS)

    Morris, J.R.; Deaven, D.M.; Ho, K.M.; Wang, C.Z.; Pan, B.C.; Wacker, J.G.; Turner, D.E.; Iowa State Univ., Ames, IA

    1996-01-01

    The authors have been using genetic algorithms to study the structures of atomic clusters and related problems. This is a problem where local minima are easy to locate, but barriers between the many minima are large, and the number of minima prohibit a systematic search. They use a novel mating algorithm that preserves some of the geometrical relationship between atoms, in order to ensure that the resultant structures are likely to inherit the best features of the parent clusters. Using this approach, they have been able to find lower energy structures than had been previously obtained. Most recently, they have been able to turn around the building block idea, using optimized structures from the GA to learn about systematic structural trends. They believe that an effective GA can help provide such heuristic information, and (conversely) that such information can be introduced back into the algorithm to assist in the search process

  13. Radar Emission Sources Identification Based on Hierarchical Agglomerative Clustering for Large Data Sets

    Directory of Open Access Journals (Sweden)

    Janusz Dudczyk

    2016-01-01

    Full Text Available More advanced recognition methods, which may recognize particular copies of radars of the same type, are called identification. The identification process of radar devices is a more specialized task which requires methods based on the analysis of distinctive features. These features are distinguished from the signals coming from the identified devices. Such a process is called Specific Emitter Identification (SEI. The identification of radar emission sources with the use of classic techniques based on the statistical analysis of basic measurable parameters of a signal such as Radio Frequency, Amplitude, Pulse Width, or Pulse Repetition Interval is not sufficient for SEI problems. This paper presents the method of hierarchical data clustering which is used in the process of radar identification. The Hierarchical Agglomerative Clustering Algorithm (HACA based on Generalized Agglomerative Scheme (GAS implemented and used in the research method is parameterized; therefore, it is possible to compare the results. The results of clustering are presented in dendrograms in this paper. The received results of grouping and identification based on HACA are compared with other SEI methods in order to assess the degree of their usefulness and effectiveness for systems of ESM/ELINT class.

  14. The reflection of hierarchical cluster analysis of co-occurrence matrices in SPSS

    NARCIS (Netherlands)

    Zhou, Q.; Leng, F.; Leydesdorff, L.

    2015-01-01

    Purpose: To discuss the problems arising from hierarchical cluster analysis of co-occurrence matrices in SPSS, and the corresponding solutions. Design/methodology/approach: We design different methods of using the SPSS hierarchical clustering module for co-occurrence matrices in order to compare

  15. The experimental results on the quality of clustering diverse set of data using a modified algorithm chameleon

    Directory of Open Access Journals (Sweden)

    Татьяна Борисовна Шатовская

    2015-03-01

    Full Text Available In this work results of modified Chameleon algorithm are discussed. Hierarchical multilevel algorithms consist of several stages: building the graph, coarsening, partitioning, recovering. Exploring of clustering quality for different data sets with different combinations of algorithms on different stages of the algorithm is the main aim of the article. And also aim is improving the construction phase through the optimization algorithm of choice k in the building the graph k-nearest neighbors

  16. Prioritizing the risk of plant pests by clustering methods; self-organising maps, k-means and hierarchical clustering

    Directory of Open Access Journals (Sweden)

    Susan Worner

    2013-09-01

    Full Text Available For greater preparedness, pest risk assessors are required to prioritise long lists of pest species with potential to establish and cause significant impact in an endangered area. Such prioritization is often qualitative, subjective, and sometimes biased, relying mostly on expert and stakeholder consultation. In recent years, cluster based analyses have been used to investigate regional pest species assemblages or pest profiles to indicate the risk of new organism establishment. Such an approach is based on the premise that the co-occurrence of well-known global invasive pest species in a region is not random, and that the pest species profile or assemblage integrates complex functional relationships that are difficult to tease apart. In other words, the assemblage can help identify and prioritise species that pose a threat in a target region. A computational intelligence method called a Kohonen self-organizing map (SOM, a type of artificial neural network, was the first clustering method applied to analyse assemblages of invasive pests. The SOM is a well known dimension reduction and visualization method especially useful for high dimensional data that more conventional clustering methods may not analyse suitably. Like all clustering algorithms, the SOM can give details of clusters that identify regions with similar pest assemblages, possible donor and recipient regions. More important, however SOM connection weights that result from the analysis can be used to rank the strength of association of each species within each regional assemblage. Species with high weights that are not already established in the target region are identified as high risk. However, the SOM analysis is only the first step in a process to assess risk to be used alongside or incorporated within other measures. Here we illustrate the application of SOM analyses in a range of contexts in invasive species risk assessment, and discuss other clustering methods such as k

  17. Determination of genetic structure of germplasm collections: are traditional hierarchical clustering methods appropriate for molecular marker data?

    NARCIS (Netherlands)

    Odong, T.L.; Heerwaarden, van J.; Jansen, J.; Hintum, van T.J.L.; Eeuwijk, van F.A.

    2011-01-01

    Despite the availability of newer approaches, traditional hierarchical clustering remains very popular in genetic diversity studies in plants. However, little is known about its suitability for molecular marker data. We studied the performance of traditional hierarchical clustering techniques using

  18. Application for Suggesting Restaurants Using Clustering Algorithms

    Directory of Open Access Journals (Sweden)

    Iulia Alexandra IANCU

    2014-10-01

    Full Text Available The aim of this article is to present an application whose purpose is to make suggestions of restaurants to users. The application uses as input the descriptions of restaurants, reviews, user reviews available on the specialized Internet sites and blogs. In the application there are used processing techniques of natural language implemented using parsers, clustering algorithms and techniques for data collection from the Internet through web crawlers.

  19. DATA CLASSIFICATION WITH NEURAL CLASSIFIER USING RADIAL BASIS FUNCTION WITH DATA REDUCTION USING HIERARCHICAL CLUSTERING

    Directory of Open Access Journals (Sweden)

    M. Safish Mary

    2012-04-01

    Full Text Available Classification of large amount of data is a time consuming process but crucial for analysis and decision making. Radial Basis Function networks are widely used for classification and regression analysis. In this paper, we have studied the performance of RBF neural networks to classify the sales of cars based on the demand, using kernel density estimation algorithm which produces classification accuracy comparable to data classification accuracy provided by support vector machines. In this paper, we have proposed a new instance based data selection method where redundant instances are removed with help of a threshold thus improving the time complexity with improved classification accuracy. The instance based selection of the data set will help reduce the number of clusters formed thereby reduces the number of centers considered for building the RBF network. Further the efficiency of the training is improved by applying a hierarchical clustering technique to reduce the number of clusters formed at every step. The paper explains the algorithm used for classification and for conditioning the data. It also explains the complexities involved in classification of sales data for analysis and decision-making.

  20. Merging K-means with hierarchical clustering for identifying general-shaped groups.

    Science.gov (United States)

    Peterson, Anna D; Ghosh, Arka P; Maitra, Ranjan

    2018-01-01

    Clustering partitions a dataset such that observations placed together in a group are similar but different from those in other groups. Hierarchical and K -means clustering are two approaches but have different strengths and weaknesses. For instance, hierarchical clustering identifies groups in a tree-like structure but suffers from computational complexity in large datasets while K -means clustering is efficient but designed to identify homogeneous spherically-shaped clusters. We present a hybrid non-parametric clustering approach that amalgamates the two methods to identify general-shaped clusters and that can be applied to larger datasets. Specifically, we first partition the dataset into spherical groups using K -means. We next merge these groups using hierarchical methods with a data-driven distance measure as a stopping criterion. Our proposal has the potential to reveal groups with general shapes and structure in a dataset. We demonstrate good performance on several simulated and real datasets.

  1. MAP-Based Underdetermined Blind Source Separation of Convolutive Mixtures by Hierarchical Clustering and -Norm Minimization

    Directory of Open Access Journals (Sweden)

    Kellermann Walter

    2007-01-01

    Full Text Available We address the problem of underdetermined BSS. While most previous approaches are designed for instantaneous mixtures, we propose a time-frequency-domain algorithm for convolutive mixtures. We adopt a two-step method based on a general maximum a posteriori (MAP approach. In the first step, we estimate the mixing matrix based on hierarchical clustering, assuming that the source signals are sufficiently sparse. The algorithm works directly on the complex-valued data in the time-frequency domain and shows better convergence than algorithms based on self-organizing maps. The assumption of Laplacian priors for the source signals in the second step leads to an algorithm for estimating the source signals. It involves the -norm minimization of complex numbers because of the use of the time-frequency-domain approach. We compare a combinatorial approach initially designed for real numbers with a second-order cone programming (SOCP approach designed for complex numbers. We found that although the former approach is not theoretically justified for complex numbers, its results are comparable to, or even better than, the SOCP solution. The advantage is a lower computational cost for problems with low input/output dimensions.

  2. D Nearest Neighbour Search Using a Clustered Hierarchical Tree Structure

    Science.gov (United States)

    Suhaibah, A.; Uznir, U.; Anton, F.; Mioc, D.; Rahman, A. A.

    2016-06-01

    Locating and analysing the location of new stores or outlets is one of the common issues facing retailers and franchisers. This is due to assure that new opening stores are at their strategic location to attract the highest possible number of customers. Spatial information is used to manage, maintain and analyse these store locations. However, since the business of franchising and chain stores in urban areas runs within high rise multi-level buildings, a three-dimensional (3D) method is prominently required in order to locate and identify the surrounding information such as at which level of the franchise unit will be located or is the franchise unit located is at the best level for visibility purposes. One of the common used analyses used for retrieving the surrounding information is Nearest Neighbour (NN) analysis. It uses a point location and identifies the surrounding neighbours. However, with the immense number of urban datasets, the retrieval and analysis of nearest neighbour information and their efficiency will become more complex and crucial. In this paper, we present a technique to retrieve nearest neighbour information in 3D space using a clustered hierarchical tree structure. Based on our findings, the proposed approach substantially showed an improvement of response time analysis compared to existing approaches of spatial access methods in databases. The query performance was tested using a dataset consisting of 500,000 point locations building and franchising unit. The results are presented in this paper. Another advantage of this structure is that it also offers a minimal overlap and coverage among nodes which can reduce repetitive data entry.

  3. A cluster algorithm for jet studies

    International Nuclear Information System (INIS)

    Daum, H.J.; Meyer, H.; Buerger, J.

    1980-10-01

    A procedure is described which determines the number of jets in hadronic final states by means of a cluster algorithm. In addition it yields a measurement of the energy and the direction of each jet. The properties of this method are studied using Monte Carlo simulations of different types of e + e - -annihilation final states. It is shown that in case of 3-jet events direct comparison with the underlying parton structure can be made. Possible further applications of this method are discussed. (orig.)

  4. Weighted Clustering

    DEFF Research Database (Denmark)

    Ackerman, Margareta; Ben-David, Shai; Branzei, Simina

    2012-01-01

    We investigate a natural generalization of the classical clustering problem, considering clustering tasks in which different instances may have different weights.We conduct the first extensive theoretical analysis on the influence of weighted data on standard clustering algorithms in both...... the partitional and hierarchical settings, characterizing the conditions under which algorithms react to weights. Extending a recent framework for clustering algorithm selection, we propose intuitive properties that would allow users to choose between clustering algorithms in the weighted setting and classify...

  5. Improved Ant Colony Clustering Algorithm and Its Performance Study

    Science.gov (United States)

    Gao, Wei

    2016-01-01

    Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533

  6. A test sheet generating algorithm based on intelligent genetic algorithm and hierarchical planning

    Science.gov (United States)

    Gu, Peipei; Niu, Zhendong; Chen, Xuting; Chen, Wei

    2013-03-01

    In recent years, computer-based testing has become an effective method to evaluate students' overall learning progress so that appropriate guiding strategies can be recommended. Research has been done to develop intelligent test assembling systems which can automatically generate test sheets based on given parameters of test items. A good multisubject test sheet depends on not only the quality of the test items but also the construction of the sheet. Effective and efficient construction of test sheets according to multiple subjects and criteria is a challenging problem. In this paper, a multi-subject test sheet generation problem is formulated and a test sheet generating approach based on intelligent genetic algorithm and hierarchical planning (GAHP) is proposed to tackle this problem. The proposed approach utilizes hierarchical planning to simplify the multi-subject testing problem and adopts genetic algorithm to process the layered criteria, enabling the construction of good test sheets according to multiple test item requirements. Experiments are conducted and the results show that the proposed approach is capable of effectively generating multi-subject test sheets that meet specified requirements and achieve good performance.

  7. A Dynamic Construction Algorithm for the Compact Patricia Trie Using the Hierarchical Structure.

    Science.gov (United States)

    Jung, Minsoo; Shishibori, Masami; Tanaka, Yasuhiro; Aoe, Jun-ichi

    2002-01-01

    Discussion of information retrieval focuses on the use of binary trees and how to compact it to use less memory and take less time. Explains retrieval algorithms and describes data structure and hierarchical structure. (LRW)

  8. An Improved Hierarchical Genetic Algorithm for Sheet Cutting Scheduling with Process Constraints

    Directory of Open Access Journals (Sweden)

    Yunqing Rao

    2013-01-01

    Full Text Available For the first time, an improved hierarchical genetic algorithm for sheet cutting problem which involves n cutting patterns for m non-identical parallel machines with process constraints has been proposed in the integrated cutting stock model. The objective of the cutting scheduling problem is minimizing the weighted completed time. A mathematical model for this problem is presented, an improved hierarchical genetic algorithm (ant colony—hierarchical genetic algorithm is developed for better solution, and a hierarchical coding method is used based on the characteristics of the problem. Furthermore, to speed up convergence rates and resolve local convergence issues, a kind of adaptive crossover probability and mutation probability is used in this algorithm. The computational result and comparison prove that the presented approach is quite effective for the considered problem.

  9. An improved hierarchical genetic algorithm for sheet cutting scheduling with process constraints.

    Science.gov (United States)

    Rao, Yunqing; Qi, Dezhong; Li, Jinling

    2013-01-01

    For the first time, an improved hierarchical genetic algorithm for sheet cutting problem which involves n cutting patterns for m non-identical parallel machines with process constraints has been proposed in the integrated cutting stock model. The objective of the cutting scheduling problem is minimizing the weighted completed time. A mathematical model for this problem is presented, an improved hierarchical genetic algorithm (ant colony--hierarchical genetic algorithm) is developed for better solution, and a hierarchical coding method is used based on the characteristics of the problem. Furthermore, to speed up convergence rates and resolve local convergence issues, a kind of adaptive crossover probability and mutation probability is used in this algorithm. The computational result and comparison prove that the presented approach is quite effective for the considered problem.

  10. Multi-objective hierarchical genetic algorithms for multilevel redundancy allocation optimization

    Energy Technology Data Exchange (ETDEWEB)

    Kumar, Ranjan [Department of Aeronautics and Astronautics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501 (Japan)], E-mail: ranjan.k@ks3.ecs.kyoto-u.ac.jp; Izui, Kazuhiro [Department of Aeronautics and Astronautics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501 (Japan)], E-mail: izui@prec.kyoto-u.ac.jp; Yoshimura, Masataka [Department of Aeronautics and Astronautics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501 (Japan)], E-mail: yoshimura@prec.kyoto-u.ac.jp; Nishiwaki, Shinji [Department of Aeronautics and Astronautics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501 (Japan)], E-mail: shinji@prec.kyoto-u.ac.jp

    2009-04-15

    Multilevel redundancy allocation optimization problems (MRAOPs) occur frequently when attempting to maximize the system reliability of a hierarchical system, and almost all complex engineering systems are hierarchical. Despite their practical significance, limited research has been done concerning the solving of simple MRAOPs. These problems are not only NP hard but also involve hierarchical design variables. Genetic algorithms (GAs) have been applied in solving MRAOPs, since they are computationally efficient in solving such problems, unlike exact methods, but their applications has been confined to single-objective formulation of MRAOPs. This paper proposes a multi-objective formulation of MRAOPs and a methodology for solving such problems. In this methodology, a hierarchical GA framework for multi-objective optimization is proposed by introducing hierarchical genotype encoding for design variables. In addition, we implement the proposed approach by integrating the hierarchical genotype encoding scheme with two popular multi-objective genetic algorithms (MOGAs)-the strength Pareto evolutionary genetic algorithm (SPEA2) and the non-dominated sorting genetic algorithm (NSGA-II). In the provided numerical examples, the proposed multi-objective hierarchical approach is applied to solve two hierarchical MRAOPs, a 4- and a 3-level problems. The proposed method is compared with a single-objective optimization method that uses a hierarchical genetic algorithm (HGA), also applied to solve the 3- and 4-level problems. The results show that a multi-objective hierarchical GA (MOHGA) that includes elitism and mechanism for diversity preserving performed better than a single-objective GA that only uses elitism, when solving large-scale MRAOPs. Additionally, the experimental results show that the proposed method with NSGA-II outperformed the proposed method with SPEA2 in finding useful Pareto optimal solution sets.

  11. Multi-objective hierarchical genetic algorithms for multilevel redundancy allocation optimization

    International Nuclear Information System (INIS)

    Kumar, Ranjan; Izui, Kazuhiro; Yoshimura, Masataka; Nishiwaki, Shinji

    2009-01-01

    Multilevel redundancy allocation optimization problems (MRAOPs) occur frequently when attempting to maximize the system reliability of a hierarchical system, and almost all complex engineering systems are hierarchical. Despite their practical significance, limited research has been done concerning the solving of simple MRAOPs. These problems are not only NP hard but also involve hierarchical design variables. Genetic algorithms (GAs) have been applied in solving MRAOPs, since they are computationally efficient in solving such problems, unlike exact methods, but their applications has been confined to single-objective formulation of MRAOPs. This paper proposes a multi-objective formulation of MRAOPs and a methodology for solving such problems. In this methodology, a hierarchical GA framework for multi-objective optimization is proposed by introducing hierarchical genotype encoding for design variables. In addition, we implement the proposed approach by integrating the hierarchical genotype encoding scheme with two popular multi-objective genetic algorithms (MOGAs)-the strength Pareto evolutionary genetic algorithm (SPEA2) and the non-dominated sorting genetic algorithm (NSGA-II). In the provided numerical examples, the proposed multi-objective hierarchical approach is applied to solve two hierarchical MRAOPs, a 4- and a 3-level problems. The proposed method is compared with a single-objective optimization method that uses a hierarchical genetic algorithm (HGA), also applied to solve the 3- and 4-level problems. The results show that a multi-objective hierarchical GA (MOHGA) that includes elitism and mechanism for diversity preserving performed better than a single-objective GA that only uses elitism, when solving large-scale MRAOPs. Additionally, the experimental results show that the proposed method with NSGA-II outperformed the proposed method with SPEA2 in finding useful Pareto optimal solution sets

  12. Multi-documents summarization based on clustering of learning object using hierarchical clustering

    Science.gov (United States)

    Mustamiin, M.; Budi, I.; Santoso, H. B.

    2018-03-01

    The Open Educational Resources (OER) is a portal of teaching, learning and research resources that is available in public domain and freely accessible. Learning contents or Learning Objects (LO) are granular and can be reused for constructing new learning materials. LO ontology-based searching techniques can be used to search for LO in the Indonesia OER. In this research, LO from search results are used as an ingredient to create new learning materials according to the topic searched by users. Summarizing-based grouping of LO use Hierarchical Agglomerative Clustering (HAC) with the dependency context to the user’s query which has an average value F-Measure of 0.487, while summarizing by K-Means F-Measure only has an average value of 0.336.

  13. The Hierarchical Distribution of the Young Stellar Clusters in Six Local Star-forming Galaxies

    Energy Technology Data Exchange (ETDEWEB)

    Grasha, K.; Calzetti, D. [Astronomy Department, University of Massachusetts, Amherst, MA 01003 (United States); Adamo, A.; Messa, M. [Dept. of Astronomy, The Oskar Klein Centre, Stockholm University, Stockholm (Sweden); Kim, H. [Gemini Observatory, La Serena (Chile); Elmegreen, B. G. [IBM Research Division, T.J. Watson Research Center, Yorktown Hts., NY (United States); Gouliermis, D. A. [Zentrum für Astronomie der Universität Heidelberg, Institut für Theoretische Astrophysik, Albert-Ueberle-Str. 2, D-69120 Heidelberg (Germany); Dale, D. A. [Dept. of Physics and Astronomy, University of Wyoming, Laramie, WY (United States); Fumagalli, M. [Institute for Computational Cosmology and Centre for Extragalactic Astronomy, Durham University, Durham (United Kingdom); Grebel, E. K.; Shabani, F. [Astronomisches Rechen-Institut, Zentrum für Astronomie der Universität Heidelberg, Mönchhofstr. 12-14, D-69120 Heidelberg (Germany); Johnson, K. E. [Dept. of Astronomy, University of Virginia, Charlottesville, VA (United States); Kahre, L. [Dept. of Astronomy, New Mexico State University, Las Cruces, NM (United States); Kennicutt, R. C. [Institute of Astronomy, University of Cambridge, Cambridge (United Kingdom); Pellerin, A. [Dept. of Physics and Astronomy, State University of New York at Geneseo, Geneseo NY (United States); Ryon, J. E.; Ubeda, L. [Space Telescope Science Institute, Baltimore, MD (United States); Smith, L. J. [European Space Agency/Space Telescope Science Institute, Baltimore, MD (United States); Thilker, D., E-mail: kgrasha@astro.umass.edu [Dept. of Physics and Astronomy, The Johns Hopkins University, Baltimore, MD (United States)

    2017-05-10

    We present a study of the hierarchical clustering of the young stellar clusters in six local (3–15 Mpc) star-forming galaxies using Hubble Space Telescope broadband WFC3/UVIS UV and optical images from the Treasury Program LEGUS (Legacy ExtraGalactic UV Survey). We identified 3685 likely clusters and associations, each visually classified by their morphology, and we use the angular two-point correlation function to study the clustering of these stellar systems. We find that the spatial distribution of the young clusters and associations are clustered with respect to each other, forming large, unbound hierarchical star-forming complexes that are in general very young. The strength of the clustering decreases with increasing age of the star clusters and stellar associations, becoming more homogeneously distributed after ∼40–60 Myr and on scales larger than a few hundred parsecs. In all galaxies, the associations exhibit a global behavior that is distinct and more strongly correlated from compact clusters. Thus, populations of clusters are more evolved than associations in terms of their spatial distribution, traveling significantly from their birth site within a few tens of Myr, whereas associations show evidence of disruption occurring very quickly after their formation. The clustering of the stellar systems resembles that of a turbulent interstellar medium that drives the star formation process, correlating the components in unbound star-forming complexes in a hierarchical manner, dispersing shortly after formation, suggestive of a single, continuous mode of star formation across all galaxies.

  14. The Hierarchical Distribution of the Young Stellar Clusters in Six Local Star-forming Galaxies

    Science.gov (United States)

    Grasha, K.; Calzetti, D.; Adamo, A.; Kim, H.; Elmegreen, B. G.; Gouliermis, D. A.; Dale, D. A.; Fumagalli, M.; Grebel, E. K.; Johnson, K. E.; Kahre, L.; Kennicutt, R. C.; Messa, M.; Pellerin, A.; Ryon, J. E.; Smith, L. J.; Shabani, F.; Thilker, D.; Ubeda, L.

    2017-05-01

    We present a study of the hierarchical clustering of the young stellar clusters in six local (3-15 Mpc) star-forming galaxies using Hubble Space Telescope broadband WFC3/UVIS UV and optical images from the Treasury Program LEGUS (Legacy ExtraGalactic UV Survey). We identified 3685 likely clusters and associations, each visually classified by their morphology, and we use the angular two-point correlation function to study the clustering of these stellar systems. We find that the spatial distribution of the young clusters and associations are clustered with respect to each other, forming large, unbound hierarchical star-forming complexes that are in general very young. The strength of the clustering decreases with increasing age of the star clusters and stellar associations, becoming more homogeneously distributed after ˜40-60 Myr and on scales larger than a few hundred parsecs. In all galaxies, the associations exhibit a global behavior that is distinct and more strongly correlated from compact clusters. Thus, populations of clusters are more evolved than associations in terms of their spatial distribution, traveling significantly from their birth site within a few tens of Myr, whereas associations show evidence of disruption occurring very quickly after their formation. The clustering of the stellar systems resembles that of a turbulent interstellar medium that drives the star formation process, correlating the components in unbound star-forming complexes in a hierarchical manner, dispersing shortly after formation, suggestive of a single, continuous mode of star formation across all galaxies.

  15. The Hierarchical Distribution of the Young Stellar Clusters in Six Local Star-forming Galaxies

    International Nuclear Information System (INIS)

    Grasha, K.; Calzetti, D.; Adamo, A.; Messa, M.; Kim, H.; Elmegreen, B. G.; Gouliermis, D. A.; Dale, D. A.; Fumagalli, M.; Grebel, E. K.; Shabani, F.; Johnson, K. E.; Kahre, L.; Kennicutt, R. C.; Pellerin, A.; Ryon, J. E.; Ubeda, L.; Smith, L. J.; Thilker, D.

    2017-01-01

    We present a study of the hierarchical clustering of the young stellar clusters in six local (3–15 Mpc) star-forming galaxies using Hubble Space Telescope broadband WFC3/UVIS UV and optical images from the Treasury Program LEGUS (Legacy ExtraGalactic UV Survey). We identified 3685 likely clusters and associations, each visually classified by their morphology, and we use the angular two-point correlation function to study the clustering of these stellar systems. We find that the spatial distribution of the young clusters and associations are clustered with respect to each other, forming large, unbound hierarchical star-forming complexes that are in general very young. The strength of the clustering decreases with increasing age of the star clusters and stellar associations, becoming more homogeneously distributed after ∼40–60 Myr and on scales larger than a few hundred parsecs. In all galaxies, the associations exhibit a global behavior that is distinct and more strongly correlated from compact clusters. Thus, populations of clusters are more evolved than associations in terms of their spatial distribution, traveling significantly from their birth site within a few tens of Myr, whereas associations show evidence of disruption occurring very quickly after their formation. The clustering of the stellar systems resembles that of a turbulent interstellar medium that drives the star formation process, correlating the components in unbound star-forming complexes in a hierarchical manner, dispersing shortly after formation, suggestive of a single, continuous mode of star formation across all galaxies.

  16. A Novel Clustering Algorithm Inspired by Membrane Computing

    Directory of Open Access Journals (Sweden)

    Hong Peng

    2015-01-01

    Full Text Available P systems are a class of distributed parallel computing models; this paper presents a novel clustering algorithm, which is inspired from mechanism of a tissue-like P system with a loop structure of cells, called membrane clustering algorithm. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or competitive to k-means algorithm and several evolutionary clustering algorithms recently reported in the literature.

  17. Similarity maps and hierarchical clustering for annotating FT-IR spectral images.

    Science.gov (United States)

    Zhong, Qiaoyong; Yang, Chen; Großerüschkamp, Frederik; Kallenbach-Thieltges, Angela; Serocka, Peter; Gerwert, Klaus; Mosig, Axel

    2013-11-20

    Unsupervised segmentation of multi-spectral images plays an important role in annotating infrared microscopic images and is an essential step in label-free spectral histopathology. In this context, diverse clustering approaches have been utilized and evaluated in order to achieve segmentations of Fourier Transform Infrared (FT-IR) microscopic images that agree with histopathological characterization. We introduce so-called interactive similarity maps as an alternative annotation strategy for annotating infrared microscopic images. We demonstrate that segmentations obtained from interactive similarity maps lead to similarly accurate segmentations as segmentations obtained from conventionally used hierarchical clustering approaches. In order to perform this comparison on quantitative grounds, we provide a scheme that allows to identify non-horizontal cuts in dendrograms. This yields a validation scheme for hierarchical clustering approaches commonly used in infrared microscopy. We demonstrate that interactive similarity maps may identify more accurate segmentations than hierarchical clustering based approaches, and thus are a viable and due to their interactive nature attractive alternative to hierarchical clustering. Our validation scheme furthermore shows that performance of hierarchical two-means is comparable to the traditionally used Ward's clustering. As the former is much more efficient in time and memory, our results suggest another less resource demanding alternative for annotating large spectral images.

  18. Algorithms of maximum likelihood data clustering with applications

    Science.gov (United States)

    Giada, Lorenzo; Marsili, Matteo

    2002-12-01

    We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share a common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on the Pearson's coefficient of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures whereas the outcome of standard algorithms has a much wider variability.

  19. The identification of credit card encoders by hierarchical cluster analysis of the jitters of magnetic stripes.

    Science.gov (United States)

    Leung, S C; Fung, W K; Wong, K H

    1999-01-01

    The relative bit density variation graphs of 207 specimen credit cards processed by 12 encoding machines were examined first visually, and then classified by means of hierarchical cluster analysis. Twenty-nine credit cards being treated as 'questioned' samples were tested by way of cluster analysis against 'controls' derived from known encoders. It was found that hierarchical cluster analysis provided a high accuracy of identification with all 29 'questioned' samples classified correctly. On the other hand, although visual comparison of jitter graphs was less discriminating, it was nevertheless capable of giving a reasonably accurate result.

  20. Hierarchical Control for Multiple DC-Microgrids Clusters

    DEFF Research Database (Denmark)

    Shafiee, Qobad; Dragicevic, Tomislav; Vasquez, Juan Carlos

    2014-01-01

    DC microgrids (MGs) have gained research interest during the recent years because of many potential advantages as compared to the ac system. To ensure reliable operation of a low-voltage dc MG as well as its intelligent operation with the other DC MGs, a hierarchical control is proposed in this p......DC microgrids (MGs) have gained research interest during the recent years because of many potential advantages as compared to the ac system. To ensure reliable operation of a low-voltage dc MG as well as its intelligent operation with the other DC MGs, a hierarchical control is proposed...

  1. URL Mining Using Agglomerative Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Chinmay R. Deshmukh

    2015-02-01

    Full Text Available Abstract The tremendous growth of the web world incorporates application of data mining techniques to the web logs. Data Mining and World Wide Web encompasses an important and active area of research. Web log mining is analysis of web log files with web pages sequences. Web mining is broadly classified as web content mining web usage mining and web structure mining. Web usage mining is a technique to discover usage patterns from Web data in order to understand and better serve the needs of Web-based applications. URL mining refers to a subclass of Web mining that helps us to investigate the details of a Uniform Resource Locator. URL mining can be advantageous in the fields of security and protection. The paper introduces a technique for mining a collection of user transactions with an Internet search engine to discover clusters of similar queries and similar URLs. The information we exploit is a clickthrough data each record consist of a users query to a search engine along with the URL which the user selected from among the candidates offered by search engine. By viewing this dataset as a bipartite graph with the vertices on one side corresponding to queries and on the other side to URLs one can apply an agglomerative clustering algorithm to the graphs vertices to identify related queries and URLs.

  2. Energy Efficient Hierarchical Clustering Approaches in Wireless Sensor Networks: A Survey

    Directory of Open Access Journals (Sweden)

    Bilal Jan

    2017-01-01

    Full Text Available Wireless sensor networks (WSN are one of the significant technologies due to their diverse applications such as health care monitoring, smart phones, military, disaster management, and other surveillance systems. Sensor nodes are usually deployed in large number that work independently in unattended harsh environments. Due to constraint resources, typically the scarce battery power, these wireless nodes are grouped into clusters for energy efficient communication. In clustering hierarchical schemes have achieved great interest for minimizing energy consumption. Hierarchical schemes are generally categorized as cluster-based and grid-based approaches. In cluster-based approaches, nodes are grouped into clusters, where a resourceful sensor node is nominated as a cluster head (CH while in grid-based approach the network is divided into confined virtual grids usually performed by the base station. This paper highlights and discusses the design challenges for cluster-based schemes, the important cluster formation parameters, and classification of hierarchical clustering protocols. Moreover, existing cluster-based and grid-based techniques are evaluated by considering certain parameters to help users in selecting appropriate technique. Furthermore, a detailed summary of these protocols is presented with their advantages, disadvantages, and applicability in particular cases.

  3. A Dynamic Fuzzy Cluster Algorithm for Time Series

    Directory of Open Access Journals (Sweden)

    Min Ji

    2013-01-01

    clustering time series by introducing the definition of key point and improving FCM algorithm. The proposed algorithm works by determining those time series whose class labels are vague and further partitions them into different clusters over time. The main advantage of this approach compared with other existing algorithms is that the property of some time series belonging to different clusters over time can be partially revealed. Results from simulation-based experiments on geographical data demonstrate the excellent performance and the desired results have been obtained. The proposed algorithm can be applied to solve other clustering problems in data mining.

  4. An improved Pearson's correlation proximity-based hierarchical clustering for mining biological association between genes.

    Science.gov (United States)

    Booma, P M; Prabhakaran, S; Dhanalakshmi, R

    2014-01-01

    Microarray gene expression datasets has concerned great awareness among molecular biologist, statisticians, and computer scientists. Data mining that extracts the hidden and usual information from datasets fails to identify the most significant biological associations between genes. A search made with heuristic for standard biological process measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but the association process was not efficiently addressed. To monitor higher rate of expression levels between genes, a hierarchical clustering model was proposed, where the biological association between genes is measured simultaneously using proximity measure of improved Pearson's correlation (PCPHC). Additionally, the Seed Augment algorithm adopts average linkage methods on rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify association between the clusters. Moreover, a GL-PCPHC applies pattern growing method to mine the PCPHC patterns. Compared to existing gene expression analysis, the PCPHC model achieves better performance. Experimental evaluations are conducted for GL-PCPHC model with standard benchmark gene expression datasets extracted from UCI repository and GenBank database in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.

  5. Performance Evaluation of Spectral Clustering Algorithm using Various Clustering Validity Indices

    OpenAIRE

    M. T. Somashekara; D. Manjunatha

    2014-01-01

    In spite of the popularity of spectral clustering algorithm, the evaluation procedures are still in developmental stage. In this article, we have taken benchmarking IRIS dataset for performing comparative study of twelve indices for evaluating spectral clustering algorithm. The results of the spectral clustering technique were also compared with k-mean algorithm. The validity of the indices was also verified with accuracy and (Normalized Mutual Information) NMI score. Spectral clustering algo...

  6. Hierarchical cluster analysis of progression patterns in open-angle glaucoma patients with medical treatment.

    Science.gov (United States)

    Bae, Hyoung Won; Rho, Seungsoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2014-04-29

    To classify medically treated open-angle glaucoma (OAG) by the pattern of progression using hierarchical cluster analysis, and to determine OAG progression characteristics by comparing clusters. Ninety-five eyes of 95 OAG patients who received medical treatment, and who had undergone visual field (VF) testing at least once per year for 5 or more years. OAG was classified into subgroups using hierarchical cluster analysis based on the following five variables: baseline mean deviation (MD), baseline visual field index (VFI), MD slope, VFI slope, and Glaucoma Progression Analysis (GPA) printout. After that, other parameters were compared between clusters. Two clusters were made after a hierarchical cluster analysis. Cluster 1 showed -4.06 ± 2.43 dB baseline MD, 92.58% ± 6.27% baseline VFI, -0.28 ± 0.38 dB per year MD slope, -0.52% ± 0.81% per year VFI slope, and all "no progression" cases in GPA printout, whereas cluster 2 showed -8.68 ± 3.81 baseline MD, 77.54 ± 12.98 baseline VFI, -0.72 ± 0.55 MD slope, -2.22 ± 1.89 VFI slope, and seven "possible" and four "likely" progression cases in GPA printout. There were no significant differences in age, sex, mean IOP, central corneal thickness, and axial length between clusters. However, cluster 2 included more high-tension glaucoma patients and used a greater number of antiglaucoma eye drops significantly compared with cluster 1. Hierarchical cluster analysis of progression patterns divided OAG into slow and fast progression groups, evidenced by assessing the parameters of glaucomatous progression in VF testing. In the fast progression group, the prevalence of high-tension glaucoma was greater and the number of antiglaucoma medications administered was increased versus the slow progression group. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.

  7. Wavelet compression of multichannel ECG data by enhanced set partitioning in hierarchical trees algorithm.

    Science.gov (United States)

    Sharifahmadian, Ershad

    2006-01-01

    The set partitioning in hierarchical trees (SPIHT) algorithm is very effective and computationally simple technique for image and signal compression. Here the author modified the algorithm which provides even better performance than the SPIHT algorithm. The enhanced set partitioning in hierarchical trees (ESPIHT) algorithm has performance faster than the SPIHT algorithm. In addition, the proposed algorithm reduces the number of bits in a bit stream which is stored or transmitted. I applied it to compression of multichannel ECG data. Also, I presented a specific procedure based on the modified algorithm for more efficient compression of multichannel ECG data. This method employed on selected records from the MIT-BIH arrhythmia database. According to experiments, the proposed method attained the significant results regarding compression of multichannel ECG data. Furthermore, in order to compress one signal which is stored for a long time, the proposed multichannel compression method can be utilized efficiently.

  8. Local Community Detection Algorithm Based on Minimal Cluster

    Directory of Open Access Journals (Sweden)

    Yong Zhou

    2016-01-01

    Full Text Available In order to discover the structure of local community more effectively, this paper puts forward a new local community detection algorithm based on minimal cluster. Most of the local community detection algorithms begin from one node. The agglomeration ability of a single node must be less than multiple nodes, so the beginning of the community extension of the algorithm in this paper is no longer from the initial node only but from a node cluster containing this initial node and nodes in the cluster are relatively densely connected with each other. The algorithm mainly includes two phases. First it detects the minimal cluster and then finds the local community extended from the minimal cluster. Experimental results show that the quality of the local community detected by our algorithm is much better than other algorithms no matter in real networks or in simulated networks.

  9. Cluster fusion algorithm: application to Lennard-Jones clusters

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2006-01-01

    paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence......We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing...... for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...

  10. Cluster fusion algorithm: application to Lennard-Jones clusters

    DEFF Research Database (Denmark)

    Solov'yov, Ilia; Solov'yov, Andrey V.; Greiner, Walter

    2008-01-01

    paths up to the cluster size of 150 atoms. We demonstrate that in this way all known global minima structures of the Lennard-Jones clusters can be found. Our method provides an efficient tool for the calculation and analysis of atomic cluster structure. With its use we justify the magic number sequence......We present a new general theoretical framework for modelling the cluster structure and apply it to description of the Lennard-Jones clusters. Starting from the initial tetrahedral cluster configuration, adding new atoms to the system and absorbing its energy at each step, we find cluster growing...... for the clusters of noble gas atoms and compare it with experimental observations. We report the striking correspondence of the peaks in the dependence of the second derivative of the binding energy per atom on cluster size calculated for the chain of the Lennard-Jones clusters based on the icosahedral symmetry...

  11. A Clustering Approach Using Cooperative Artificial Bee Colony Algorithm

    Directory of Open Access Journals (Sweden)

    Wenping Zou

    2010-01-01

    Full Text Available Artificial Bee Colony (ABC is one of the most recently introduced algorithms based on the intelligent foraging behavior of a honey bee swarm. This paper presents an extended ABC algorithm, namely, the Cooperative Article Bee Colony (CABC, which significantly improves the original ABC in solving complex optimization problems. Clustering is a popular data analysis and data mining technique; therefore, the CABC could be used for solving clustering problems. In this work, first the CABC algorithm is used for optimizing six widely used benchmark functions and the comparative results produced by ABC, Particle Swarm Optimization (PSO, and its cooperative version (CPSO are studied. Second, the CABC algorithm is used for data clustering on several benchmark data sets. The performance of CABC algorithm is compared with PSO, CPSO, and ABC algorithms on clustering problems. The simulation results show that the proposed CABC outperforms the other three algorithms in terms of accuracy, robustness, and convergence speed.

  12. Forecasting building energy consumption with hybrid genetic algorithm-hierarchical adaptive network-based fuzzy inference system

    Energy Technology Data Exchange (ETDEWEB)

    Li, Kangji [Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027 (China); School of Electricity Information Engineering, Jiangsu University, Zhenjiang 212013 (China); Su, Hongye [Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027 (China)

    2010-11-15

    There are several ways to forecast building energy consumption, varying from simple regression to models based on physical principles. In this paper, a new method, namely, the hybrid genetic algorithm-hierarchical adaptive network-based fuzzy inference system (GA-HANFIS) model is developed. In this model, hierarchical structure decreases the rule base dimension. Both clustering and rule base parameters are optimized by GAs and neural networks (NNs). The model is applied to predict a hotel's daily air conditioning consumption for a period over 3 months. The results obtained by the proposed model are presented and compared with regular method of NNs, which indicates that GA-HANFIS model possesses better performance than NNs in terms of their forecasting accuracy. (author)

  13. Hierarchical Bayesian nonparametric mixture models for clustering with variable relevance determination.

    Science.gov (United States)

    Yau, Christopher; Holmes, Chris

    2011-07-01

    We propose a hierarchical Bayesian nonparametric mixture model for clustering when some of the covariates are assumed to be of varying relevance to the clustering problem. This can be thought of as an issue in variable selection for unsupervised learning. We demonstrate that by defining a hierarchical population based nonparametric prior on the cluster locations scaled by the inverse covariance matrices of the likelihood we arrive at a 'sparsity prior' representation which admits a conditionally conjugate prior. This allows us to perform full Gibbs sampling to obtain posterior distributions over parameters of interest including an explicit measure of each covariate's relevance and a distribution over the number of potential clusters present in the data. This also allows for individual cluster specific variable selection. We demonstrate improved inference on a number of canonical problems.

  14. A Flocking Based algorithm for Document Clustering Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Gao, Jinzhu [ORNL; Potok, Thomas E [ORNL

    2006-01-01

    Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithm such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the K- means and the Ant clustering algorithm for real document clustering.

  15. Multichannel biomedical time series clustering via hierarchical probabilistic latent semantic analysis.

    Science.gov (United States)

    Wang, Jin; Sun, Xiangping; Nahavandi, Saeid; Kouzani, Abbas; Wu, Yuchuan; She, Mary

    2014-11-01

    Biomedical time series clustering that automatically groups a collection of time series according to their internal similarity is of importance for medical record management and inspection such as bio-signals archiving and retrieval. In this paper, a novel framework that automatically groups a set of unlabelled multichannel biomedical time series according to their internal structural similarity is proposed. Specifically, we treat a multichannel biomedical time series as a document and extract local segments from the time series as words. We extend a topic model, i.e., the Hierarchical probabilistic Latent Semantic Analysis (H-pLSA), which was originally developed for visual motion analysis to cluster a set of unlabelled multichannel time series. The H-pLSA models each channel of the multichannel time series using a local pLSA in the first layer. The topics learned in the local pLSA are then fed to a global pLSA in the second layer to discover the categories of multichannel time series. Experiments on a dataset extracted from multichannel Electrocardiography (ECG) signals demonstrate that the proposed method performs better than previous state-of-the-art approaches and is relatively robust to the variations of parameters including length of local segments and dictionary size. Although the experimental evaluation used the multichannel ECG signals in a biometric scenario, the proposed algorithm is a universal framework for multichannel biomedical time series clustering according to their structural similarity, which has many applications in biomedical time series management. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  16. Mining the National Career Assessment Examination Result Using Clustering Algorithm

    Science.gov (United States)

    Pagudpud, M. V.; Palaoag, T. T.; Padirayon, L. M.

    2018-03-01

    Education is an essential process today which elicits authorities to discover and establish innovative strategies for educational improvement. This study applied data mining using clustering technique for knowledge extraction from the National Career Assessment Examination (NCAE) result in the Division of Quirino. The NCAE is an examination given to all grade 9 students in the Philippines to assess their aptitudes in the different domains. Clustering the students is helpful in identifying students’ learning considerations. With the use of the RapidMiner tool, clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), k-means, k-medoid, expectation maximization clustering, and support vector clustering algorithms were analyzed. The silhouette indexes of the said clustering algorithms were compared, and the result showed that the k-means algorithm with k = 3 and silhouette index equal to 0.196 is the most appropriate clustering algorithm to group the students. Three groups were formed having 477 students in the determined group (cluster 0), 310 proficient students (cluster 1) and 396 developing students (cluster 2). The data mining technique used in this study is essential in extracting useful information from the NCAE result to better understand the abilities of students which in turn is a good basis for adopting teaching strategies.

  17. Android Malware Classification Using K-Means Clustering Algorithm

    Science.gov (United States)

    Hamid, Isredza Rahmi A.; Syafiqah Khalid, Nur; Azma Abdullah, Nurul; Rahman, Nurul Hidayah Ab; Chai Wen, Chuah

    2017-08-01

    Malware was designed to gain access or damage a computer system without user notice. Besides, attacker exploits malware to commit crime or fraud. This paper proposed Android malware classification approach based on K-Means clustering algorithm. We evaluate the proposed model in terms of accuracy using machine learning algorithms. Two datasets were selected to demonstrate the practicing of K-Means clustering algorithms that are Virus Total and Malgenome dataset. We classify the Android malware into three clusters which are ransomware, scareware and goodware. Nine features were considered for each types of dataset such as Lock Detected, Text Detected, Text Score, Encryption Detected, Threat, Porn, Law, Copyright and Moneypak. We used IBM SPSS Statistic software for data classification and WEKA tools to evaluate the built cluster. The proposed K-Means clustering algorithm shows promising result with high accuracy when tested using Random Forest algorithm.

  18. An event driven algorithm for fractal cluster formation

    NARCIS (Netherlands)

    González, S.; Thornton, Anthony Richard; Luding, Stefan

    2010-01-01

    A new cluster based event-driven algorithm is developed to simulate the formation of clusters in a two dimensional gas: particles move freely until they collide and "stick" together irreversibly. These clusters aggregate into bigger structures in an isotompic way, forming fractal structures whose

  19. An event driven algorithm for fractal cluster formation

    NARCIS (Netherlands)

    González, S.; Gonzalez Briones, Sebastián; Thornton, Anthony Richard; Luding, Stefan

    2011-01-01

    A new cluster based event-driven algorithm is developed to simulate the formation of clusters in a two dimensional gas: particles move freely until they collide and "stick" together irreversibly. These clusters aggregate into bigger structures in an isotompic way, forming fractal structures whose

  20. Clustering algorithms for Stokes space modulation format recognition

    DEFF Research Database (Denmark)

    Boada, Ricard; Borkowski, Robert; Tafur Monroy, Idelfonso

    2015-01-01

    influences the performance of the detection process, particularly at low signal-to-noise ratios. This paper reports on an extensive study of six different clustering algorithms: k-means, expectation maximization, density-based DBSCAN and OPTICS, spectral clustering and maximum likelihood clustering, used...

  1. Performance Evaluation of Incremental K-means Clustering Algorithm

    OpenAIRE

    Chakraborty, Sanjay; Nagwani, N. K.

    2014-01-01

    The incremental K-means clustering algorithm has already been proposed and analysed in paper [Chakraborty and Nagwani, 2011]. It is a very innovative approach which is applicable in periodically incremental environment and dealing with a bulk of updates. In this paper the performance evaluation is done for this incremental K-means clustering algorithm using air pollution database. This paper also describes the comparison on the performance evaluations between existing K-means clustering and i...

  2. K-means clustering for optimal partitioning and dynamic load balancing of parallel hierarchical N-body simulations

    International Nuclear Information System (INIS)

    Marzouk, Youssef M.; Ghoniem, Ahmed F.

    2005-01-01

    A number of complex physical problems can be approached through N-body simulation, from fluid flow at high Reynolds number to gravitational astrophysics and molecular dynamics. In all these applications, direct summation is prohibitively expensive for large N and thus hierarchical methods are employed for fast summation. This work introduces new algorithms, based on k-means clustering, for partitioning parallel hierarchical N-body interactions. We demonstrate that the number of particle-cluster interactions and the order at which they are performed are directly affected by partition geometry. Weighted k-means partitions minimize the sum of clusters' second moments and create well-localized domains, and thus reduce the computational cost of N-body approximations by enabling the use of lower-order approximations and fewer cells. We also introduce compatible techniques for dynamic load balancing, including adaptive scaling of cluster volumes and adaptive redistribution of cluster centroids. We demonstrate the performance of these algorithms by constructing a parallel treecode for vortex particle simulations, based on the serial variable-order Cartesian code developed by Lindsay and Krasny [Journal of Computational Physics 172 (2) (2001) 879-907]. The method is applied to vortex simulations of a transverse jet. Results show outstanding parallel efficiencies even at high concurrencies, with velocity evaluation errors maintained at or below their serial values; on a realistic distribution of 1.2 million vortex particles, we observe a parallel efficiency of 98% on 1024 processors. Excellent load balance is achieved even in the face of several obstacles, such as an irregular, time-evolving particle distribution containing a range of length scales and the continual introduction of new vortex particles throughout the domain. Moreover, results suggest that k-means yields a more efficient partition of the domain than a global oct-tree

  3. Final Report of Optimization Algorithms for Hierarchical Problems, with Applications to Nanoporous Materials

    Energy Technology Data Exchange (ETDEWEB)

    Nash, Stephen G.

    2013-11-11

    The research focuses on the modeling and optimization of nanoporous materials. In systems with hierarchical structure that we consider, the physics changes as the scale of the problem is reduced and it can be important to account for physics at the fine level to obtain accurate approximations at coarser levels. For example, nanoporous materials hold promise for energy production and storage. A significant issue is the fabrication of channels within these materials to allow rapid diffusion through the material. One goal of our research is to apply optimization methods to the design of nanoporous materials. Such problems are large and challenging, with hierarchical structure that we believe can be exploited, and with a large range of important scales, down to atomistic. This requires research on large-scale optimization for systems that exhibit different physics at different scales, and the development of algorithms applicable to designing nanoporous materials for many important applications in energy production, storage, distribution, and use. Our research has two major research thrusts. The first is hierarchical modeling. We plan to develop and study hierarchical optimization models for nanoporous materials. The models have hierarchical structure, and attempt to balance the conflicting aims of model fidelity and computational tractability. In addition, we analyze the general hierarchical model, as well as the specific application models, to determine their properties, particularly those properties that are relevant to the hierarchical optimization algorithms. The second thrust was to develop, analyze, and implement a class of hierarchical optimization algorithms, and apply them to the hierarchical models we have developed. We adapted and extended the optimization-based multigrid algorithms of Lewis and Nash to the optimization models exemplified by the hierarchical optimization model. This class of multigrid algorithms has been shown to be a powerful tool for

  4. Co-clustering models, algorithms and applications

    CERN Document Server

    Govaert, Gérard

    2013-01-01

    Cluster or co-cluster analyses are important tools in a variety of scientific areas. The introduction of this book presents a state of the art of already well-established, as well as more recent methods of co-clustering. The authors mainly deal with the two-mode partitioning under different approaches, but pay particular attention to a probabilistic approach. Chapter 1 concerns clustering in general and the model-based clustering in particular. The authors briefly review the classical clustering methods and focus on the mixture model. They present and discuss the use of different mixture

  5. Hierarchical clustering of HPV genotype patterns in the ASCUS-LSIL triage study

    Science.gov (United States)

    Wentzensen, Nicolas; Wilson, Lauren E.; Wheeler, Cosette M.; Carreon, Joseph D.; Gravitt, Patti E.; Schiffman, Mark; Castle, Philip E.

    2010-01-01

    Anogenital cancers are associated with about 13 carcinogenic HPV types in a broader group that cause cervical intraepithelial neoplasia (CIN). Multiple concurrent cervical HPV infections are common which complicate the attribution of HPV types to different grades of CIN. Here we report the analysis of HPV genotype patterns in the ASCUS-LSIL triage study using unsupervised hierarchical clustering. Women who underwent colposcopy at baseline (n = 2780) were grouped into 20 disease categories based on histology and cytology. Disease groups and HPV genotypes were clustered using complete linkage. Risk of 2-year cumulative CIN3+, viral load, colposcopic impression, and age were compared between disease groups and major clusters. Hierarchical clustering yielded four major disease clusters: Cluster 1 included all CIN3 histology with abnormal cytology; Cluster 2 included CIN3 histology with normal cytology and combinations with either CIN2 or high-grade squamous intraepithelial lesion (HSIL) cytology; Cluster 3 included older women with normal or low grade histology/cytology and low viral load; Cluster 4 included younger women with low grade histology/cytology, multiple infections, and the highest viral load. Three major groups of HPV genotypes were identified: Group 1 included only HPV16; Group 2 included nine carcinogenic types plus non-carcinogenic HPV53 and HPV66; and Group 3 included non-carcinogenic types plus carcinogenic HPV33 and HPV45. Clustering results suggested that colposcopy missed a prevalent precancer in many women with no biopsy/normal histology and HSIL. This result was confirmed by an elevated 2-year risk of CIN3+ in these groups. Our novel approach to study multiple genotype infections in cervical disease using unsupervised hierarchical clustering can address complex genotype distributions on a population level. PMID:20959485

  6. Hopfield-K-Means clustering algorithm: A proposal for the segmentation of electricity customers

    Energy Technology Data Exchange (ETDEWEB)

    Lopez, Jose J.; Aguado, Jose A.; Martin, F.; Munoz, F.; Rodriguez, A.; Ruiz, Jose E. [Department of Electrical Engineering, University of Malaga, C/ Dr. Ortiz Ramos, sn., Escuela de Ingenierias, 29071 Malaga (Spain)

    2011-02-15

    Customer classification aims at providing electric utilities with a volume of information to enable them to establish different types of tariffs. Several methods have been used to segment electricity customers, including, among others, the hierarchical clustering, Modified Follow the Leader and K-Means methods. These, however, entail problems with the pre-allocation of the number of clusters (Follow the Leader), randomness of the solution (K-Means) and improvement of the solution obtained (hierarchical algorithm). Another segmentation method used is Hopfield's autonomous recurrent neural network, although the solution obtained only guarantees that it is a local minimum. In this paper, we present the Hopfield-K-Means algorithm in order to overcome these limitations. This approach eliminates the randomness of the initial solution provided by K-Means based algorithms and it moves closer to the global optimun. The proposed algorithm is also compared against other customer segmentation and characterization techniques, on the basis of relative validation indexes. Finally, the results obtained by this algorithm with a set of 230 electricity customers (residential, industrial and administrative) are presented. (author)

  7. Hopfield-K-Means clustering algorithm: A proposal for the segmentation of electricity customers

    International Nuclear Information System (INIS)

    Lopez, Jose J.; Aguado, Jose A.; Martin, F.; Munoz, F.; Rodriguez, A.; Ruiz, Jose E.

    2011-01-01

    Customer classification aims at providing electric utilities with a volume of information to enable them to establish different types of tariffs. Several methods have been used to segment electricity customers, including, among others, the hierarchical clustering, Modified Follow the Leader and K-Means methods. These, however, entail problems with the pre-allocation of the number of clusters (Follow the Leader), randomness of the solution (K-Means) and improvement of the solution obtained (hierarchical algorithm). Another segmentation method used is Hopfield's autonomous recurrent neural network, although the solution obtained only guarantees that it is a local minimum. In this paper, we present the Hopfield-K-Means algorithm in order to overcome these limitations. This approach eliminates the randomness of the initial solution provided by K-Means based algorithms and it moves closer to the global optimun. The proposed algorithm is also compared against other customer segmentation and characterization techniques, on the basis of relative validation indexes. Finally, the results obtained by this algorithm with a set of 230 electricity customers (residential, industrial and administrative) are presented. (author)

  8. Kendall’s tau and agglomerative clustering for structure determination of hierarchical Archimedean copulas

    Directory of Open Access Journals (Sweden)

    Górecki J.

    2017-01-01

    Full Text Available Several successful approaches to structure determination of hierarchical Archimedean copulas (HACs proposed in the literature rely on agglomerative clustering and Kendall’s correlation coefficient. However, there has not been presented any theoretical proof justifying such approaches. This work fills this gap and introduces a theorem showing that, given the matrix of the pairwise Kendall correlation coefficients corresponding to a HAC, its structure can be recovered by an agglomerative clustering technique.

  9. A novel clustering algorithm based on quantum games

    International Nuclear Information System (INIS)

    Li Qiang; He Yan; Jiang Jingping

    2009-01-01

    Enormous successes have been made by quantum algorithms during the last decade. In this paper, we combine the quantum game with the problem of data clustering, and then develop a quantum-game-based clustering algorithm, in which data points in a dataset are considered as players who can make decisions and implement quantum strategies in quantum games. After each round of a quantum game, each player's expected payoff is calculated. Later, he uses a link-removing-and-rewiring (LRR) function to change his neighbors and adjust the strength of links connecting to them in order to maximize his payoff. Further, algorithms are discussed and analyzed in two cases of strategies, two payoff matrixes and two LRR functions. Consequently, the simulation results have demonstrated that data points in datasets are clustered reasonably and efficiently, and the clustering algorithms have fast rates of convergence. Moreover, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.

  10. Assessment of genetic divergence in tomato through agglomerative hierarchical clustering and principal component analysis

    International Nuclear Information System (INIS)

    Iqbal, Q.; Saleem, M.Y.; Hameed, A.; Asghar, M.

    2014-01-01

    For the improvement of qualitative and quantitative traits, existence of variability has prime importance in plant breeding. Data on different morphological and reproductive traits of 47 tomato genotypes were analyzed for correlation,agglomerative hierarchical clustering and principal component analysis (PCA) to select genotypes and traits for future breeding program. Correlation analysis revealed significant positive association between yield and yield components like fruit diameter, single fruit weight and number of fruits plant-1. Principal component (PC) analysis depicted first three PCs with Eigen-value higher than 1 contributing 81.72% of total variability for different traits. The PC-I showed positive factor loadings for all the traits except number of fruits plant-1. The contribution of single fruit weight and fruit diameter was highest in PC-1. Cluster analysis grouped all genotypes into five divergent clusters. The genotypes in cluster-II and cluster-V exhibited uniform maturity and higher yield. The D2 statistics confirmed highest distance between cluster- III and cluster-V while maximum similarity was observed in cluster-II and cluster-III. It is therefore suggested that crosses between genotypes of cluster-II and cluster-V with those of cluster-I and cluster-III may exhibit heterosis in F1 for hybrid breeding and for selection of superior genotypes in succeeding generations for cross breeding programme. (author)

  11. Hierarchical clustering analysis of blood plasma lipidomics profiles from mono- and dizygotic twin families

    NARCIS (Netherlands)

    Draisma, H.H.; Reijmers, T.H.; Meulman, J.J.; Greef, J. van der; Hankemeier, T.; Boomsma, D.I.

    2013-01-01

    Twin and family studies are typically used to elucidate the relative contribution of genetic and environmental variation to phenotypic variation. Here, we apply a quantitative genetic method based on hierarchical clustering, to blood plasma lipidomics data obtained in a healthy cohort consisting of

  12. Prediction of in vitro and in vivo oestrogen receptor activity using hierarchical clustering

    Science.gov (United States)

    In this study, hierarchical clustering classification models were developed to predict in vitro and in vivo oestrogen receptor (ER) activity. Classification models were developed for binding, agonist, and antagonist in vitro ER activity and for mouse in vivo uterotrophic ER bindi...

  13. Unit commitment solution using agglomerative and divisive cluster algorithm : an effective new methodology

    Energy Technology Data Exchange (ETDEWEB)

    Reddy, N.M.; Reddy, K.R. [G. Narayanamma Inst. of Technology and Science, Hyderabad (India). Dept. of Electrical Engineering; Ramana, N.V. [JNTU College of Engineering, Jagityala (India). Dept. of Electrical Engineering

    2008-07-01

    Thermal power plants consist of several generating units with different generating capacities, fuel cost per MWH generated, minimum up/down times, and start-up or shut-down costs. The Unit Commitment (UC) problem in power systems involves determining the start-up and shut-down schedules of thermal generating units to meet forecasted load over a future short term for a period of one to seven days. This paper presented a new approach for the most complex UC problem using agglomerative and divisive hierarchical clustering. Euclidean costs, which are a measure of differences in fuel cost and start-up costs of any two units, were first calculated. Then, depending on the value of Euclidean costs, similar type of units were placed in a cluster. The proposed methodology has 2 individual algorithms. An agglomerative cluster algorithm is used while the load is increasing, and a divisive cluster algorithm is used when the load is decreasing. A search was conducted for an optimal solution for a minimal number of clusters and cluster data points. A standard ten-unit thermal unit power system was used to test and evaluate the performance of the method for a period of 24 hours. The new approach proved to be quite effective and satisfactory. 15 refs., 9 tabs., 5 figs.

  14. A hierarchical approach to reducing communication in parallel graph algorithms

    KAUST Repository

    Harshvardhan,; Amato, Nancy M.; Rauchwerger, Lawrence

    2015-01-01

    . This is exacerbated in scale-free networks, such as social and web graphs, which contain hub vertices that have large degrees and therefore send a large number of messages over the network. Furthermore, many graph algorithms and computations send the same data to each

  15. Modification of MSDR algorithm and ITS implementation on graph clustering

    Science.gov (United States)

    Prastiwi, D.; Sugeng, K. A.; Siswantining, T.

    2017-07-01

    Maximum Standard Deviation Reduction (MSDR) is a graph clustering algorithm to minimize the distance variation within a cluster. In this paper we propose a modified MSDR by replacing one technical step in MSDR which uses polynomial regression, with a new and simpler step. This leads to our new algorithm called Modified MSDR (MMSDR). We implement the new algorithm to separate a domestic flight network of an Indonesian airline into two large clusters. Further analysis allows us to discover a weak link in the network, which should be improved by adding more flights.

  16. Hybrid Swarm Intelligence Energy Efficient Clustered Routing Algorithm for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Rajeev Kumar

    2016-01-01

    Full Text Available Currently, wireless sensor networks (WSNs are used in many applications, namely, environment monitoring, disaster management, industrial automation, and medical electronics. Sensor nodes carry many limitations like low battery life, small memory space, and limited computing capability. To create a wireless sensor network more energy efficient, swarm intelligence technique has been applied to resolve many optimization issues in WSNs. In many existing clustering techniques an artificial bee colony (ABC algorithm is utilized to collect information from the field periodically. Nevertheless, in the event based applications, an ant colony optimization (ACO is a good solution to enhance the network lifespan. In this paper, we combine both algorithms (i.e., ABC and ACO and propose a new hybrid ABCACO algorithm to solve a Nondeterministic Polynomial (NP hard and finite problem of WSNs. ABCACO algorithm is divided into three main parts: (i selection of optimal number of subregions and further subregion parts, (ii cluster head selection using ABC algorithm, and (iii efficient data transmission using ACO algorithm. We use a hierarchical clustering technique for data transmission; the data is transmitted from member nodes to the subcluster heads and then from subcluster heads to the elected cluster heads based on some threshold value. Cluster heads use an ACO algorithm to discover the best route for data transmission to the base station (BS. The proposed approach is very useful in designing the framework for forest fire detection and monitoring. The simulation results show that the ABCACO algorithm enhances the stability period by 60% and also improves the goodput by 31% against LEACH and WSNCABC, respectively.

  17. APPECT: An Approximate Backbone-Based Clustering Algorithm for Tags

    DEFF Research Database (Denmark)

    Zong, Yu; Xu, Guandong; Jin, Pin

    2011-01-01

    algorithm for Tags (APPECT). The main steps of APPECT are: (1) we execute the K-means algorithm on a tag similarity matrix for M times and collect a set of tag clustering results Z={C1,C2,…,Cm}; (2) we form the approximate backbone of Z by executing a greedy search; (3) we fix the approximate backbone...... as the initial tag clustering result and then assign the rest tags into the corresponding clusters based on the similarity. Experimental results on three real world datasets namely MedWorm, MovieLens and Dmoz demonstrate the effectiveness and the superiority of the proposed method against the traditional...... Agglomerative Clustering on tagging data, which possess the inherent drawbacks, such as the sensitivity of initialization. In this paper, we instead make use of the approximate backbone of tag clustering results to find out better tag clusters. In particular, we propose an APProximate backbonE-based Clustering...

  18. Improved multi-objective clustering algorithm using particle swarm optimization.

    Directory of Open Access Journals (Sweden)

    Congcong Gong

    Full Text Available Multi-objective clustering has received widespread attention recently, as it can obtain more accurate and reasonable solution. In this paper, an improved multi-objective clustering framework using particle swarm optimization (IMCPSO is proposed. Firstly, a novel particle representation for clustering problem is designed to help PSO search clustering solutions in continuous space. Secondly, the distribution of Pareto set is analyzed. The analysis results are applied to the leader selection strategy, and make algorithm avoid trapping in local optimum. Moreover, a clustering solution-improved method is proposed, which can increase the efficiency in searching clustering solution greatly. In the experiments, 28 datasets are used and nine state-of-the-art clustering algorithms are compared, the proposed method is superior to other approaches in the evaluation index ARI.

  19. Improved multi-objective clustering algorithm using particle swarm optimization.

    Science.gov (United States)

    Gong, Congcong; Chen, Haisong; He, Weixiong; Zhang, Zhanliang

    2017-01-01

    Multi-objective clustering has received widespread attention recently, as it can obtain more accurate and reasonable solution. In this paper, an improved multi-objective clustering framework using particle swarm optimization (IMCPSO) is proposed. Firstly, a novel particle representation for clustering problem is designed to help PSO search clustering solutions in continuous space. Secondly, the distribution of Pareto set is analyzed. The analysis results are applied to the leader selection strategy, and make algorithm avoid trapping in local optimum. Moreover, a clustering solution-improved method is proposed, which can increase the efficiency in searching clustering solution greatly. In the experiments, 28 datasets are used and nine state-of-the-art clustering algorithms are compared, the proposed method is superior to other approaches in the evaluation index ARI.

  20. A High-Order CFS Algorithm for Clustering Big Data

    Directory of Open Access Journals (Sweden)

    Fanyu Bu

    2016-01-01

    Full Text Available With the development of Internet of Everything such as Internet of Things, Internet of People, and Industrial Internet, big data is being generated. Clustering is a widely used technique for big data analytics and mining. However, most of current algorithms are not effective to cluster heterogeneous data which is prevalent in big data. In this paper, we propose a high-order CFS algorithm (HOCFS to cluster heterogeneous data by combining the CFS clustering algorithm and the dropout deep learning model, whose functionality rests on three pillars: (i an adaptive dropout deep learning model to learn features from each type of data, (ii a feature tensor model to capture the correlations of heterogeneous data, and (iii a tensor distance-based high-order CFS algorithm to cluster heterogeneous data. Furthermore, we verify our proposed algorithm on different datasets, by comparison with other two clustering schemes, that is, HOPCM and CFS. Results confirm the effectiveness of the proposed algorithm in clustering heterogeneous data.

  1. A Hierarchical Volumetric Shadow Algorithm for Single Scattering

    OpenAIRE

    Baran, Ilya; Chen, Jiawen; Ragan-Kelley, Jonathan Millar; Durand, Fredo; Lehtinen, Jaakko

    2010-01-01

    Volumetric effects such as beams of light through participating media are an important component in the appearance of the natural world. Many such effects can be faithfully modeled by a single scattering medium. In the presence of shadows, rendering these effects can be prohibitively expensive: current algorithms are based on ray marching, i.e., integrating the illumination scattered towards the camera along each view ray, modulated by visibility to the light source at each sample. Visibility...

  2. Searching for continuous gravitational wave signals. The hierarchical Hough transform algorithm

    International Nuclear Information System (INIS)

    Papa, M.; Schutz, B.F.; Sintes, A.M.

    2001-01-01

    It is well known that matched filtering techniques cannot be applied for searching extensive parameter space volumes for continuous gravitational wave signals. This is the reason why alternative strategies are being pursued. Hierarchical strategies are best at investigating a large parameter space when there exist computational power constraints. Algorithms of this kind are being implemented by all the groups that are developing software for analyzing the data of the gravitational wave detectors that will come online in the next years. In this talk I will report about the hierarchical Hough transform method that the GEO 600 data analysis team at the Albert Einstein Institute is developing. The three step hierarchical algorithm has been described elsewhere [8]. In this talk I will focus on some of the implementational aspects we are currently concerned with. (author)

  3. Optimal Parameter for the Training of Multilayer Perceptron Neural Networks by Using Hierarchical Genetic Algorithm

    International Nuclear Information System (INIS)

    Orozco-Monteagudo, Maykel; Taboada-Crispi, Alberto; Gutierrez-Hernandez, Liliana

    2008-01-01

    This paper deals with the controversial topic of the selection of the parameters of a genetic algorithm, in this case hierarchical, used for training of multilayer perceptron neural networks for the binary classification. The parameters to select are the crossover and mutation probabilities of the control and parametric genes and the permanency percent. The results can be considered as a guide for using this kind of algorithm.

  4. Parallel clustering algorithm for large-scale biological data sets.

    Science.gov (United States)

    Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

    2014-01-01

    Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for the cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied into the biological researches. However, the time and space complexity become a great bottleneck when handling the large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes. A speedup of 100 is gained with 128 cores. The runtime is reduced from serval hours to a few seconds, which indicates that parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.

  5. Bottom-up GGM algorithm for constructing multiple layered hierarchical gene regulatory networks

    Science.gov (United States)

    Multilayered hierarchical gene regulatory networks (ML-hGRNs) are very important for understanding genetics regulation of biological pathways. However, there are currently no computational algorithms available for directly building ML-hGRNs that regulate biological pathways. A bottom-up graphic Gaus...

  6. A Hierarchical Algorithm for Integrated Scheduling and Control With Applications to Power Systems

    DEFF Research Database (Denmark)

    Sokoler, Leo Emil; Dinesen, Peter Juhler; Jørgensen, John Bagterp

    2016-01-01

    The contribution of this paper is a hierarchical algorithm for integrated scheduling and control via model predictive control of hybrid systems. The controlled system is a linear system composed of continuous control, state, and output variables. Binary variables occur as scheduling decisions in ...

  7. A new hybrid imperialist competitive algorithm on data clustering

    Indian Academy of Sciences (India)

    Modified imperialist competitive algorithm; simulated annealing; ... Clustering is one of the unsupervised learning branches where a set of patterns, usually vectors ..... machine classification is based on design, operation, and/or purpose.

  8. An AK-LDMeans algorithm based on image clustering

    Science.gov (United States)

    Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan

    2018-03-01

    Clustering is an effective analytical technique for handling unmarked data for value mining. Its ultimate goal is to mark unclassified data quickly and correctly. We use the roadmap for the current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm to automatically lock the K value by designing the Kcost fold line, and then use the long-distance high-density method to select the clustering centers to further replace the traditional initial clustering center selection method, which further improves the efficiency and accuracy of the traditional K-Means Algorithm. And the experimental results are compared with the current clustering algorithm and the results are obtained. The algorithm can provide effective reference value in the fields of image processing, machine vision and data mining.

  9. Flowbca : A flow-based cluster algorithm in Stata

    NARCIS (Netherlands)

    Meekes, J.; Hassink, W.H.J.

    In this article, we introduce the Stata implementation of a flow-based cluster algorithm written in Mata. The main purpose of the flowbca command is to identify clusters based on relational data of flows. We illustrate the command by providing multiple applications, from the research fields of

  10. Dynamics of the baryonic component in hierarchical clustering universes

    Science.gov (United States)

    Navarro, Julio

    1993-01-01

    I present self-consistent 3-D simulations of the formation of virialized systems containing both gas and dark matter in a flat universe. A fully Lagrangian code based on the Smoothed Particle Hydrodynamics technique and a tree data structure has been used to evolve regions of comoving radius 2-3 Mpc. Tidal effects are included by coarse-sampling the density of the outer regions up to a radius approx. 20 Mpc. Initial conditions are set at high redshift (z greater than 7) using a standard Cold Dark Matter perturbation spectrum and a baryon mass fraction of 10 percent (omega(sub b) = 0.1). Simulations in which the gas evolves either adiabatically or radiates energy at a rate determined locally by its cooling function were performed. This allows us to investigate with the same set of simulations the importance of radiative losses in the formation of galaxies and the equilibrium structure of virialized systems where cooling is very inefficient. In the absence of radiative losses, the simulations can be rescaled to the density and radius typical of galaxy clusters. A summary of the main results is presented.

  11. A new clustering algorithm for scanning electron microscope images

    Science.gov (United States)

    Yousef, Amr; Duraisamy, Prakash; Karim, Mohammad

    2016-04-01

    A scanning electron microscope (SEM) is a type of electron microscope that produces images of a sample by scanning it with a focused beam of electrons. The electrons interact with the sample atoms, producing various signals that are collected by detectors. The gathered signals contain information about the sample's surface topography and composition. The electron beam is generally scanned in a raster scan pattern, and the beam's position is combined with the detected signal to produce an image. The most common configuration for an SEM produces a single value per pixel, with the results usually rendered as grayscale images. The captured images may be produced with insufficient brightness, anomalous contrast, jagged edges, and poor quality due to low signal-to-noise ratio, grained topography and poor surface details. The segmentation of the SEM images is a tackling problems in the presence of the previously mentioned distortions. In this paper, we are stressing on the clustering of these type of images. In that sense, we evaluate the performance of the well-known unsupervised clustering and classification techniques such as connectivity based clustering (hierarchical clustering), centroid-based clustering, distribution-based clustering and density-based clustering. Furthermore, we propose a new spatial fuzzy clustering technique that works efficiently on this type of images and compare its results against these regular techniques in terms of clustering validation metrics.

  12. Collaborative filtering recommendation model based on fuzzy clustering algorithm

    Science.gov (United States)

    Yang, Ye; Zhang, Yunhua

    2018-05-01

    As one of the most widely used algorithms in recommender systems, collaborative filtering algorithm faces two serious problems, which are the sparsity of data and poor recommendation effect in big data environment. In traditional clustering analysis, the object is strictly divided into several classes and the boundary of this division is very clear. However, for most objects in real life, there is no strict definition of their forms and attributes of their class. Concerning the problems above, this paper proposes to improve the traditional collaborative filtering model through the hybrid optimization of implicit semantic algorithm and fuzzy clustering algorithm, meanwhile, cooperating with collaborative filtering algorithm. In this paper, the fuzzy clustering algorithm is introduced to fuzzy clustering the information of project attribute, which makes the project belong to different project categories with different membership degrees, and increases the density of data, effectively reduces the sparsity of data, and solves the problem of low accuracy which is resulted from the inaccuracy of similarity calculation. Finally, this paper carries out empirical analysis on the MovieLens dataset, and compares it with the traditional user-based collaborative filtering algorithm. The proposed algorithm has greatly improved the recommendation accuracy.

  13. Image Registration Algorithm Based on Parallax Constraint and Clustering Analysis

    Science.gov (United States)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

    2018-01-01

    To resolve the problem of slow computation speed and low matching accuracy in image registration, a new image registration algorithm based on parallax constraint and clustering analysis is proposed. Firstly, Harris corner detection algorithm is used to extract the feature points of two images. Secondly, use Normalized Cross Correlation (NCC) function to perform the approximate matching of feature points, and the initial feature pair is obtained. Then, according to the parallax constraint condition, the initial feature pair is preprocessed by K-means clustering algorithm, which is used to remove the feature point pairs with obvious errors in the approximate matching process. Finally, adopt Random Sample Consensus (RANSAC) algorithm to optimize the feature points to obtain the final feature point matching result, and the fast and accurate image registration is realized. The experimental results show that the image registration algorithm proposed in this paper can improve the accuracy of the image matching while ensuring the real-time performance of the algorithm.

  14. Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression

    KAUST Repository

    Halim Boukaram, Wajih

    2017-09-14

    We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.

  15. Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression

    KAUST Repository

    Halim Boukaram, Wajih; Turkiyyah, George; Ltaief, Hatem; Keyes, David E.

    2017-01-01

    We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.

  16. Research on retailer data clustering algorithm based on Spark

    Science.gov (United States)

    Huang, Qiuman; Zhou, Feng

    2017-03-01

    Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.

  17. Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.

    Science.gov (United States)

    He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej

    2011-12-01

    Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β -SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.

  18. On the Disruption of Star Clusters in a Hierarchical Interstellar Medium

    Science.gov (United States)

    Elmegreen, Bruce G.; Hunter, Deidre A.

    2010-03-01

    The distribution of the number of clusters as a function of mass M and age T suggests that clusters get eroded or dispersed in a regular way over time, such that the cluster number decreases inversely as an approximate power law with T within each fixed interval of M. This power law is inconsistent with standard dispersal mechanisms such as cluster evaporation and cloud collisions. In the conventional interpretation, it requires the unlikely situation where diverse mechanisms stitch together over time in a way that is independent of environment or M. Here, we consider another model in which the large-scale distribution of gas in each star-forming region plays an important role. We note that star clusters form with positional and temporal correlations in giant cloud complexes, and suggest that these complexes dominate the tidal force and collisional influence on a cluster during its first several hundred million years. Because the cloud complex density decreases regularly with position from the cluster birth site, the harassment and collision rates between the cluster and the cloud pieces decrease regularly with age as the cluster drifts. This decrease is typically a power law of the form required to explain the mass-age distribution. We reproduce this distribution for a variety of cases, including rapid disruption, slow erosion, combinations of these two, cluster-cloud collisions, cluster disruption by hierarchical disassembly, and partial cluster disruption. We also consider apparent cluster mass loss by fading below the surface brightness limit of a survey. In all cases, the observed log M-log T diagram can be reproduced under reasonable assumptions.

  19. ON THE DISRUPTION OF STAR CLUSTERS IN A HIERARCHICAL INTERSTELLAR MEDIUM

    International Nuclear Information System (INIS)

    Elmegreen, Bruce G.; Hunter, Deidre A.

    2010-01-01

    The distribution of the number of clusters as a function of mass M and age T suggests that clusters get eroded or dispersed in a regular way over time, such that the cluster number decreases inversely as an approximate power law with T within each fixed interval of M. This power law is inconsistent with standard dispersal mechanisms such as cluster evaporation and cloud collisions. In the conventional interpretation, it requires the unlikely situation where diverse mechanisms stitch together over time in a way that is independent of environment or M. Here, we consider another model in which the large-scale distribution of gas in each star-forming region plays an important role. We note that star clusters form with positional and temporal correlations in giant cloud complexes, and suggest that these complexes dominate the tidal force and collisional influence on a cluster during its first several hundred million years. Because the cloud complex density decreases regularly with position from the cluster birth site, the harassment and collision rates between the cluster and the cloud pieces decrease regularly with age as the cluster drifts. This decrease is typically a power law of the form required to explain the mass-age distribution. We reproduce this distribution for a variety of cases, including rapid disruption, slow erosion, combinations of these two, cluster-cloud collisions, cluster disruption by hierarchical disassembly, and partial cluster disruption. We also consider apparent cluster mass loss by fading below the surface brightness limit of a survey. In all cases, the observed log M-log T diagram can be reproduced under reasonable assumptions.

  20. Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to “Shakespeare Authorship Question”

    Directory of Open Access Journals (Sweden)

    Refat Aljumily

    2015-09-01

    Full Text Available A few literary scholars have long claimed that Shakespeare did not write some of his best plays (history plays and tragedies and proposed at one time or another various suspect authorship candidates. Most modern-day scholars of Shakespeare have rejected this claim, arguing that strong evidence that Shakespeare wrote the plays and poems being his name appears on them as the author. This has caused and led to an ongoing scholarly academic debate for quite some long time. Stylometry is a fast-growing field often used to attribute authorship to anonymous or disputed texts. Stylometric attempts to resolve this literary puzzle have raised interesting questions over the past few years. The following paper contributes to “the Shakespeare authorship question” by using a mathematically-based methodology to examine the hypothesis that Shakespeare wrote all the disputed plays traditionally attributed to him. More specifically, the mathematically based methodology used here is based on Mean Proximity, as a linear hierarchical clustering method, and on Principal Components Analysis, as a non-hierarchical linear clustering method. It is also based, for the first time in the domain, on Self-Organizing Map U-Matrix and Voronoi Map, as non-linear clustering methods to cover the possibility that our data contains significant non-linearities. Vector Space Model (VSM is used to convert texts into vectors in a high dimensional space. The aim of which is to compare the degrees of similarity within and between limited samples of text (the disputed plays. The various works and plays assumed to have been written by Shakespeare and possible authors notably, Sir Francis Bacon, Christopher Marlowe, John Fletcher, and Thomas Kyd, where “similarity” is defined in terms of correlation/distance coefficient measure based on the frequency of usage profiles of function words, word bi-grams, and character triple-grams. The claim that Shakespeare authored all the disputed

  1. An Energy Efficient Cooperative Hierarchical MIMO Clustering Scheme for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Sungyoung Lee

    2011-12-01

    Full Text Available In this work, we present an energy efficient hierarchical cooperative clustering scheme for wireless sensor networks. Communication cost is a crucial factor in depleting the energy of sensor nodes. In the proposed scheme, nodes cooperate to form clusters at each level of network hierarchy ensuring maximal coverage and minimal energy expenditure with relatively uniform distribution of load within the network. Performance is enhanced by cooperative multiple-input multiple-output (MIMO communication ensuring energy efficiency for WSN deployments over large geographical areas. We test our scheme using TOSSIM and compare the proposed scheme with cooperative multiple-input multiple-output (CMIMO clustering scheme and traditional multihop Single-Input-Single-Output (SISO routing approach. Performance is evaluated on the basis of number of clusters, number of hops, energy consumption and network lifetime. Experimental results show significant energy conservation and increase in network lifetime as compared to existing schemes.

  2. Robustness of the ATLAS pixel clustering neural network algorithm

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00407780; The ATLAS collaboration

    2016-01-01

    Proton-proton collisions at the energy frontier puts strong constraints on track reconstruction algorithms. In the ATLAS track reconstruction algorithm, an artificial neural network is utilised to identify and split clusters of neighbouring read-out elements in the ATLAS pixel detector created by multiple charged particles. The robustness of the neural network algorithm is presented, probing its sensitivity to uncertainties in the detector conditions. The robustness is studied by evaluating the stability of the algorithm's performance under a range of variations in the inputs to the neural networks. Within reasonable variation magnitudes, the neural networks prove to be robust to most variation types.

  3. Cluster algorithms with empahsis on quantum spin systems

    International Nuclear Information System (INIS)

    Gubernatis, J.E.; Kawashima, Naoki

    1995-01-01

    The purpose of this lecture is to discuss in detail the generalized approach of Kawashima and Gubernatis for the construction of cluster algorithms. We first present a brief refresher on the Monte Carlo method, describe the Swendsen-Wang algorithm, show how this algorithm follows from the Fortuin-Kastelyn transformation, and re=interpret this transformation in a form which is the basis of the generalized approach. We then derive the essential equations of the generalized approach. This derivation is remarkably simple if done from the viewpoint of probability theory, and the essential assumptions will be clearly stated. These assumptions are implicit in all useful cluster algorithms of which we are aware. They lead to a quite different perspective on cluster algorithms than found in the seminal works and in Ising model applications. Next, we illustrate how the generalized approach leads to a cluster algorithm for world-line quantum Monte Carlo simulations of Heisenberg models with S = 1/2. More succinctly, we also discuss the generalization of the Fortuin- Kasetelyn transformation to higher spin models and illustrate the essential steps for a S = 1 Heisenberg model. Finally, we summarize how to go beyond S = 1 to a general spin, XYZ model

  4. Personalized PageRank Clustering: A graph clustering algorithm based on random walks

    Science.gov (United States)

    A. Tabrizi, Shayan; Shakery, Azadeh; Asadpour, Masoud; Abbasi, Maziar; Tavallaie, Mohammad Ali

    2013-11-01

    Graph clustering has been an essential part in many methods and thus its accuracy has a significant effect on many applications. In addition, exponential growth of real-world graphs such as social networks, biological networks and electrical circuits demands clustering algorithms with nearly-linear time and space complexity. In this paper we propose Personalized PageRank Clustering (PPC) that employs the inherent cluster exploratory property of random walks to reveal the clusters of a given graph. We combine random walks and modularity to precisely and efficiently reveal the clusters of a graph. PPC is a top-down algorithm so it can reveal inherent clusters of a graph more accurately than other nearly-linear approaches that are mainly bottom-up. It also gives a hierarchy of clusters that is useful in many applications. PPC has a linear time and space complexity and has been superior to most of the available clustering algorithms on many datasets. Furthermore, its top-down approach makes it a flexible solution for clustering problems with different requirements.

  5. AN IMPROVED FUZZY CLUSTERING ALGORITHM FOR MICROARRAY IMAGE SPOTS SEGMENTATION

    Directory of Open Access Journals (Sweden)

    V.G. Biju

    2015-11-01

    Full Text Available An automatic cDNA microarray image processing using an improved fuzzy clustering algorithm is presented in this paper. The spot segmentation algorithm proposed uses the gridding technique developed by the authors earlier, for finding the co-ordinates of each spot in an image. Automatic cropping of spots from microarray image is done using these co-ordinates. The present paper proposes an improved fuzzy clustering algorithm Possibility fuzzy local information c means (PFLICM to segment the spot foreground (FG from background (BG. The PFLICM improves fuzzy local information c means (FLICM algorithm by incorporating typicality of a pixel along with gray level information and local spatial information. The performance of the algorithm is validated using a set of simulated cDNA microarray images added with different levels of AWGN noise. The strength of the algorithm is tested by computing the parameters such as the Segmentation matching factor (SMF, Probability of error (pe, Discrepancy distance (D and Normal mean square error (NMSE. SMF value obtained for PFLICM algorithm shows an improvement of 0.9 % and 0.7 % for high noise and low noise microarray images respectively compared to FLICM algorithm. The PFLICM algorithm is also applied on real microarray images and gene expression values are computed.

  6. Spin chain simulations with a meron cluster algorithm

    International Nuclear Information System (INIS)

    Boyer, T.; Bietenholz, W.; Deutsches Elektronen-Synchrotron; Wuilloud, J.; Geneve Univ.

    2007-01-01

    We apply a meron cluster algorithm to the XY spin chain, which describes a quantum rotor. This is a multi-cluster simulation supplemented by an improved estimator, which deals with objects of half-integer topological charge. This method is powerful enough to provide precise results for the model with a θ-term - it is therefore one of the rare examples, where a system with a complex action can be solved numerically. In particular we measure the correlation length, as well as the topological and magnetic susceptibility. We discuss the algorithmic efficiency in view of the critical slowing down. Due to the excellent performance that we observe, it is strongly motivated to work on new applications of meron cluster algorithms in higher dimensions. (orig.)

  7. Hierarchical Threshold Adaptive for Point Cloud Filter Algorithm of Moving Surface Fitting

    Directory of Open Access Journals (Sweden)

    ZHU Xiaoxiao

    2018-02-01

    Full Text Available In order to improve the accuracy,efficiency and adaptability of point cloud filtering algorithm,a hierarchical threshold adaptive for point cloud filter algorithm of moving surface fitting was proposed.Firstly,the noisy points are removed by using a statistic histogram method.Secondly,the grid index is established by grid segmentation,and the surface equation is set up through the lowest point among the neighborhood grids.The real height and fit are calculated.The difference between the elevation and the threshold can be determined.Finally,in order to improve the filtering accuracy,hierarchical filtering is used to change the grid size and automatically set the neighborhood size and threshold until the filtering result reaches the accuracy requirement.The test data provided by the International Photogrammetry and Remote Sensing Society (ISPRS is used to verify the algorithm.The first and second error and the total error are 7.33%,10.64% and 6.34% respectively.The algorithm is compared with the eight classical filtering algorithms published by ISPRS.The experiment results show that the method has well-adapted and it has high accurate filtering result.

  8. Evolutionary-Hierarchical Bases of the Formation of Cluster Model of Innovation Economic Development

    Directory of Open Access Journals (Sweden)

    Yuliya Vladimirovna Dubrovskaya

    2016-10-01

    Full Text Available The functioning of a modern economic system is based on the interaction of objects of different hierarchical levels. Thus, the problem of the study of innovation processes taking into account the mutual influence of the activities of these economic actors becomes important. The paper dwells evolutionary basis for the formation of models of innovation development on the basis of micro and macroeconomic analysis. Most of the concepts recognized that despite a big number of diverse models, the coordination of the relations between economic agents is of crucial importance for the successful innovation development. According to the results of the evolutionary-hierarchical analysis, the authors reveal key phases of the development of forms of business cooperation, science and government in the domestic economy. It has become the starting point of the conception of the characteristics of the interaction in the cluster models of innovation development of the economy. Considerable expectancies on improvement of the national innovative system are connected with the development of cluster and network structures. The main objective of government authorities is the formation of mechanisms and institutions that will foster cooperation between members of the clusters. The article explains that the clusters cannot become the factors in the growth of the national economy, not being an effective tool for interaction between the actors of the regional innovative systems.

  9. Hierarchical Star Formation in Turbulent Media: Evidence from Young Star Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Grasha, K.; Calzetti, D. [Astronomy Department, University of Massachusetts, Amherst, MA 01003 (United States); Elmegreen, B. G. [IBM Research Division, T.J. Watson Research Center, Yorktown Heights, NY (United States); Adamo, A.; Messa, M. [Department of Astronomy, The Oskar Klein Centre, Stockholm University, Stockholm (Sweden); Aloisi, A.; Bright, S. N.; Lee, J. C.; Ryon, J. E.; Ubeda, L. [Space Telescope Science Institute, Baltimore, MD (United States); Cook, D. O. [California Institute of Technology, 1200 East California Boulevard, Pasadena, CA (United States); Dale, D. A. [Department of Physics and Astronomy, University of Wyoming, Laramie, WY (United States); Fumagalli, M. [Institute for Computational Cosmology and Centre for Extragalactic Astronomy, Department of Physics, Durham University, Durham (United Kingdom); Gallagher III, J. S. [Department of Astronomy, University of Wisconsin–Madison, Madison, WI (United States); Gouliermis, D. A. [Zentrum für Astronomie der Universität Heidelberg, Institut für Theoretische Astrophysik, Albert-Ueberle-Str. 2, D-69120 Heidelberg (Germany); Grebel, E. K. [Astronomisches Rechen-Institut, Zentrum für Astronomie der Universität Heidelberg, Mönchhofstr. 12-14, D-69120, Heidelberg (Germany); Kahre, L. [Department of Astronomy, New Mexico State University, Las Cruces, NM (United States); Kim, H. [Gemini Observatory, La Serena (Chile); Krumholz, M. R., E-mail: kgrasha@astro.umass.edu [Research School of Astronomy and Astrophysics, Australian National University, Canberra, ACT 2611 (Australia)

    2017-06-10

    We present an analysis of the positions and ages of young star clusters in eight local galaxies to investigate the connection between the age difference and separation of cluster pairs. We find that star clusters do not form uniformly but instead are distributed so that the age difference increases with the cluster pair separation to the 0.25–0.6 power, and that the maximum size over which star formation is physically correlated ranges from ∼200 pc to ∼1 kpc. The observed trends between age difference and separation suggest that cluster formation is hierarchical both in space and time: clusters that are close to each other are more similar in age than clusters born further apart. The temporal correlations between stellar aggregates have slopes that are consistent with predictions of turbulence acting as the primary driver of star formation. The velocity associated with the maximum size is proportional to the galaxy’s shear, suggesting that the galactic environment influences the maximum size of the star-forming structures.

  10. Core Business Selection Based on Ant Colony Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Yu Lan

    2014-01-01

    Full Text Available Core business is the most important business to the enterprise in diversified business. In this paper, we first introduce the definition and characteristics of the core business and then descript the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current situation of the development of the company, the core business of the company can be acquired by ant colony clustering algorithm. Thus, the results indicate that the proposed method is an effective way to determine the core business for company.

  11. Identifying multiple influential spreaders by a heuristic clustering algorithm

    Energy Technology Data Exchange (ETDEWEB)

    Bao, Zhong-Kui [School of Mathematical Science, Anhui University, Hefei 230601 (China); Liu, Jian-Guo [Data Science and Cloud Service Research Center, Shanghai University of Finance and Economics, Shanghai, 200133 (China); Zhang, Hai-Feng, E-mail: haifengzhang1978@gmail.com [School of Mathematical Science, Anhui University, Hefei 230601 (China); Department of Communication Engineering, North University of China, Taiyuan, Shan' xi 030051 (China)

    2017-03-18

    The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suitable for the case where a single spreader is chosen as the spreading source. Many times, spreading process is initiated by simultaneously choosing multiple nodes as the spreading sources. In this situation, choosing the top ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in networks. Therefore, one ideal situation for multiple spreaders case is that the spreaders themselves are not only influential but also they are dispersively distributed in networks, but it is difficult to meet the two conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters, and finally the center nodes in clusters are chosen as the multiple spreaders. HC algorithm not only ensures that the multiple spreaders are dispersively distributed in networks but also avoids the selected nodes to be very “negligible”. Compared with the traditional methods, our experimental results on synthetic and real networks indicate that the performance of HC method on influence maximization is more significant. - Highlights: • A heuristic clustering algorithm is proposed to identify the multiple influential spreaders in complex networks. • The algorithm can not only guarantee the selected spreaders are sufficiently scattered but also avoid to be “insignificant”. • The performance of our algorithm is generally better than other methods, regardless of real networks or synthetic networks.

  12. Identifying multiple influential spreaders by a heuristic clustering algorithm

    International Nuclear Information System (INIS)

    Bao, Zhong-Kui; Liu, Jian-Guo; Zhang, Hai-Feng

    2017-01-01

    The problem of influence maximization in social networks has attracted much attention. However, traditional centrality indices are suitable for the case where a single spreader is chosen as the spreading source. Many times, spreading process is initiated by simultaneously choosing multiple nodes as the spreading sources. In this situation, choosing the top ranked nodes as multiple spreaders is not an optimal strategy, since the chosen nodes are not sufficiently scattered in networks. Therefore, one ideal situation for multiple spreaders case is that the spreaders themselves are not only influential but also they are dispersively distributed in networks, but it is difficult to meet the two conditions together. In this paper, we propose a heuristic clustering (HC) algorithm based on the similarity index to classify nodes into different clusters, and finally the center nodes in clusters are chosen as the multiple spreaders. HC algorithm not only ensures that the multiple spreaders are dispersively distributed in networks but also avoids the selected nodes to be very “negligible”. Compared with the traditional methods, our experimental results on synthetic and real networks indicate that the performance of HC method on influence maximization is more significant. - Highlights: • A heuristic clustering algorithm is proposed to identify the multiple influential spreaders in complex networks. • The algorithm can not only guarantee the selected spreaders are sufficiently scattered but also avoid to be “insignificant”. • The performance of our algorithm is generally better than other methods, regardless of real networks or synthetic networks.

  13. A hierarchical cluster analysis of normal-tension glaucoma using spectral-domain optical coherence tomography parameters.

    Science.gov (United States)

    Bae, Hyoung Won; Ji, Yongwoo; Lee, Hye Sun; Lee, Naeun; Hong, Samin; Seong, Gong Je; Sung, Kyung Rim; Kim, Chan Yun

    2015-01-01

    Normal-tension glaucoma (NTG) is a heterogenous disease, and there is still controversy about subclassifications of this disorder. On the basis of spectral-domain optical coherence tomography (SD-OCT), we subdivided NTG with hierarchical cluster analysis using optic nerve head (ONH) parameters and retinal nerve fiber layer (RNFL) thicknesses. A total of 200 eyes of 200 NTG patients between March 2011 and June 2012 underwent SD-OCT scans to measure ONH parameters and RNFL thicknesses. We classified NTG into homogenous subgroups based on these variables using a hierarchical cluster analysis, and compared clusters to evaluate diverse NTG characteristics. Three clusters were found after hierarchical cluster analysis. Cluster 1 (62 eyes) had the thickest RNFL and widest rim area, and showed early glaucoma features. Cluster 2 (60 eyes) was characterized by the largest cup/disc ratio and cup volume, and showed advanced glaucomatous damage. Cluster 3 (78 eyes) had small disc areas in SD-OCT and were comprised of patients with significantly younger age, longer axial length, and greater myopia than the other 2 groups. A hierarchical cluster analysis of SD-OCT scans divided NTG patients into 3 groups based upon ONH parameters and RNFL thicknesses. It is anticipated that the small disc area group comprised of younger and more myopic patients may show unique features unlike the other 2 groups.

  14. Water quality assessment with hierarchical cluster analysis based on Mahalanobis distance.

    Science.gov (United States)

    Du, Xiangjun; Shao, Fengjing; Wu, Shunyao; Zhang, Hanlin; Xu, Si

    2017-07-01

    Water quality assessment is crucial for assessment of marine eutrophication, prediction of harmful algal blooms, and environment protection. Previous studies have developed many numeric modeling methods and data driven approaches for water quality assessment. The cluster analysis, an approach widely used for grouping data, has also been employed. However, there are complex correlations between water quality variables, which play important roles in water quality assessment but have always been overlooked. In this paper, we analyze correlations between water quality variables and propose an alternative method for water quality assessment with hierarchical cluster analysis based on Mahalanobis distance. Further, we cluster water quality data collected form coastal water of Bohai Sea and North Yellow Sea of China, and apply clustering results to evaluate its water quality. To evaluate the validity, we also cluster the water quality data with cluster analysis based on Euclidean distance, which are widely adopted by previous studies. The results show that our method is more suitable for water quality assessment with many correlated water quality variables. To our knowledge, it is the first attempt to apply Mahalanobis distance for coastal water quality assessment.

  15. A similarity based agglomerative clustering algorithm in networks

    Science.gov (United States)

    Liu, Zhiyuan; Wang, Xiujuan; Ma, Yinghong

    2018-04-01

    The detection of clusters is benefit for understanding the organizations and functions of networks. Clusters, or communities, are usually groups of nodes densely interconnected but sparsely linked with any other clusters. To identify communities, an efficient and effective community agglomerative algorithm based on node similarity is proposed. The proposed method initially calculates similarities between each pair of nodes, and form pre-partitions according to the principle that each node is in the same community as its most similar neighbor. After that, check each partition whether it satisfies community criterion. For the pre-partitions who do not satisfy, incorporate them with others that having the biggest attraction until there are no changes. To measure the attraction ability of a partition, we propose an attraction index that based on the linked node's importance in networks. Therefore, our proposed method can better exploit the nodes' properties and network's structure. To test the performance of our algorithm, both synthetic and empirical networks ranging in different scales are tested. Simulation results show that the proposed algorithm can obtain superior clustering results compared with six other widely used community detection algorithms.

  16. Fuzzy cluster means algorithm for the diagnosis of confusable disease

    African Journals Online (AJOL)

    ... end platform while Microsoft Access was used as the database application. The system gives a measure of each disease within a set of confusable disease. The proposed system had a classification accuracy of 60%. Keywords: Artificial Intelligence, expert system Fuzzy cluster – means Algorithm, physician, Diagnosis ...

  17. Modified genetic algorithms to model cluster structures in medium-size silicon clusters

    International Nuclear Information System (INIS)

    Bazterra, Victor E.; Ona, Ofelia; Caputo, Maria C.; Ferraro, Marta B.; Fuentealba, Patricio; Facelli, Julio C.

    2004-01-01

    This paper presents the results obtained using a genetic algorithm (GA) to search for stable structures of medium size silicon clusters. In this work the GA uses a semiempirical energy function to find the best cluster structures, which are further optimized using density-functional theory. For small clusters our results agree well with previously reported structures, but for larger ones different structures appear. This is the case of Si 36 where we report a different structure, with significant lower energy than those previously found using limited search approaches on common structural motifs. This demonstrates the need for global optimization schemes when searching for stable structures of medium-size silicon clusters

  18. The C4 clustering algorithm: Clusters of galaxies in the Sloan Digital Sky Survey

    Energy Technology Data Exchange (ETDEWEB)

    Miller, Christopher J.; Nichol, Robert; Reichart, Dan; Wechsler, Risa H.; Evrard, August; Annis, James; McKay, Timothy; Bahcall, Neta; Bernardi, Mariangela; Boehringer,; Connolly, Andrew; Goto, Tomo; Kniazev, Alexie; Lamb, Donald; Postman, Marc; Schneider, Donald; Sheth, Ravi; Voges, Wolfgang; /Cerro-Tololo InterAmerican Obs. /Portsmouth U.,

    2005-03-01

    We present the ''C4 Cluster Catalog'', a new sample of 748 clusters of galaxies identified in the spectroscopic sample of the Second Data Release (DR2) of the Sloan Digital Sky Survey (SDSS). The C4 cluster-finding algorithm identifies clusters as overdensities in a seven-dimensional position and color space, thus minimizing projection effects that have plagued previous optical cluster selection. The present C4 catalog covers {approx}2600 square degrees of sky and ranges in redshift from z = 0.02 to z = 0.17. The mean cluster membership is 36 galaxies (with redshifts) brighter than r = 17.7, but the catalog includes a range of systems, from groups containing 10 members to massive clusters with over 200 cluster members with redshifts. The catalog provides a large number of measured cluster properties including sky location, mean redshift, galaxy membership, summed r-band optical luminosity (L{sub r}), velocity dispersion, as well as quantitative measures of substructure and the surrounding large-scale environment. We use new, multi-color mock SDSS galaxy catalogs, empirically constructed from the {Lambda}CDM Hubble Volume (HV) Sky Survey output, to investigate the sensitivity of the C4 catalog to the various algorithm parameters (detection threshold, choice of passbands and search aperture), as well as to quantify the purity and completeness of the C4 cluster catalog. These mock catalogs indicate that the C4 catalog is {approx_equal}90% complete and 95% pure above M{sub 200} = 1 x 10{sup 14} h{sup -1}M{sub {circle_dot}} and within 0.03 {le} z {le} 0.12. Using the SDSS DR2 data, we show that the C4 algorithm finds 98% of X-ray identified clusters and 90% of Abell clusters within 0.03 {le} z {le} 0.12. Using the mock galaxy catalogs and the full HV dark matter simulations, we show that the L{sub r} of a cluster is a more robust estimator of the halo mass (M{sub 200}) than the galaxy line-of-sight velocity dispersion or the richness of the cluster

  19. Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale.

    Science.gov (United States)

    Emmons, Scott; Kobourov, Stephen; Gallant, Mike; Börner, Katy

    2016-01-01

    Notions of community quality underlie the clustering of networks. While studies surrounding network clustering are increasingly common, a precise understanding of the realtionship between different cluster quality metrics is unknown. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely-used network clustering algorithms-Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.

  20. High-dimensional cluster analysis with the Masked EM Algorithm

    Science.gov (United States)

    Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.

    2014-01-01

    Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster member-ship of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694

  1. A HYBRID HEURISTIC ALGORITHM FOR THE CLUSTERED TRAVELING SALESMAN PROBLEM

    Directory of Open Access Journals (Sweden)

    Mário Mestria

    2016-04-01

    Full Text Available ABSTRACT This paper proposes a hybrid heuristic algorithm, based on the metaheuristics Greedy Randomized Adaptive Search Procedure, Iterated Local Search and Variable Neighborhood Descent, to solve the Clustered Traveling Salesman Problem (CTSP. Hybrid Heuristic algorithm uses several variable neighborhood structures combining the intensification (using local search operators and diversification (constructive heuristic and perturbation routine. In the CTSP, the vertices are partitioned into clusters and all vertices of each cluster have to be visited contiguously. The CTSP is -hard since it includes the well-known Traveling Salesman Problem (TSP as a special case. Our hybrid heuristic is compared with three heuristics from the literature and an exact method. Computational experiments are reported for different classes of instances. Experimental results show that the proposed hybrid heuristic obtains competitive results within reasonable computational time.

  2. Class hierarchical test case generation algorithm based on expanded EMDPN model

    Institute of Scientific and Technical Information of China (English)

    LI Jun-yi; GONG Hong-fang; HU Ji-ping; ZOU Bei-ji; SUN Jia-guang

    2006-01-01

    A new model of event and message driven Petri network(EMDPN) based on the characteristic of class interaction for messages passing between two objects was extended. Using EMDPN interaction graph, a class hierarchical test-case generation algorithm with cooperated paths (copaths) was proposed, which can be used to solve the problems resulting from the class inheritance mechanism encountered in object-oriented software testing such as oracle, message transfer errors, and unreachable statement. Finally, the testing sufficiency was analyzed with the ordered sequence testing criterion(OSC). The results indicate that the test cases stemmed from newly proposed automatic algorithm of copaths generation satisfies synchronization message sequences testing criteria, therefore the proposed new algorithm of copaths generation has a good coverage rate.

  3. Clustering Using Boosted Constrained k-Means Algorithm

    Directory of Open Access Journals (Sweden)

    Masayuki Okabe

    2018-03-01

    Full Text Available This article proposes a constrained clustering algorithm with competitive performance and less computation time to the state-of-the-art methods, which consists of a constrained k-means algorithm enhanced by the boosting principle. Constrained k-means clustering using constraints as background knowledge, although easy to implement and quick, has insufficient performance compared with metric learning-based methods. Since it simply adds a function into the data assignment process of the k-means algorithm to check for constraint violations, it often exploits only a small number of constraints. Metric learning-based methods, which exploit constraints to create a new metric for data similarity, have shown promising results although the methods proposed so far are often slow depending on the amount of data or number of feature dimensions. We present a method that exploits the advantages of the constrained k-means and metric learning approaches. It incorporates a mechanism for accepting constraint priorities and a metric learning framework based on the boosting principle into a constrained k-means algorithm. In the framework, a metric is learned in the form of a kernel matrix that integrates weak cluster hypotheses produced by the constrained k-means algorithm, which works as a weak learner under the boosting principle. Experimental results for 12 data sets from 3 data sources demonstrated that our method has performance competitive to those of state-of-the-art constrained clustering methods for most data sets and that it takes much less computation time. Experimental evaluation demonstrated the effectiveness of controlling the constraint priorities by using the boosting principle and that our constrained k-means algorithm functions correctly as a weak learner of boosting.

  4. Which clustering algorithm is better for predicting protein complexes?

    Directory of Open Access Journals (Sweden)

    Moschopoulos Charalampos N

    2011-12-01

    Full Text Available Abstract Background Protein-Protein interactions (PPI play a key role in determining the outcome of most cellular processes. The correct identification and characterization of protein interactions and the networks, which they comprise, is critical for understanding the molecular mechanisms within the cell. Large-scale techniques such as pull down assays and tandem affinity purification are used in order to detect protein interactions in an organism. Today, relatively new high-throughput methods like yeast two hybrid, mass spectrometry, microarrays, and phage display are also used to reveal protein interaction networks. Results In this paper we evaluated four different clustering algorithms using six different interaction datasets. We parameterized the MCL, Spectral, RNSC and Affinity Propagation algorithms and applied them to six PPI datasets produced experimentally by Yeast 2 Hybrid (Y2H and Tandem Affinity Purification (TAP methods. The predicted clusters, so called protein complexes, were then compared and benchmarked with already known complexes stored in published databases. Conclusions While results may differ upon parameterization, the MCL and RNSC algorithms seem to be more promising and more accurate at predicting PPI complexes. Moreover, they predict more complexes than other reviewed algorithms in absolute numbers. On the other hand the spectral clustering algorithm achieves the highest valid prediction rate in our experiments. However, it is nearly always outperformed by both RNSC and MCL in terms of the geometrical accuracy while it generates the fewest valid clusters than any other reviewed algorithm. This article demonstrates various metrics to evaluate the accuracy of such predictions as they are presented in the text below. Supplementary material can be found at: http://www.bioacademy.gr/bioinformatics/projects/ppireview.htm

  5. Evaluation of clustering algorithms for protein-protein interaction networks

    Directory of Open Access Journals (Sweden)

    van Helden Jacques

    2006-11-01

    Full Text Available Abstract Background Protein interactions are crucial components of all cellular processes. Recently, high-throughput methods have been developed to obtain a global description of the interactome (the whole network of protein interactions for a given organism. In 2002, the yeast interactome was estimated to contain up to 80,000 potential interactions. This estimate is based on the integration of data sets obtained by various methods (mass spectrometry, two-hybrid methods, genetic studies. High-throughput methods are known, however, to yield a non-negligible rate of false positives, and to miss a fraction of existing interactions. The interactome can be represented as a graph where nodes correspond with proteins and edges with pairwise interactions. In recent years clustering methods have been developed and applied in order to extract relevant modules from such graphs. These algorithms require the specification of parameters that may drastically affect the results. In this paper we present a comparative assessment of four algorithms: Markov Clustering (MCL, Restricted Neighborhood Search Clustering (RNSC, Super Paramagnetic Clustering (SPC, and Molecular Complex Detection (MCODE. Results A test graph was built on the basis of 220 complexes annotated in the MIPS database. To evaluate the robustness to false positives and false negatives, we derived 41 altered graphs by randomly removing edges from or adding edges to the test graph in various proportions. Each clustering algorithm was applied to these graphs with various parameter settings, and the clusters were compared with the annotated complexes. We analyzed the sensitivity of the algorithms to the parameters and determined their optimal parameter values. We also evaluated their robustness to alterations of the test graph. We then applied the four algorithms to six graphs obtained from high-throughput experiments and compared the resulting clusters with the annotated complexes. Conclusion This

  6. Evaluation of hierarchical agglomerative cluster analysis methods for discrimination of primary biological aerosol

    Directory of Open Access Journals (Sweden)

    I. Crawford

    2015-11-01

    Full Text Available In this paper we present improved methods for discriminating and quantifying primary biological aerosol particles (PBAPs by applying hierarchical agglomerative cluster analysis to multi-parameter ultraviolet-light-induced fluorescence (UV-LIF spectrometer data. The methods employed in this study can be applied to data sets in excess of 1 × 106 points on a desktop computer, allowing for each fluorescent particle in a data set to be explicitly clustered. This reduces the potential for misattribution found in subsampling and comparative attribution methods used in previous approaches, improving our capacity to discriminate and quantify PBAP meta-classes. We evaluate the performance of several hierarchical agglomerative cluster analysis linkages and data normalisation methods using laboratory samples of known particle types and an ambient data set. Fluorescent and non-fluorescent polystyrene latex spheres were sampled with a Wideband Integrated Bioaerosol Spectrometer (WIBS-4 where the optical size, asymmetry factor and fluorescent measurements were used as inputs to the analysis package. It was found that the Ward linkage with z-score or range normalisation performed best, correctly attributing 98 and 98.1 % of the data points respectively. The best-performing methods were applied to the BEACHON-RoMBAS (Bio–hydro–atmosphere interactions of Energy, Aerosols, Carbon, H2O, Organics and Nitrogen–Rocky Mountain Biogenic Aerosol Study ambient data set, where it was found that the z-score and range normalisation methods yield similar results, with each method producing clusters representative of fungal spores and bacterial aerosol, consistent with previous results. The z-score result was compared to clusters generated with previous approaches (WIBS AnalysiS Program, WASP where we observe that the subsampling and comparative attribution method employed by WASP results in the overestimation of the fungal spore concentration by a factor of 1.5 and the

  7. A heuristic approach to possibilistic clustering algorithms and applications

    CERN Document Server

    Viattchenin, Dmitri A

    2013-01-01

    The present book outlines a new approach to possibilistic clustering in which the sought clustering structure of the set of objects is based directly on the formal definition of fuzzy cluster and the possibilistic memberships are determined directly from the values of the pairwise similarity of objects.   The proposed approach can be used for solving different classification problems. Here, some techniques that might be useful at this purpose are outlined, including a methodology for constructing a set of labeled objects for a semi-supervised clustering algorithm, a methodology for reducing analyzed attribute space dimensionality and a methods for asymmetric data processing. Moreover,  a technique for constructing a subset of the most appropriate alternatives for a set of weak fuzzy preference relations, which are defined on a universe of alternatives, is described in detail, and a method for rapidly prototyping the Mamdani’s fuzzy inference systems is introduced. This book addresses engineers, scientist...

  8. An adaptive map-matching algorithm based on hierarchical fuzzy system from vehicular GPS data.

    Directory of Open Access Journals (Sweden)

    Jinjun Tang

    Full Text Available An improved hierarchical fuzzy inference method based on C-measure map-matching algorithm is proposed in this paper, in which the C-measure represents the certainty or probability of the vehicle traveling on the actual road. A strategy is firstly introduced to use historical positioning information to employ curve-curve matching between vehicle trajectories and shapes of candidate roads. It improves matching performance by overcoming the disadvantage of traditional map-matching algorithm only considering current information. An average historical distance is used to measure similarity between vehicle trajectories and road shape. The input of system includes three variables: distance between position point and candidate roads, angle between driving heading and road direction, and average distance. As the number of fuzzy rules will increase exponentially when adding average distance as a variable, a hierarchical fuzzy inference system is then applied to reduce fuzzy rules and improve the calculation efficiency. Additionally, a learning process is updated to support the algorithm. Finally, a case study contains four different routes in Beijing city is used to validate the effectiveness and superiority of the proposed method.

  9. Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

    Science.gov (United States)

    Fu, QiMing

    2016-01-01

    To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ 2-regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the policy, are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both the local model and the global model are applied to generate samples for planning; the former is used only if the state-prediction error does not surpass the threshold at each time step, while the latter is utilized at the end of each episode. The purpose of taking both models is to improve the sample efficiency and accelerate the convergence rate of the whole algorithm through fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two Reinforcement Learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency. PMID:27795704

  10. Hierarchical clustering into groups of human brain regions according to elemental composition

    International Nuclear Information System (INIS)

    Stedman, J.D.; Spyrou, N.M.

    1998-01-01

    Thirteen brain regions were dissected from both hemispheres of fifteen 'normal' ageing subjects (8 females, 7 males) of mean age 79±7 years. Elemental compositions were determined by simultaneous application of particle induced X-ray emission (PIXE) and Rutherford backscattering (RBS) analyses using a 2 MeV, 4 nA proton beam scanned over 4 mm 2 of the sample surface. Elemental concentrations were found to be dependent upon the brain region and hemisphere studied. Hierarchical cluster analysis was applied to group the brain regions according to the sample concentrations of eight elements. The resulting dendrogram is presented and its clusters related to the sample compositions of grey and white matter. (author)

  11. ABCluster: the artificial bee colony algorithm for cluster global optimization.

    Science.gov (United States)

    Zhang, Jun; Dolg, Michael

    2015-10-07

    Global optimization of cluster geometries is of fundamental importance in chemistry and an interesting problem in applied mathematics. In this work, we introduce a relatively new swarm intelligence algorithm, i.e. the artificial bee colony (ABC) algorithm proposed in 2005, to this field. It is inspired by the foraging behavior of a bee colony, and only three parameters are needed to control it. We applied it to several potential functions of quite different nature, i.e., the Coulomb-Born-Mayer, Lennard-Jones, Morse, Z and Gupta potentials. The benchmarks reveal that for long-ranged potentials the ABC algorithm is very efficient in locating the global minimum, while for short-ranged ones it is sometimes trapped into a local minimum funnel on a potential energy surface of large clusters. We have released an efficient, user-friendly, and free program "ABCluster" to realize the ABC algorithm. It is a black-box program for non-experts as well as experts and might become a useful tool for chemists to study clusters.

  12. A Hierarchical and Distributed Approach for Mapping Large Applications to Heterogeneous Grids using Genetic Algorithms

    Science.gov (United States)

    Sanyal, Soumya; Jain, Amit; Das, Sajal K.; Biswas, Rupak

    2003-01-01

    In this paper, we propose a distributed approach for mapping a single large application to a heterogeneous grid environment. To minimize the execution time of the parallel application, we distribute the mapping overhead to the available nodes of the grid. This approach not only provides a fast mapping of tasks to resources but is also scalable. We adopt a hierarchical grid model and accomplish the job of mapping tasks to this topology using a scheduler tree. Results show that our three-phase algorithm provides high quality mappings, and is fast and scalable.

  13. Robustness of the ATLAS pixel clustering neural network algorithm

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00407780; The ATLAS collaboration

    2016-01-01

    Proton-proton collisions at the energy frontier puts strong constraints on track reconstruction algorithms. The algorithms depend heavily on accurate estimation of the position of particles as they traverse the inner detector elements. An artificial neural network algorithm is utilised to identify and split clusters of neighbouring read-out elements in the ATLAS pixel detector created by multiple charged particles. The method recovers otherwise lost tracks in dense environments where particles are separated by distances comparable to the size of the detector read-out elements. Such environments are highly relevant for LHC run 2, e.g. in searches for heavy resonances. Within the scope of run 2 track reconstruction performance and upgrades, the robustness of the neural network algorithm will be presented. The robustness has been studied by evaluating the stability of the algorithm’s performance under a range of variations in the pixel detector conditions.

  14. Stochastic cluster algorithms for discrete Gaussian (SOS) models

    International Nuclear Information System (INIS)

    Evertz, H.G.; Hamburg Univ.; Hasenbusch, M.; Marcu, M.; Tel Aviv Univ.; Pinn, K.; Muenster Univ.; Solomon, S.

    1990-10-01

    We present new Monte Carlo cluster algorithms which eliminate critical slowing down in the simulation of solid-on-solid models. In this letter we focus on the two-dimensional discrete Gaussian model. The algorithms are based on reflecting the integer valued spin variables with respect to appropriately chosen reflection planes. The proper choice of the reflection plane turns out to be crucial in order to obtain a small dynamical exponent z. Actually, the successful versions of our algorithm are a mixture of two different procedures for choosing the reflection plane, one of them ergodic but slow, the other one non-ergodic and also slow when combined with a Metropolis algorithm. (orig.)

  15. A hierarchical clustering scheme approach to assessment of IP-network traffic using detrended fluctuation analysis

    Science.gov (United States)

    Takuma, Takehisa; Masugi, Masao

    2009-03-01

    This paper presents an approach to the assessment of IP-network traffic in terms of the time variation of self-similarity. To get a comprehensive view in analyzing the degree of long-range dependence (LRD) of IP-network traffic, we use a hierarchical clustering scheme, which provides a way to classify high-dimensional data with a tree-like structure. Also, in the LRD-based analysis, we employ detrended fluctuation analysis (DFA), which is applicable to the analysis of long-range power-law correlations or LRD in non-stationary time-series signals. Based on sequential measurements of IP-network traffic at two locations, this paper derives corresponding values for the LRD-related parameter α that reflects the degree of LRD of measured data. In performing the hierarchical clustering scheme, we use three parameters: the α value, average throughput, and the proportion of network traffic that exceeds 80% of network bandwidth for each measured data set. We visually confirm that the traffic data can be classified in accordance with the network traffic properties, resulting in that the combined depiction of the LRD and other factors can give us an effective assessment of network conditions at different times.

  16. Exploring New Clustering Algorithms for the CMS Tracker FED

    CERN Document Server

    Gamboa Alvarado, Jose Leandro

    2013-01-01

    In the current Front End (FE) firmware clusters of hits within the APV frames are found using a simple threshold comparison (which is made between the data and a 3 or 5 sigma strip noise cut) on reordered pedestal and Common Mode (CM) noise subtracted data. In addition the CM noise subtraction requires the baseline of each APV frame to be approximately uniform. Therefore, the current algorithm will fail if the APV baseline exhibits large-scale non-uniform behavior. Under very high luminosity conditions the assumption of a uniform APV baseline breaks down and the FED is unable to maintain a high efficiency of cluster finding. \

  17. Permutation Tests of Hierarchical Cluster Analyses of Carrion Communities and Their Potential Use in Forensic Entomology.

    Science.gov (United States)

    van der Ham, Joris L

    2016-05-19

    Forensic entomologists can use carrion communities' ecological succession data to estimate the postmortem interval (PMI). Permutation tests of hierarchical cluster analyses of these data provide a conceptual method to estimate part of the PMI, the post-colonization interval (post-CI). This multivariate approach produces a baseline of statistically distinct clusters that reflect changes in the carrion community composition during the decomposition process. Carrion community samples of unknown post-CIs are compared with these baseline clusters to estimate the post-CI. In this short communication, I use data from previously published studies to demonstrate the conceptual feasibility of this multivariate approach. Analyses of these data produce series of significantly distinct clusters, which represent carrion communities during 1- to 20-day periods of the decomposition process. For 33 carrion community samples, collected over an 11-day period, this approach correctly estimated the post-CI within an average range of 3.1 days. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  18. Hierarchical Control Strategy for Active Hydropneumatic Suspension Vehicles Based on Genetic Algorithms

    Directory of Open Access Journals (Sweden)

    Jinzhi Feng

    2015-02-01

    Full Text Available A new hierarchical control strategy for active hydropneumatic suspension systems is proposed. This strategy considers the dynamic characteristics of the actuator. The top hierarchy controller uses a combined control scheme: a genetic algorithm- (GA- based self-tuning proportional-integral-derivative controller and a fuzzy logic controller. For practical implementations of the proposed control scheme, a GA-based self-learning process is initiated only when the defined performance index of vehicle dynamics exceeds a certain debounce time threshold. The designed control algorithm is implemented on a virtual prototype and cosimulations are performed with different road disturbance inputs. Cosimulation results show that the active hydropneumatic suspension system designed in this study significantly improves riding comfort characteristics of vehicles. The robustness and adaptability of the proposed controller are also examined when the control system is subjected to extremely rough road conditions.

  19. An improved hierarchical A * algorithm in the optimization of parking lots

    Science.gov (United States)

    Wang, Yong; Wu, Junjuan; Wang, Ying

    2017-08-01

    In the parking lot parking path optimization, the traditional evaluation index is the shortest distance as the best index and it does not consider the actual road conditions. Now, the introduction of a more practical evaluation index can not only simplify the hardware design of the boot system but also save the software overhead. Firstly, we establish the parking lot network graph RPCDV mathematical model and all nodes in the network is divided into two layers which were constructed using different evaluation function base on the improved hierarchical A * algorithm which improves the time optimal path search efficiency and search precision of the evaluation index. The final results show that for different sections of the program attribute parameter algorithm always faster the time to find the optimal path.

  20. FCM Clustering Algorithms for Segmentation of Brain MR Images

    Directory of Open Access Journals (Sweden)

    Yogita K. Dubey

    2016-01-01

    Full Text Available The study of brain disorders requires accurate tissue segmentation of magnetic resonance (MR brain images which is very important for detecting tumors, edema, and necrotic tissues. Segmentation of brain images, especially into three main tissue types: Cerebrospinal Fluid (CSF, Gray Matter (GM, and White Matter (WM, has important role in computer aided neurosurgery and diagnosis. Brain images mostly contain noise, intensity inhomogeneity, and weak boundaries. Therefore, accurate segmentation of brain images is still a challenging area of research. This paper presents a review of fuzzy c-means (FCM clustering algorithms for the segmentation of brain MR images. The review covers the detailed analysis of FCM based algorithms with intensity inhomogeneity correction and noise robustness. Different methods for the modification of standard fuzzy objective function with updating of membership and cluster centroid are also discussed.

  1. Using hierarchical clustering methods to classify motor activities of COPD patients from wearable sensor data

    Directory of Open Access Journals (Sweden)

    Reilly John J

    2005-06-01

    Full Text Available Abstract Background Advances in miniature sensor technology have led to the development of wearable systems that allow one to monitor motor activities in the field. A variety of classifiers have been proposed in the past, but little has been done toward developing systematic approaches to assess the feasibility of discriminating the motor tasks of interest and to guide the choice of the classifier architecture. Methods A technique is introduced to address this problem according to a hierarchical framework and its use is demonstrated for the application of detecting motor activities in patients with chronic obstructive pulmonary disease (COPD undergoing pulmonary rehabilitation. Accelerometers were used to collect data for 10 different classes of activity. Features were extracted to capture essential properties of the data set and reduce the dimensionality of the problem at hand. Cluster measures were utilized to find natural groupings in the data set and then construct a hierarchy of the relationships between clusters to guide the process of merging clusters that are too similar to distinguish reliably. It provides a means to assess whether the benefits of merging for performance of a classifier outweigh the loss of resolution incurred through merging. Results Analysis of the COPD data set demonstrated that motor tasks related to ambulation can be reliably discriminated from tasks performed in a seated position with the legs in motion or stationary using two features derived from one accelerometer. Classifying motor tasks within the category of activities related to ambulation requires more advanced techniques. While in certain cases all the tasks could be accurately classified, in others merging clusters associated with different motor tasks was necessary. When merging clusters, it was found that the proposed method could lead to more than 12% improvement in classifier accuracy while retaining resolution of 4 tasks. Conclusion Hierarchical

  2. Nonuniform Sparse Data Clustering Cascade Algorithm Based on Dynamic Cumulative Entropy

    Directory of Open Access Journals (Sweden)

    Ning Li

    2016-01-01

    Full Text Available A small amount of prior knowledge and randomly chosen initial cluster centers have a direct impact on the accuracy of the performance of iterative clustering algorithm. In this paper we propose a new algorithm to compute initial cluster centers for k-means clustering and the best number of the clusters with little prior knowledge and optimize clustering result. It constructs the Euclidean distance control factor based on aggregation density sparse degree to select the initial cluster center of nonuniform sparse data and obtains initial data clusters by multidimensional diffusion density distribution. Multiobjective clustering approach based on dynamic cumulative entropy is adopted to optimize the initial data clusters and the best number of the clusters. The experimental results show that the newly proposed algorithm has good performance to obtain the initial cluster centers for the k-means algorithm and it effectively improves the clustering accuracy of nonuniform sparse data by about 5%.

  3. Advanced defect detection algorithm using clustering in ultrasonic NDE

    Science.gov (United States)

    Gongzhang, Rui; Gachagan, Anthony

    2016-02-01

    A range of materials used in industry exhibit scattering properties which limits ultrasonic NDE. Many algorithms have been proposed to enhance defect detection ability, such as the well-known Split Spectrum Processing (SSP) technique. Scattering noise usually cannot be fully removed and the remaining noise can be easily confused with real feature signals, hence becoming artefacts during the image interpretation stage. This paper presents an advanced algorithm to further reduce the influence of artefacts remaining in A-scan data after processing using a conventional defect detection algorithm. The raw A-scan data can be acquired from either traditional single transducer or phased array configurations. The proposed algorithm uses the concept of unsupervised machine learning to cluster segmental defect signals from pre-processed A-scans into different classes. The distinction and similarity between each class and the ensemble of randomly selected noise segments can be observed by applying a classification algorithm. Each class will then be labelled as `legitimate reflector' or `artefacts' based on this observation and the expected probability of defection (PoD) and probability of false alarm (PFA) determined. To facilitate data collection and validate the proposed algorithm, a 5MHz linear array transducer is used to collect A-scans from both austenitic steel and Inconel samples. Each pulse-echo A-scan is pre-processed using SSP and the subsequent application of the proposed clustering algorithm has provided an additional reduction to PFA while maintaining PoD for both samples compared with SSP results alone.

  4. Аdaptive clustering algorithm for recommender systems

    OpenAIRE

    Stekh, Yu.; Artsibasov, V.

    2012-01-01

    In this article adaptive clustering algorithm for recommender systems is developed. Розроблено адаптивний алгоритм кластеризації для рекомендаційних систем.

  5. Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

    DEFF Research Database (Denmark)

    Grotkjær, Thomas; Winther, Ole; Regenberg, Birgitte

    2006-01-01

    Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods...... analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset...

  6. Clustering analysis

    International Nuclear Information System (INIS)

    Romli

    1997-01-01

    Cluster analysis is the name of group of multivariate techniques whose principal purpose is to distinguish similar entities from the characteristics they process.To study this analysis, there are several algorithms that can be used. Therefore, this topic focuses to discuss the algorithms, such as, similarity measures, and hierarchical clustering which includes single linkage, complete linkage and average linkage method. also, non-hierarchical clustering method, which is popular name K -mean method ' will be discussed. Finally, this paper will be described the advantages and disadvantages of every methods

  7. Extended Traffic Crash Modelling through Precision and Response Time Using Fuzzy Clustering Algorithms Compared with Multi-layer Perceptron

    Directory of Open Access Journals (Sweden)

    Iman Aghayan

    2012-11-01

    Full Text Available This paper compares two fuzzy clustering algorithms – fuzzy subtractive clustering and fuzzy C-means clustering – to a multi-layer perceptron neural network for their ability to predict the severity of crash injuries and to estimate the response time on the traffic crash data. Four clustering algorithms – hierarchical, K-means, subtractive clustering, and fuzzy C-means clustering – were used to obtain the optimum number of clusters based on the mean silhouette coefficient and R-value before applying the fuzzy clustering algorithms. The best-fit algorithms were selected according to two criteria: precision (root mean square, R-value, mean absolute errors, and sum of square error and response time (t. The highest R-value was obtained for the multi-layer perceptron (0.89, demonstrating that the multi-layer perceptron had a high precision in traffic crash prediction among the prediction models, and that it was stable even in the presence of outliers and overlapping data. Meanwhile, in comparison with other prediction models, fuzzy subtractive clustering provided the lowest value for response time (0.284 second, 9.28 times faster than the time of multi-layer perceptron, meaning that it could lead to developing an on-line system for processing data from detectors and/or a real-time traffic database. The model can be extended through improvements based on additional data through induction procedure.

  8. Complex scenes and situations visualization in hierarchical learning algorithm with dynamic 3D NeoAxis engine

    Science.gov (United States)

    Graham, James; Ternovskiy, Igor V.

    2013-06-01

    We applied a two stage unsupervised hierarchical learning system to model complex dynamic surveillance and cyber space monitoring systems using a non-commercial version of the NeoAxis visualization software. The hierarchical scene learning and recognition approach is based on hierarchical expectation maximization, and was linked to a 3D graphics engine for validation of learning and classification results and understanding the human - autonomous system relationship. Scene recognition is performed by taking synthetically generated data and feeding it to a dynamic logic algorithm. The algorithm performs hierarchical recognition of the scene by first examining the features of the objects to determine which objects are present, and then determines the scene based on the objects present. This paper presents a framework within which low level data linked to higher-level visualization can provide support to a human operator and be evaluated in a detailed and systematic way.

  9. A cluster analysis on road traffic accidents using genetic algorithms

    Science.gov (United States)

    Saharan, Sabariah; Baragona, Roberto

    2017-04-01

    The analysis of traffic road accidents is increasingly important because of the accidents cost and public road safety. The availability or large data sets makes the study of factors that affect the frequency and severity accidents are viable. However, the data are often highly unbalanced and overlapped. We deal with the data set of the road traffic accidents recorded in Christchurch, New Zealand, from 2000-2009 with a total of 26440 accidents. The data is in a binary set and there are 50 factors road traffic accidents with four level of severity. We used genetic algorithm for the analysis because we are in the presence of a large unbalanced data set and standard clustering like k-means algorithm may not be suitable for the task. The genetic algorithm based on clustering for unknown K, (GCUK) has been used to identify the factors associated with accidents of different levels of severity. The results provided us with an interesting insight into the relationship between factors and accidents severity level and suggest that the two main factors that contributes to fatal accidents are "Speed greater than 60 km h" and "Did not see other people until it was too late". A comparison with the k-means algorithm and the independent component analysis is performed to validate the results.

  10. Community Clustering Algorithm in Complex Networks Based on Microcommunity Fusion

    Directory of Open Access Journals (Sweden)

    Jin Qi

    2015-01-01

    Full Text Available With the further research on physical meaning and digital features of the community structure in complex networks in recent years, the improvement of effectiveness and efficiency of the community mining algorithms in complex networks has become an important subject in this area. This paper puts forward a concept of the microcommunity and gets final mining results of communities through fusing different microcommunities. This paper starts with the basic definition of the network community and applies Expansion to the microcommunity clustering which provides prerequisites for the microcommunity fusion. The proposed algorithm is more efficient and has higher solution quality compared with other similar algorithms through the analysis of test results based on network data set.

  11. An improved clustering algorithm based on reverse learning in intelligent transportation

    Science.gov (United States)

    Qiu, Guoqing; Kou, Qianqian; Niu, Ting

    2017-05-01

    With the development of artificial intelligence and data mining technology, big data has gradually entered people's field of vision. In the process of dealing with large data, clustering is an important processing method. By introducing the reverse learning method in the clustering process of PAM clustering algorithm, to further improve the limitations of one-time clustering in unsupervised clustering learning, and increase the diversity of clustering clusters, so as to improve the quality of clustering. The algorithm analysis and experimental results show that the algorithm is feasible.

  12. Finding reproducible cluster partitions for the k-means algorithm.

    Science.gov (United States)

    Lisboa, Paulo J G; Etchells, Terence A; Jarman, Ian H; Chambers, Simon J

    2013-01-01

    K-means clustering is widely used for exploratory data analysis. While its dependence on initialisation is well-known, it is common practice to assume that the partition with lowest sum-of-squares (SSQ) total i.e. within cluster variance, is both reproducible under repeated initialisations and also the closest that k-means can provide to true structure, when applied to synthetic data. We show that this is generally the case for small numbers of clusters, but for values of k that are still of theoretical and practical interest, similar values of SSQ can correspond to markedly different cluster partitions. This paper extends stability measures previously presented in the context of finding optimal values of cluster number, into a component of a 2-d map of the local minima found by the k-means algorithm, from which not only can values of k be identified for further analysis but, more importantly, it is made clear whether the best SSQ is a suitable solution or whether obtaining a consistently good partition requires further application of the stability index. The proposed method is illustrated by application to five synthetic datasets replicating a real world breast cancer dataset with varying data density, and a large bioinformatics dataset.

  13. Enhancement of Adaptive Cluster Hierarchical Routing Protocol using Distance and Energy for Wireless Sensor Networks

    International Nuclear Information System (INIS)

    Nawar, N.M.; Soliman, S.E.; Kelash, H.M.; Ayad, N.M.

    2014-01-01

    The application of wireless networking is widely used in nuclear applications. This includes reactor control and fire dedication system. This paper is devoted to the application of this concept in the intrusion system of the Radioisotope Production Facility (RPF) of the Egyptian Atomic Energy Authority. This includes the tracking, monitoring and control components of this system. The design and implementation of wireless sensor networks has become a hot area of research due to the extensive use of sensor networks to enable applications that connect the physical world to the virtual world [1-2]. The original LEACH is named a communication protocol (clustering-based); the extended LEACH’s stochastic cluster head selection algorithm by a deterministic component. Depending on the network configuration an increase of network lifetime can be accomplished [3]. The proposed routing mechanisms after enhancement divide the nodes into clusters. A cluster head performs its task which is considerably more energy-intensive than the rest of the nodes inside sensor network. So, nodes rotate tasks at different rounds between a cluster head and other sensors throughout the lifetime of the network to balance the energy dissipation [4-5].The performance improvement when using routing protocol after enhancement of the algorithm which takes into consideration the distance and the remaining energy for choosing the cluster head by obtains from the advertise message. Network Simulator (Ns2 simulator) is used to prove that LEACH after enhancement performs better than the original LEACH protocol in terms of Average Energy, Network Life Time, Delay, Throughput and Overhead.

  14. Clustering, Hierarchical Organization, and the Topography of Abstract and Concrete Nouns

    Directory of Open Access Journals (Sweden)

    Joshua eTroche

    2014-04-01

    Full Text Available The empirical study of language has historically relied heavily upon concrete word stimuli. By definition, concrete words evoke salient perceptual associations that fit well within feature-based, sensorimotor models of word meaning. In contrast, many theorists argue that abstract words are disembodied in that their meaning is mediated through language. We investigated word meaning as distributed in multidimensional space using hierarchical cluster analysis. Participants (N=365 rated target words (n=400 English nouns across 12 cognitive dimensions (e.g., polarity, ease of teaching, emotional valence. Factor reduction revealed three latent factors, corresponding roughly to perceptual salience, affective association, and magnitude. We plotted the original 400 words for the three latent factors. Abstract and concrete words showed overlap in their topography but also differentiated themselves in semantic space. This topographic approach to word meaning offers a unique perspective to word concreteness.

  15. Clustering, hierarchical organization, and the topography of abstract and concrete nouns.

    Science.gov (United States)

    Troche, Joshua; Crutch, Sebastian; Reilly, Jamie

    2014-01-01

    The empirical study of language has historically relied heavily upon concrete word stimuli. By definition, concrete words evoke salient perceptual associations that fit well within feature-based, sensorimotor models of word meaning. In contrast, many theorists argue that abstract words are "disembodied" in that their meaning is mediated through language. We investigated word meaning as distributed in multidimensional space using hierarchical cluster analysis. Participants (N = 365) rated target words (n = 400 English nouns) across 12 cognitive dimensions (e.g., polarity, ease of teaching, emotional valence). Factor reduction revealed three latent factors, corresponding roughly to perceptual salience, affective association, and magnitude. We plotted the original 400 words for the three latent factors. Abstract and concrete words showed overlap in their topography but also differentiated themselves in semantic space. This topographic approach to word meaning offers a unique perspective to word concreteness.

  16. Hierarchical Controlled Remote State Preparation by Using a Four-Qubit Cluster State

    Science.gov (United States)

    Ma, Peng-Cheng; Chen, Gui-Bin; Li, Xiao-Wei; Zhan, You-Bang

    2018-06-01

    We propose a scheme for hierarchical controlled remote preparation of an arbitrary single-qubit state via a four-qubit cluster state as the quantum channel. In this scheme, a sender wishes to help three agents to remotely prepare a quantum state, respectively. The three agents are divided into two grades, that is, an agent is in the upper grade and other two agents are in the lower grade. In this process of remote state preparation, the agent of the upper grade only needs the assistance of any one of the other two agents for recovering the sender's original state, while an agent of the lower grade needs the collaboration of all the other two agents. In other words, the agents of two grades have different authorities to reconstruct sender's original state.

  17. Hierarchical Controlled Remote State Preparation by Using a Four-Qubit Cluster State

    Science.gov (United States)

    Ma, Peng-Cheng; Chen, Gui-Bin; Li, Xiao-Wei; Zhan, You-Bang

    2018-02-01

    We propose a scheme for hierarchical controlled remote preparation of an arbitrary single-qubit state via a four-qubit cluster state as the quantum channel. In this scheme, a sender wishes to help three agents to remotely prepare a quantum state, respectively. The three agents are divided into two grades, that is, an agent is in the upper grade and other two agents are in the lower grade. In this process of remote state preparation, the agent of the upper grade only needs the assistance of any one of the other two agents for recovering the sender's original state, while an agent of the lower grade needs the collaboration of all the other two agents. In other words, the agents of two grades have different authorities to reconstruct sender's original state.

  18. [Cluster analysis in biomedical researches].

    Science.gov (United States)

    Akopov, A S; Moskovtsev, A A; Dolenko, S A; Savina, G D

    2013-01-01

    Cluster analysis is one of the most popular methods for the analysis of multi-parameter data. The cluster analysis reveals the internal structure of the data, group the separate observations on the degree of their similarity. The review provides a definition of the basic concepts of cluster analysis, and discusses the most popular clustering algorithms: k-means, hierarchical algorithms, Kohonen networks algorithms. Examples are the use of these algorithms in biomedical research.

  19. Depth data research of GIS based on clustering analysis algorithm

    Science.gov (United States)

    Xiong, Yan; Xu, Wenli

    2018-03-01

    The data of GIS have spatial distribution. Geographic data has both spatial characteristics and attribute characteristics, and also changes with time. Therefore, the amount of data is very large. Nowadays, many industries and departments in the society are using GIS. However, without proper data analysis and mining scheme, GIS will not exert its maximum effectiveness and will waste a lot of data. In this paper, we use the geographic information demand of a national security department as the experimental object, combining the characteristics of GIS data, taking into account the characteristics of time, space, attributes and so on, and using cluster analysis algorithm. We further study the mining scheme for depth data, and get the algorithm model. This algorithm can automatically classify sample data, and then carry out exploratory analysis. The research shows that the algorithm model and the information mining scheme can quickly find hidden depth information from the surface data of GIS, thus improving the efficiency of the security department. This algorithm can also be extended to other fields.

  20. The use of hierarchical clustering for the design of optimized monitoring networks

    Science.gov (United States)

    Soares, Joana; Makar, Paul Andrew; Aklilu, Yayne; Akingunola, Ayodeji

    2018-05-01

    Associativity analysis is a powerful tool to deal with large-scale datasets by clustering the data on the basis of (dis)similarity and can be used to assess the efficacy and design of air quality monitoring networks. We describe here our use of Kolmogorov-Zurbenko filtering and hierarchical clustering of NO2 and SO2 passive and continuous monitoring data to analyse and optimize air quality networks for these species in the province of Alberta, Canada. The methodology applied in this study assesses dissimilarity between monitoring station time series based on two metrics: 1 - R, R being the Pearson correlation coefficient, and the Euclidean distance; we find that both should be used in evaluating monitoring site similarity. We have combined the analytic power of hierarchical clustering with the spatial information provided by deterministic air quality model results, using the gridded time series of model output as potential station locations, as a proxy for assessing monitoring network design and for network optimization. We demonstrate that clustering results depend on the air contaminant analysed, reflecting the difference in the respective emission sources of SO2 and NO2 in the region under study. Our work shows that much of the signal identifying the sources of NO2 and SO2 emissions resides in shorter timescales (hourly to daily) due to short-term variation of concentrations and that longer-term averages in data collection may lose the information needed to identify local sources. However, the methodology identifies stations mainly influenced by seasonality, if larger timescales (weekly to monthly) are considered. We have performed the first dissimilarity analysis based on gridded air quality model output and have shown that the methodology is capable of generating maps of subregions within which a single station will represent the entire subregion, to a given level of dissimilarity. We have also shown that our approach is capable of identifying different

  1. Gravitation field algorithm and its application in gene cluster

    Directory of Open Access Journals (Sweden)

    Zheng Ming

    2010-09-01

    Full Text Available Abstract Background Searching optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results This paper proposes a novel searching optimization algorithm called Gravitation Field Algorithm (GFA which is derived from the famous astronomy theory Solar Nebular Disk Model (SNDM of planetary formation. GFA simulates the Gravitation field and outperforms GA and SA in some multimodal functions optimization problem. And GFA also can be used in the forms of unimodal functions. GFA clusters the dataset well from the Gene Expression Omnibus. Conclusions The mathematical proof demonstrates that GFA could be convergent in the global optimum by probability 1 in three conditions for one independent variable mass functions. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA.

  2. Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions

    KAUST Repository

    AbdulJabbar, Mustafa Abdulmajeed

    2017-05-11

    Reduction of communication and efficient partitioning are key issues for achieving scalability in hierarchical N-Body algorithms like Fast Multipole Method (FMM). In the present work, we propose three independent strategies to improve partitioning and reduce communication. First, we show that the conventional wisdom of using space-filling curve partitioning may not work well for boundary integral problems, which constitute a significant portion of FMM’s application user base. We propose an alternative method that modifies orthogonal recursive bisection to relieve the cell-partition misalignment that has kept it from scaling previously. Secondly, we optimize the granularity of communication to find the optimal balance between a bulk-synchronous collective communication of the local essential tree and an RDMA per task per cell. Finally, we take the dynamic sparse data exchange proposed by Hoefler et al. [1] and extend it to a hierarchical sparse data exchange, which is demonstrated at scale to be faster than the MPI library’s MPI_Alltoallv that is commonly used.

  3. A Clustering Routing Protocol for Mobile Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Jinke Huang

    2016-01-01

    Full Text Available The dynamic topology of a mobile ad hoc network poses a real challenge in the design of hierarchical routing protocol, which combines proactive with reactive routing protocols and takes advantages of both. And as an essential technique of hierarchical routing protocol, clustering of nodes provides an efficient method of establishing a hierarchical structure in mobile ad hoc networks. In this paper, we designed a novel clustering algorithm and a corresponding hierarchical routing protocol for large-scale mobile ad hoc networks. Each cluster is composed of a cluster head, several cluster gateway nodes, several cluster guest nodes, and other cluster members. The proposed routing protocol uses proactive protocol between nodes within individual clusters and reactive protocol between clusters. Simulation results show that the proposed clustering algorithm and hierarchical routing protocol provide superior performance with several advantages over existing clustering algorithm and routing protocol, respectively.

  4. A multiresolution hierarchical classification algorithm for filtering airborne LiDAR data

    Science.gov (United States)

    Chen, Chuanfa; Li, Yanyan; Li, Wei; Dai, Honglei

    2013-08-01

    We presented a multiresolution hierarchical classification (MHC) algorithm for differentiating ground from non-ground LiDAR point cloud based on point residuals from the interpolated raster surface. MHC includes three levels of hierarchy, with the simultaneous increase of cell resolution and residual threshold from the low to the high level of the hierarchy. At each level, the surface is iteratively interpolated towards the ground using thin plate spline (TPS) until no ground points are classified, and the classified ground points are used to update the surface in the next iteration. 15 groups of benchmark dataset, provided by the International Society for Photogrammetry and Remote Sensing (ISPRS) commission, were used to compare the performance of MHC with those of the 17 other publicized filtering methods. Results indicated that MHC with the average total error and average Cohen’s kappa coefficient of 4.11% and 86.27% performs better than all other filtering methods.

  5. Development of Automatic Cluster Algorithm for Microcalcification in Digital Mammography

    International Nuclear Information System (INIS)

    Choi, Seok Yoon; Kim, Chang Soo

    2009-01-01

    Digital Mammography is an efficient imaging technique for the detection and diagnosis of breast pathological disorders. Six mammographic criteria such as number of cluster, number, size, extent and morphologic shape of microcalcification, and presence of mass, were reviewed and correlation with pathologic diagnosis were evaluated. It is very important to find breast cancer early when treatment can reduce deaths from breast cancer and breast incision. In screening breast cancer, mammography is typically used to view the internal organization. Clusterig microcalcifications on mammography represent an important feature of breast mass, especially that of intraductal carcinoma. Because microcalcification has high correlation with breast cancer, a cluster of a microcalcification can be very helpful for the clinical doctor to predict breast cancer. For this study, three steps of quantitative evaluation are proposed : DoG filter, adaptive thresholding, Expectation maximization. Through the proposed algorithm, each cluster in the distribution of microcalcification was able to measure the number calcification and length of cluster also can be used to automatically diagnose breast cancer as indicators of the primary diagnosis.

  6. Biomolecule-Assisted Hydrothermal Synthesis and Self-Assembly of Bi2Te3 Nanostring-Cluster Hierarchical Structure

    DEFF Research Database (Denmark)

    Mi, Jianli; Lock, Nina; Sun, Ting

    2010-01-01

    A simple biomolecule-assisted hydrothermal approach has been developed for the fabrication of Bi2Te3 thermoelectric nanomaterials. The product has a nanostring-cluster hierarchical structure which is composed of ordered and aligned platelet-like crystals. The platelets are100 nm in diameter...

  7. A study of hierarchical clustering of galaxies in an expanding universe

    Science.gov (United States)

    Porter, D. H.

    The nonlinear hierarchical clustering of galaxies in an Einstein-deSitter (Omega = 1), initially white noise mass fluctuations (n = 0) model universe is investigated and shown to be in contradiction with previous results. The model is done in terms of an 11,000-body numerical simulation. The independent statics of 0.72 million particles are used to simulte the boundary conditions. A new method for integrating the Newtonian N-body gravity equations, which has controllable accuracy, incorporates a recursive center of mass reduction, and regularizes two body encounters is used to do the simulation. The coordinate system used here is well suited for the investigation of galaxy clustering, incorporating the independent positions and velocities of an arbitrary number of particles into a logarithmic hierarchy of center of mass nodes. The boundary for the simulation is created by using this hierarchy to map the independent statics of 0.72 million particles into just 4,000 particles. This method for simulating the boundary conditions also has controllable accuracy.

  8. Symptom Clusters in People Living with HIV Attending Five Palliative Care Facilities in Two Sub-Saharan African Countries: A Hierarchical Cluster Analysis.

    Science.gov (United States)

    Moens, Katrien; Siegert, Richard J; Taylor, Steve; Namisango, Eve; Harding, Richard

    2015-01-01

    Symptom research across conditions has historically focused on single symptoms, and the burden of multiple symptoms and their interactions has been relatively neglected especially in people living with HIV. Symptom cluster studies are required to set priorities in treatment planning, and to lessen the total symptom burden. This study aimed to identify and compare symptom clusters among people living with HIV attending five palliative care facilities in two sub-Saharan African countries. Data from cross-sectional self-report of seven-day symptom prevalence on the 32-item Memorial Symptom Assessment Scale-Short Form were used. A hierarchical cluster analysis was conducted using Ward's method applying squared Euclidean Distance as the similarity measure to determine the clusters. Contingency tables, X2 tests and ANOVA were used to compare the clusters by patient specific characteristics and distress scores. Among the sample (N=217) the mean age was 36.5 (SD 9.0), 73.2% were female, and 49.1% were on antiretroviral therapy (ART). The cluster analysis produced five symptom clusters identified as: 1) dermatological; 2) generalised anxiety and elimination; 3) social and image; 4) persistently present; and 5) a gastrointestinal-related symptom cluster. The patients in the first three symptom clusters reported the highest physical and psychological distress scores. Patient characteristics varied significantly across the five clusters by functional status (worst functional physical status in cluster one, ppeople living with HIV with longitudinally collected symptom data to test cluster stability and identify common symptom trajectories is recommended.

  9. Hierarchical clustering of breast cancer methylomes revealed differentially methylated and expressed breast cancer genes.

    Directory of Open Access Journals (Sweden)

    I-Hsuan Lin

    Full Text Available Oncogenic transformation of normal cells often involves epigenetic alterations, including histone modification and DNA methylation. We conducted whole-genome bisulfite sequencing to determine the DNA methylomes of normal breast, fibroadenoma, invasive ductal carcinomas and MCF7. The emergence, disappearance, expansion and contraction of kilobase-sized hypomethylated regions (HMRs and the hypomethylation of the megabase-sized partially methylated domains (PMDs are the major forms of methylation changes observed in breast tumor samples. Hierarchical clustering of HMR revealed tumor-specific hypermethylated clusters and differential methylated enhancers specific to normal or breast cancer cell lines. Joint analysis of gene expression and DNA methylation data of normal breast and breast cancer cells identified differentially methylated and expressed genes associated with breast and/or ovarian cancers in cancer-specific HMR clusters. Furthermore, aberrant patterns of X-chromosome inactivation (XCI was found in breast cancer cell lines as well as breast tumor samples in the TCGA BRCA (breast invasive carcinoma dataset. They were characterized with differentially hypermethylated XIST promoter, reduced expression of XIST, and over-expression of hypomethylated X-linked genes. High expressions of these genes were significantly associated with lower survival rates in breast cancer patients. Comprehensive analysis of the normal and breast tumor methylomes suggests selective targeting of DNA methylation changes during breast cancer progression. The weak causal relationship between DNA methylation and gene expression observed in this study is evident of more complex role of DNA methylation in the regulation of gene expression in human epigenetics that deserves further investigation.

  10. A Heuristic Task Scheduling Algorithm for Heterogeneous Virtual Clusters

    Directory of Open Access Journals (Sweden)

    Weiwei Lin

    2016-01-01

    Full Text Available Cloud computing provides on-demand computing and storage services with high performance and high scalability. However, the rising energy consumption of cloud data centers has become a prominent problem. In this paper, we first introduce an energy-aware framework for task scheduling in virtual clusters. The framework consists of a task resource requirements prediction module, an energy estimate module, and a scheduler with a task buffer. Secondly, based on this framework, we propose a virtual machine power efficiency-aware greedy scheduling algorithm (VPEGS. As a heuristic algorithm, VPEGS estimates task energy by considering factors including task resource demands, VM power efficiency, and server workload before scheduling tasks in a greedy manner. We simulated a heterogeneous VM cluster and conducted experiment to evaluate the effectiveness of VPEGS. Simulation results show that VPEGS effectively reduced total energy consumption by more than 20% without producing large scheduling overheads. With the similar heuristic ideology, it outperformed Min-Min and RASA with respect to energy saving by about 29% and 28%, respectively.

  11. Ternary alloy material prediction using genetic algorithm and cluster expansion

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Chong [Iowa State Univ., Ames, IA (United States)

    2015-12-01

    This thesis summarizes our study on the crystal structures prediction of Fe-V-Si system using genetic algorithm and cluster expansion. Our goal is to explore and look for new stable compounds. We started from the current ten known experimental phases, and calculated formation energies of those compounds using density functional theory (DFT) package, namely, VASP. The convex hull was generated based on the DFT calculations of the experimental known phases. Then we did random search on some metal rich (Fe and V) compositions and found that the lowest energy structures were body centered cube (bcc) underlying lattice, under which we did our computational systematic searches using genetic algorithm and cluster expansion. Among hundreds of the searched compositions, thirteen were selected and DFT formation energies were obtained by VASP. The stability checking of those thirteen compounds was done in reference to the experimental convex hull. We found that the composition, 24-8-16, i.e., Fe3VSi2 is a new stable phase and it can be very inspiring to the future experiments.

  12. jClustering, an open framework for the development of 4D clustering algorithms.

    Directory of Open Access Journals (Sweden)

    José María Mateos-Pérez

    Full Text Available We present jClustering, an open framework for the design of clustering algorithms in dynamic medical imaging. We developed this tool because of the difficulty involved in manually segmenting dynamic PET images and the lack of availability of source code for published segmentation algorithms. Providing an easily extensible open tool encourages publication of source code to facilitate the process of comparing algorithms and provide interested third parties with the opportunity to review code. The internal structure of the framework allows an external developer to implement new algorithms easily and quickly, focusing only on the particulars of the method being implemented and not on image data handling and preprocessing. This tool has been coded in Java and is presented as an ImageJ plugin in order to take advantage of all the functionalities offered by this imaging analysis platform. Both binary packages and source code have been published, the latter under a free software license (GNU General Public License to allow modification if necessary.

  13. Active Semisupervised Clustering Algorithm with Label Propagation for Imbalanced and Multidensity Datasets

    Directory of Open Access Journals (Sweden)

    Mingwei Leng

    2013-01-01

    Full Text Available The accuracy of most of the existing semisupervised clustering algorithms based on small size of labeled dataset is low when dealing with multidensity and imbalanced datasets, and labeling data is quite expensive and time consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering algorithm in multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it utilizes multithreshold to expand labeled datasets on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to demonstrate the proposed algorithm, and the experimental results show that the proposed semisupervised clustering algorithm has a higher accuracy and a more stable performance in comparison to other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.

  14. Hierarchical Cluster Analysis of Semicircular Canal and Otolith Deficits in Bilateral Vestibulopathy

    Directory of Open Access Journals (Sweden)

    Alexander A. Tarnutzer

    2018-04-01

    Full Text Available BackgroundGait imbalance and oscillopsia are frequent complaints of bilateral vestibular loss (BLV. Video-head-impulse testing (vHIT of all six semicircular canals (SCCs has demonstrated varying involvement of the different canals. Sparing of anterior-canal function has been linked to aminoglycoside-related vestibulopathy and Menière’s disease. We hypothesized that utricular and saccular impairment [assessed by vestibular-evoked myogenic potentials (VEMPs] may be disease-specific also, possibly facilitating the differential diagnosis.MethodsWe searched our vHIT database (n = 3,271 for patients with bilaterally impaired SCC function who also received ocular VEMPs (oVEMPs and cervical VEMPs (cVEMPs and identified 101 patients. oVEMP/cVEMP latencies above the 95th percentile and peak-to-peak amplitudes below the 5th percentile of normal were considered abnormal. Frequency of impairment of vestibular end organs (horizontal/anterior/posterior SCC, utriculus/sacculus was analyzed with hierarchical cluster analysis and correlated with the underlying etiology.ResultsRates of utricular and saccular loss of function were similar (87.1 vs. 78.2%, p = 0.136, Fisher’s exact test. oVEMP abnormalities were found more frequent in aminoglycoside-related bilateral vestibular loss (BVL compared with Menière’s disease (91.7 vs. 54.6%, p = 0.039. Hierarchical cluster analysis indicated distinct patterns of vestibular end-organ impairment, showing that the results for the same end-organs on both sides are more similar than to other end-organs. Relative sparing of anterior-canal function was reflected in late merging with the other end-organs, emphasizing their distinct state. An anatomically corresponding pattern of SCC/otolith hypofunction was present in 60.4% (oVEMPs vs. horizontal SCCs, 34.7% (oVEMPs vs. anterior SCCs, and 48.5% (cVEMPs vs. posterior SCCs of cases. Average (±1 SD number of damaged sensors was 6.8 ± 2.2 out of 10

  15. KM-FCM: A fuzzy clustering optimization algorithm based on Mahalanobis distance

    Directory of Open Access Journals (Sweden)

    Zhiwen ZU

    2018-04-01

    Full Text Available The traditional fuzzy clustering algorithm uses Euclidean distance as the similarity criterion, which is disadvantageous to the multidimensional data processing. In order to solve this situation, Mahalanobis distance is used instead of the traditional Euclidean distance, and the optimization of fuzzy clustering algorithm based on Mahalanobis distance is studied to enhance the clustering effect and ability. With making the initialization means by Heuristic search algorithm combined with k-means algorithm, and in terms of the validity function which could automatically adjust the optimal clustering number, an optimization algorithm KM-FCM is proposed. The new algorithm is compared with FCM algorithm, FCM-M algorithm and M-FCM algorithm in three standard data sets. The experimental results show that the KM-FCM algorithm is effective. It has higher clustering accuracy than FCM, FCM-M and M-FCM, recognizing high-dimensional data clustering well. It has global optimization effect, and the clustering number has no need for setting in advance. The new algorithm provides a reference for the optimization of fuzzy clustering algorithm based on Mahalanobis distance.

  16. Using internal evaluation measures to validate the quality of diverse stream clustering algorithms

    NARCIS (Netherlands)

    Hassani, M.; Seidl, T.

    2017-01-01

    Measuring the quality of a clustering algorithm has shown to be as important as the algorithm itself. It is a crucial part of choosing the clustering algorithm that performs best for an input data. Streaming input data have many features that make them much more challenging than static ones. They

  17. Validating hierarchical verbal autopsy expert algorithms in a large data set with known causes of death.

    Science.gov (United States)

    Kalter, Henry D; Perin, Jamie; Black, Robert E

    2016-06-01

    Physician assessment historically has been the most common method of analyzing verbal autopsy (VA) data. Recently, the World Health Organization endorsed two automated methods, Tariff 2.0 and InterVA-4, which promise greater objectivity and lower cost. A disadvantage of the Tariff method is that it requires a training data set from a prior validation study, while InterVA relies on clinically specified conditional probabilities. We undertook to validate the hierarchical expert algorithm analysis of VA data, an automated, intuitive, deterministic method that does not require a training data set. Using Population Health Metrics Research Consortium study hospital source data, we compared the primary causes of 1629 neonatal and 1456 1-59 month-old child deaths from VA expert algorithms arranged in a hierarchy to their reference standard causes. The expert algorithms were held constant, while five prior and one new "compromise" neonatal hierarchy, and three former child hierarchies were tested. For each comparison, the reference standard data were resampled 1000 times within the range of cause-specific mortality fractions (CSMF) for one of three approximated community scenarios in the 2013 WHO global causes of death, plus one random mortality cause proportions scenario. We utilized CSMF accuracy to assess overall population-level validity, and the absolute difference between VA and reference standard CSMFs to examine particular causes. Chance-corrected concordance (CCC) and Cohen's kappa were used to evaluate individual-level cause assignment. Overall CSMF accuracy for the best-performing expert algorithm hierarchy was 0.80 (range 0.57-0.96) for neonatal deaths and 0.76 (0.50-0.97) for child deaths. Performance for particular causes of death varied, with fairly flat estimated CSMF over a range of reference values for several causes. Performance at the individual diagnosis level was also less favorable than that for overall CSMF (neonatal: best CCC = 0.23, range 0

  18. The application of mixed recommendation algorithm with user clustering in the microblog advertisements promotion

    Science.gov (United States)

    Gong, Lina; Xu, Tao; Zhang, Wei; Li, Xuhong; Wang, Xia; Pan, Wenwen

    2017-03-01

    The traditional microblog recommendation algorithm has the problems of low efficiency and modest effect in the era of big data. In the aim of solving these issues, this paper proposed a mixed recommendation algorithm with user clustering. This paper first introduced the situation of microblog marketing industry. Then, this paper elaborates the user interest modeling process and detailed advertisement recommendation methods. Finally, this paper compared the mixed recommendation algorithm with the traditional classification algorithm and mixed recommendation algorithm without user clustering. The results show that the mixed recommendation algorithm with user clustering has good accuracy and recall rate in the microblog advertisements promotion.

  19. Using hierarchical clustering of secreted protein families to classify and rank candidate effectors of rust fungi.

    Directory of Open Access Journals (Sweden)

    Diane G O Saunders

    Full Text Available Rust fungi are obligate biotrophic pathogens that cause considerable damage on crop plants. Puccinia graminis f. sp. tritici, the causal agent of wheat stem rust, and Melampsora larici-populina, the poplar leaf rust pathogen, have strong deleterious impacts on wheat and poplar wood production, respectively. Filamentous pathogens such as rust fungi secrete molecules called disease effectors that act as modulators of host cell physiology and can suppress or trigger host immunity. Current knowledge on effectors from other filamentous plant pathogens can be exploited for the characterisation of effectors in the genome of recently sequenced rust fungi. We designed a comprehensive in silico analysis pipeline to identify the putative effector repertoire from the genome of two plant pathogenic rust fungi. The pipeline is based on the observation that known effector proteins from filamentous pathogens have at least one of the following properties: (i contain a secretion signal, (ii are encoded by in planta induced genes, (iii have similarity to haustorial proteins, (iv are small and cysteine rich, (v contain a known effector motif or a nuclear localization signal, (vi are encoded by genes with long intergenic regions, (vii contain internal repeats, and (viii do not contain PFAM domains, except those associated with pathogenicity. We used Markov clustering and hierarchical clustering to classify protein families of rust pathogens and rank them according to their likelihood of being effectors. Using this approach, we identified eight families of candidate effectors that we consider of high value for functional characterization. This study revealed a diverse set of candidate effectors, including families of haustorial expressed secreted proteins and small cysteine-rich proteins. This comprehensive classification of candidate effectors from these devastating rust pathogens is an initial step towards probing plant germplasm for novel resistance components.

  20. Ant Colony Clustering Algorithm and Improved Markov Random Fusion Algorithm in Image Segmentation of Brain Images

    Directory of Open Access Journals (Sweden)

    Guohua Zou

    2016-12-01

    Full Text Available New medical imaging technology, such as Computed Tomography and Magnetic Resonance Imaging (MRI, has been widely used in all aspects of medical diagnosis. The purpose of these imaging techniques is to obtain various qualitative and quantitative data of the patient comprehensively and accurately, and provide correct digital information for diagnosis, treatment planning and evaluation after surgery. MR has a good imaging diagnostic advantage for brain diseases. However, as the requirements of the brain image definition and quantitative analysis are always increasing, it is necessary to have better segmentation of MR brain images. The FCM (Fuzzy C-means algorithm is widely applied in image segmentation, but it has some shortcomings, such as long computation time and poor anti-noise capability. In this paper, firstly, the Ant Colony algorithm is used to determine the cluster centers and the number of FCM algorithm so as to improve its running speed. Then an improved Markov random field model is used to improve the algorithm, so that its antinoise ability can be improved. Experimental results show that the algorithm put forward in this paper has obvious advantages in image segmentation speed and segmentation effect.

  1. Application of hierarchical clustering method to classify of space-time rainfall patterns

    Science.gov (United States)

    Yu, Hwa-Lung; Chang, Tu-Je

    2010-05-01

    Understanding the local precipitation patterns is essential to the water resources management and flooding mitigation. The precipitation patterns can vary in space and time depending upon the factors from different spatial scales such as local topological changes and macroscopic atmospheric circulation. The spatiotemporal variation of precipitation in Taiwan is significant due to its complex terrain and its location at west pacific and subtropical area, where is the boundary between the pacific ocean and Asia continent with the complex interactions among the climatic processes. This study characterizes local-scale precipitation patterns by classifying the historical space-time precipitation records. We applied the hierarchical ascending clustering method to analyze the precipitation records from 1960 to 2008 at the six rainfall stations located in Lan-yang catchment at the northeast of the island. Our results identify the four primary space-time precipitation types which may result from distinct driving forces from the changes of atmospheric variables and topology at different space-time scales. This study also presents an important application of the statistical downscaling to combine large-scale upper-air circulation with local space-time precipitation patterns.

  2. An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

    Directory of Open Access Journals (Sweden)

    Kang Zhang

    2014-01-01

    Full Text Available Clustering has been widely used in different fields of science, technology, social science, and so forth. In real world, numeric as well as categorical features are usually used to describe the data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle the mixed data clustering problems have been developed. Affinity propagation (AP algorithm is an exemplar-based clustering method which has demonstrated good performance on a wide variety of datasets. However, it has limitations on processing mixed datasets. In this paper, we propose a novel similarity measure for mixed type datasets and an adaptive AP clustering algorithm is proposed to cluster the mixed datasets. Several real world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.

  3. Clustering for Binary Data Sets by Using Genetic Algorithm-Incremental K-means

    Science.gov (United States)

    Saharan, S.; Baragona, R.; Nor, M. E.; Salleh, R. M.; Asrah, N. M.

    2018-04-01

    This research was initially driven by the lack of clustering algorithms that specifically focus in binary data. To overcome this gap in knowledge, a promising technique for analysing this type of data became the main subject in this research, namely Genetic Algorithms (GA). For the purpose of this research, GA was combined with the Incremental K-means (IKM) algorithm to cluster the binary data streams. In GAIKM, the objective function was based on a few sufficient statistics that may be easily and quickly calculated on binary numbers. The implementation of IKM will give an advantage in terms of fast convergence. The results show that GAIKM is an efficient and effective new clustering algorithm compared to the clustering algorithms and to the IKM itself. In conclusion, the GAIKM outperformed other clustering algorithms such as GCUK, IKM, Scalable K-means (SKM) and K-means clustering and paves the way for future research involving missing data and outliers.

  4. Robust K-Median and K-Means Clustering Algorithms for Incomplete Data

    Directory of Open Access Journals (Sweden)

    Jinhua Li

    2016-01-01

    Full Text Available Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.

  5. Hierarchical modularization of biochemical pathways using fuzzy-c means clustering.

    Science.gov (United States)

    de Luis Balaguer, Maria A; Williams, Cranos M

    2014-08-01

    Biological systems that are representative of regulatory, metabolic, or signaling pathways can be highly complex. Mathematical models that describe such systems inherit this complexity. As a result, these models can often fail to provide a path toward the intuitive comprehension of these systems. More coarse information that allows a perceptive insight of the system is sometimes needed in combination with the model to understand control hierarchies or lower level functional relationships. In this paper, we present a method to identify relationships between components of dynamic models of biochemical pathways that reside in different functional groups. We find primary relationships and secondary relationships. The secondary relationships reveal connections that are present in the system, which current techniques that only identify primary relationships are unable to show. We also identify how relationships between components dynamically change over time. This results in a method that provides the hierarchy of the relationships among components, which can help us to understand the low level functional structure of the system and to elucidate potential hierarchical control. As a proof of concept, we apply the algorithm to the epidermal growth factor signal transduction pathway, and to the C3 photosynthesis pathway. We identify primary relationships among components that are in agreement with previous computational decomposition studies, and identify secondary relationships that uncover connections among components that current computational approaches were unable to reveal.

  6. Hierarchical cluster analysis of technical replicates to identify interferents in untargeted mass spectrometry metabolomics.

    Science.gov (United States)

    Caesar, Lindsay K; Kvalheim, Olav M; Cech, Nadja B

    2018-08-27

    Mass spectral data sets often contain experimental artefacts, and data filtering prior to statistical analysis is crucial to extract reliable information. This is particularly true in untargeted metabolomics analyses, where the analyte(s) of interest are not known a priori. It is often assumed that chemical interferents (i.e. solvent contaminants such as plasticizers) are consistent across samples, and can be removed by background subtraction from blank injections. On the contrary, it is shown here that chemical contaminants may vary in abundance across each injection, potentially leading to their misidentification as relevant sample components. With this metabolomics study, we demonstrate the effectiveness of hierarchical cluster analysis (HCA) of replicate injections (technical replicates) as a methodology to identify chemical interferents and reduce their contaminating contribution to metabolomics models. Pools of metabolites with varying complexity were prepared from the botanical Angelica keiskei Koidzumi and spiked with known metabolites. Each set of pools was analyzed in triplicate and at multiple concentrations using ultraperformance liquid chromatography coupled to mass spectrometry (UPLC-MS). Before filtering, HCA failed to cluster replicates in the data sets. To identify contaminant peaks, we developed a filtering process that evaluated the relative peak area variance of each variable within triplicate injections. These interferent peaks were found across all samples, but did not show consistent peak area from injection to injection, even when evaluating the same chemical sample. This filtering process identified 128 ions that appear to originate from the UPLC-MS system. Data sets collected for a high number of pools with comparatively simple chemical composition were highly influenced by these chemical interferents, as were samples that were analyzed at a low concentration. When chemical interferent masses were removed, technical replicates clustered in

  7. BULGELESS GIANT GALAXIES CHALLENGE OUR PICTURE OF GALAXY FORMATION BY HIERARCHICAL CLUSTERING ,

    International Nuclear Information System (INIS)

    Kormendy, John; Cornell, Mark E.; Drory, Niv; Bender, Ralf

    2010-01-01

    To better understand the prevalence of bulgeless galaxies in the nearby field, we dissect giant Sc-Scd galaxies with Hubble Space Telescope (HST) photometry and Hobby-Eberly Telescope (HET) spectroscopy. We use the HET High Resolution Spectrograph (resolution R ≡ λ/FWHM ≅ 15, 000) to measure stellar velocity dispersions in the nuclear star clusters and (pseudo)bulges of the pure-disk galaxies M 33, M 101, NGC 3338, NGC 3810, NGC 6503, and NGC 6946. The dispersions range from 20 ± 1 km s -1 in the nucleus of M 33 to 78 ± 2 km s -1 in the pseudobulge of NGC 3338. We use HST archive images to measure the brightness profiles of the nuclei and (pseudo)bulges in M 101, NGC 6503, and NGC 6946 and hence to estimate their masses. The results imply small mass-to-light ratios consistent with young stellar populations. These observations lead to two conclusions. (1) Upper limits on the masses of any supermassive black holes are M . ∼ 6 M sun in M 101 and M . ∼ 6 M sun in NGC 6503. (2) We show that the above galaxies contain only tiny pseudobulges that make up ∼ circ > 150 km s -1 , including M 101, NGC 6946, IC 342, and our Galaxy, show no evidence for a classical bulge. Four may contain small classical bulges that contribute 5%-12% of the light of the galaxy. Only four of the 19 giant galaxies are ellipticals or have classical bulges that contribute ∼1/3 of the galaxy light. We conclude that pure-disk galaxies are far from rare. It is hard to understand how bulgeless galaxies could form as the quiescent tail of a distribution of merger histories. Recognition of pseudobulges makes the biggest problem with cold dark matter galaxy formation more acute: How can hierarchical clustering make so many giant, pure-disk galaxies with no evidence for merger-built bulges? Finally, we emphasize that this problem is a strong function of environment: the Virgo cluster is not a puzzle, because more than 2/3 of its stellar mass is in merger remnants.

  8. HPEPDOCK: a web server for blind peptide-protein docking based on a hierarchical algorithm.

    Science.gov (United States)

    Zhou, Pei; Jin, Bowen; Li, Hao; Huang, Sheng-You

    2018-05-09

    Protein-peptide interactions are crucial in many cellular functions. Therefore, determining the structure of protein-peptide complexes is important for understanding the molecular mechanism of related biological processes and developing peptide drugs. HPEPDOCK is a novel web server for blind protein-peptide docking through a hierarchical algorithm. Instead of running lengthy simulations to refine peptide conformations, HPEPDOCK considers the peptide flexibility through an ensemble of peptide conformations generated by our MODPEP program. For blind global peptide docking, HPEPDOCK obtained a success rate of 33.3% in binding mode prediction on a benchmark of 57 unbound cases when the top 10 models were considered, compared to 21.1% for pepATTRACT server. HPEPDOCK also performed well in docking against homology models and obtained a success rate of 29.8% within top 10 predictions. For local peptide docking, HPEPDOCK achieved a high success rate of 72.6% on a benchmark of 62 unbound cases within top 10 predictions, compared to 45.2% for HADDOCK peptide protocol. Our HPEPDOCK server is computationally efficient and consumed an average of 29.8 mins for a global peptide docking job and 14.2 mins for a local peptide docking job. The HPEPDOCK web server is available at http://huanglab.phys.hust.edu.cn/hpepdock/.

  9. Automatic Curve Fitting Based on Radial Basis Functions and a Hierarchical Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    G. Trejo-Caballero

    2015-01-01

    Full Text Available Curve fitting is a very challenging problem that arises in a wide variety of scientific and engineering applications. Given a set of data points, possibly noisy, the goal is to build a compact representation of the curve that corresponds to the best estimate of the unknown underlying relationship between two variables. Despite the large number of methods available to tackle this problem, it remains challenging and elusive. In this paper, a new method to tackle such problem using strictly a linear combination of radial basis functions (RBFs is proposed. To be more specific, we divide the parameter search space into linear and nonlinear parameter subspaces. We use a hierarchical genetic algorithm (HGA to minimize a model selection criterion, which allows us to automatically and simultaneously determine the nonlinear parameters and then, by the least-squares method through Singular Value Decomposition method, to compute the linear parameters. The method is fully automatic and does not require subjective parameters, for example, smooth factor or centre locations, to perform the solution. In order to validate the efficacy of our approach, we perform an experimental study with several tests on benchmarks smooth functions. A comparative analysis with two successful methods based on RBF networks has been included.

  10. Characterizing the course of back pain after osteoporotic vertebral fracture: a hierarchical cluster analysis of a prospective cohort study.

    Science.gov (United States)

    Toyoda, Hiromitsu; Takahashi, Shinji; Hoshino, Masatoshi; Takayama, Kazushi; Iseki, Kazumichi; Sasaoka, Ryuichi; Tsujio, Tadao; Yasuda, Hiroyuki; Sasaki, Takeharu; Kanematsu, Fumiaki; Kono, Hiroshi; Nakamura, Hiroaki

    2017-09-23

    This study demonstrated four distinct patterns in the course of back pain after osteoporotic vertebral fracture (OVF). Greater angular instability in the first 6 months after the baseline was one factor affecting back pain after OVF. Understanding the natural course of symptomatic acute OVF is important in deciding the optimal treatment strategy. We used latent class analysis to classify the course of back pain after OVF and identify the risk factors associated with persistent pain. This multicenter cohort study included 218 consecutive patients with ≤ 2-week-old OVFs who were enrolled at 11 institutions. Dynamic x-rays and back pain assessment with a visual analog scale (VAS) were obtained at enrollment and at 1-, 3-, and 6-month follow-ups. The VAS scores were used to characterize patient groups, using hierarchical cluster analysis. VAS for 128 patients was used for hierarchical cluster analysis. Analysis yielded four clusters representing different patterns of back pain progression. Cluster 1 patients (50.8%) had stable, mild pain. Cluster 2 patients (21.1%) started with moderate pain and progressed quickly to very low pain. Patients in cluster 3 (10.9%) had moderate pain that initially improved but worsened after 3 months. Cluster 4 patients (17.2%) had persistent severe pain. Patients in cluster 4 showed significant high baseline pain intensity, higher degree of angular instability, and higher number of previous OVFs, and tended to lack regular exercise. In contrast, patients in cluster 2 had significantly lower baseline VAS and less angular instability. We identified four distinct groups of OVF patients with different patterns of back pain progression. Understanding the course of back pain after OVF may help in its management and contribute to future treatment trials.

  11. 3D NEAREST NEIGHBOUR SEARCH USING A CLUSTERED HIERARCHICAL TREE STRUCTURE

    Directory of Open Access Journals (Sweden)

    A. Suhaibah

    2016-06-01

    Full Text Available Locating and analysing the location of new stores or outlets is one of the common issues facing retailers and franchisers. This is due to assure that new opening stores are at their strategic location to attract the highest possible number of customers. Spatial information is used to manage, maintain and analyse these store locations. However, since the business of franchising and chain stores in urban areas runs within high rise multi-level buildings, a three-dimensional (3D method is prominently required in order to locate and identify the surrounding information such as at which level of the franchise unit will be located or is the franchise unit located is at the best level for visibility purposes. One of the common used analyses used for retrieving the surrounding information is Nearest Neighbour (NN analysis. It uses a point location and identifies the surrounding neighbours. However, with the immense number of urban datasets, the retrieval and analysis of nearest neighbour information and their efficiency will become more complex and crucial. In this paper, we present a technique to retrieve nearest neighbour information in 3D space using a clustered hierarchical tree structure. Based on our findings, the proposed approach substantially showed an improvement of response time analysis compared to existing approaches of spatial access methods in databases. The query performance was tested using a dataset consisting of 500,000 point locations building and franchising unit. The results are presented in this paper. Another advantage of this structure is that it also offers a minimal overlap and coverage among nodes which can reduce repetitive data entry.

  12. Clustering and Bayesian hierarchical modeling for the definition of informative prior distributions in hydrogeology

    Science.gov (United States)

    Cucchi, K.; Kawa, N.; Hesse, F.; Rubin, Y.

    2017-12-01

    In order to reduce uncertainty in the prediction of subsurface flow and transport processes, practitioners should use all data available. However, classic inverse modeling frameworks typically only make use of information contained in in-situ field measurements to provide estimates of hydrogeological parameters. Such hydrogeological information about an aquifer is difficult and costly to acquire. In this data-scarce context, the transfer of ex-situ information coming from previously investigated sites can be critical for improving predictions by better constraining the estimation procedure. Bayesian inverse modeling provides a coherent framework to represent such ex-situ information by virtue of the prior distribution and combine them with in-situ information from the target site. In this study, we present an innovative data-driven approach for defining such informative priors for hydrogeological parameters at the target site. Our approach consists in two steps, both relying on statistical and machine learning methods. The first step is data selection; it consists in selecting sites similar to the target site. We use clustering methods for selecting similar sites based on observable hydrogeological features. The second step is data assimilation; it consists in assimilating data from the selected similar sites into the informative prior. We use a Bayesian hierarchical model to account for inter-site variability and to allow for the assimilation of multiple types of site-specific data. We present the application and validation of the presented methods on an established database of hydrogeological parameters. Data and methods are implemented in the form of an open-source R-package and therefore facilitate easy use by other practitioners.

  13. Hierarchical multiple binary image encryption based on a chaos and phase retrieval algorithm in the Fresnel domain

    International Nuclear Information System (INIS)

    Wang, Zhipeng; Hou, Chenxia; Lv, Xiaodong; Wang, Hongjuan; Gong, Qiong; Qin, Yi

    2016-01-01

    Based on the chaos and phase retrieval algorithm, a hierarchical multiple binary image encryption is proposed. In the encryption process, each plaintext is encrypted into a diffraction intensity pattern by two chaos-generated random phase masks (RPMs). Thereafter, the captured diffraction intensity patterns are partially selected by different binary masks and then combined together to form a single intensity pattern. The combined intensity pattern is saved as ciphertext. For decryption, an iterative phase retrieval algorithm is performed, in which a support constraint in the output plane and a median filtering operation are utilized to achieve a rapid convergence rate without a stagnation problem. The proposed scheme has a simple optical setup and large encryption capacity. In particular, it is well suited for constructing a hierarchical security system. The security and robustness of the proposal are also investigated. (letter)

  14. Higher-spin cluster algorithms: the Heisenberg spin and U(1) quantum link models

    Energy Technology Data Exchange (ETDEWEB)

    Chudnovsky, V

    2000-03-01

    I discuss here how the highly-efficient spin-1/2 cluster algorithm for the Heisenberg antiferromagnet may be extended to higher-dimensional representations; some numerical results are provided. The same extensions can be used for the U(1) flux cluster algorithm, but have not yielded signals of the desired Coulomb phase of the system.

  15. Higher-spin cluster algorithms: the Heisenberg spin and U(1) quantum link models

    International Nuclear Information System (INIS)

    Chudnovsky, V.

    2000-01-01

    I discuss here how the highly-efficient spin-1/2 cluster algorithm for the Heisenberg antiferromagnet may be extended to higher-dimensional representations; some numerical results are provided. The same extensions can be used for the U(1) flux cluster algorithm, but have not yielded signals of the desired Coulomb phase of the system

  16. A Resting-State Brain Functional Network Study in MDD Based on Minimum Spanning Tree Analysis and the Hierarchical Clustering

    Directory of Open Access Journals (Sweden)

    Xiaowei Li

    2017-01-01

    Full Text Available A large number of studies demonstrated that major depressive disorder (MDD is characterized by the alterations in brain functional connections which is also identifiable during the brain’s “resting-state.” But, in the present study, the approach of constructing functional connectivity is often biased by the choice of the threshold. Besides, more attention was paid to the number and length of links in brain networks, and the clustering partitioning of nodes was unclear. Therefore, minimum spanning tree (MST analysis and the hierarchical clustering were first used for the depression disease in this study. Resting-state electroencephalogram (EEG sources were assessed from 15 healthy and 23 major depressive subjects. Then the coherence, MST, and the hierarchical clustering were obtained. In the theta band, coherence analysis showed that the EEG coherence of the MDD patients was significantly higher than that of the healthy controls especially in the left temporal region. The MST results indicated the higher leaf fraction in the depressed group. Compared with the normal group, the major depressive patients lost clustering in frontal regions. Our findings suggested that there was a stronger brain interaction in the MDD group and a left-right functional imbalance in the frontal regions for MDD controls.

  17. Study of parameters of the nearest neighbour shared algorithm on clustering documents

    Science.gov (United States)

    Mustika Rukmi, Alvida; Budi Utomo, Daryono; Imro’atus Sholikhah, Neni

    2018-03-01

    Document clustering is one way of automatically managing documents, extracting of document topics and fastly filtering information. Preprocess of clustering documents processed by textmining consists of: keyword extraction using Rapid Automatic Keyphrase Extraction (RAKE) and making the document as concept vector using Latent Semantic Analysis (LSA). Furthermore, the clustering process is done so that the documents with the similarity of the topic are in the same cluster, based on the preprocesing by textmining performed. Shared Nearest Neighbour (SNN) algorithm is a clustering method based on the number of "nearest neighbors" shared. The parameters in the SNN Algorithm consist of: k nearest neighbor documents, ɛ shared nearest neighbor documents and MinT minimum number of similar documents, which can form a cluster. Characteristics The SNN algorithm is based on shared ‘neighbor’ properties. Each cluster is formed by keywords that are shared by the documents. SNN algorithm allows a cluster can be built more than one keyword, if the value of the frequency of appearing keywords in document is also high. Determination of parameter values on SNN algorithm affects document clustering results. The higher parameter value k, will increase the number of neighbor documents from each document, cause similarity of neighboring documents are lower. The accuracy of each cluster is also low. The higher parameter value ε, caused each document catch only neighbor documents that have a high similarity to build a cluster. It also causes more unclassified documents (noise). The higher the MinT parameter value cause the number of clusters will decrease, since the number of similar documents can not form clusters if less than MinT. Parameter in the SNN Algorithm determine performance of clustering result and the amount of noise (unclustered documents ). The Silhouette coeffisient shows almost the same result in many experiments, above 0.9, which means that SNN algorithm works well

  18. A heart disease recognition embedded system with fuzzy cluster algorithm.

    Science.gov (United States)

    de Carvalho, Helton Hugo; Moreno, Robson Luiz; Pimenta, Tales Cleber; Crepaldi, Paulo C; Cintra, Evaldo

    2013-06-01

    This article presents the viability analysis and the development of heart disease identification embedded system. It offers a time reduction on electrocardiogram - ECG signal processing by reducing the amount of data samples, without any significant loss. The goal of the developed system is the analysis of heart signals. The ECG signals are applied into the system that performs an initial filtering, and then uses a Gustafson-Kessel fuzzy clustering algorithm for the signal classification and correlation. The classification indicated common heart diseases such as angina, myocardial infarction and coronary artery diseases. The system uses the European electrocardiogram ST-T Database (EDB) as a reference for tests and evaluation. The results prove the system can perform the heart disease detection on a data set reduced from 213 to just 20 samples, thus providing a reduction to just 9.4% of the original set, while maintaining the same effectiveness. This system is validated in a Xilinx Spartan(®)-3A FPGA. The field programmable gate array (FPGA) implemented a Xilinx Microblaze(®) Soft-Core Processor running at a 50MHz clock rate. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  19. Combinatorial Clustering Algorithm of Quantum-Behaved Particle Swarm Optimization and Cloud Model

    Directory of Open Access Journals (Sweden)

    Mi-Yuan Shan

    2013-01-01

    Full Text Available We propose a combinatorial clustering algorithm of cloud model and quantum-behaved particle swarm optimization (COCQPSO to solve the stochastic problem. The algorithm employs a novel probability model as well as a permutation-based local search method. We are setting the parameters of COCQPSO based on the design of experiment. In the comprehensive computational study, we scrutinize the performance of COCQPSO on a set of widely used benchmark instances. By benchmarking combinatorial clustering algorithm with state-of-the-art algorithms, we can show that its performance compares very favorably. The fuzzy combinatorial optimization algorithm of cloud model and quantum-behaved particle swarm optimization (FCOCQPSO in vague sets (IVSs is more expressive than the other fuzzy sets. Finally, numerical examples show the clustering effectiveness of COCQPSO and FCOCQPSO clustering algorithms which are extremely remarkable.

  20. A Self-Adaptive Fuzzy c-Means Algorithm for Determining the Optimal Number of Clusters

    Science.gov (United States)

    Wang, Zhihao; Yi, Jing

    2016-01-01

    For the shortcoming of fuzzy c-means algorithm (FCM) needing to know the number of clusters in advance, this paper proposed a new self-adaptive method to determine the optimal number of clusters. Firstly, a density-based algorithm was put forward. The algorithm, according to the characteristics of the dataset, automatically determined the possible maximum number of clusters instead of using the empirical rule n and obtained the optimal initial cluster centroids, improving the limitation of FCM that randomly selected cluster centroids lead the convergence result to the local minimum. Secondly, this paper, by introducing a penalty function, proposed a new fuzzy clustering validity index based on fuzzy compactness and separation, which ensured that when the number of clusters verged on that of objects in the dataset, the value of clustering validity index did not monotonically decrease and was close to zero, so that the optimal number of clusters lost robustness and decision function. Then, based on these studies, a self-adaptive FCM algorithm was put forward to estimate the optimal number of clusters by the iterative trial-and-error process. At last, experiments were done on the UCI, KDD Cup 1999, and synthetic datasets, which showed that the method not only effectively determined the optimal number of clusters, but also reduced the iteration of FCM with the stable clustering result. PMID:28042291

  1. The global kernel k-means algorithm for clustering in feature space.

    Science.gov (United States)

    Tzortzis, Grigorios F; Likas, Aristidis C

    2009-07-01

    Kernel k-means is an extension of the standard k -means clustering algorithm that identifies nonlinearly separable clusters. In order to overcome the cluster initialization problem associated with this method, we propose the global kernel k-means algorithm, a deterministic and incremental approach to kernel-based clustering. Our method adds one cluster at each stage, through a global search procedure consisting of several executions of kernel k-means from suitable initializations. This algorithm does not depend on cluster initialization, identifies nonlinearly separable clusters, and, due to its incremental nature and search procedure, locates near-optimal solutions avoiding poor local minima. Furthermore, two modifications are developed to reduce the computational cost that do not significantly affect the solution quality. The proposed methods are extended to handle weighted data points, which enables their application to graph partitioning. We experiment with several data sets and the proposed approach compares favorably to kernel k -means with random restarts.

  2. A fast readout algorithm for Cluster Counting/Timing drift chambers on a FPGA board

    Energy Technology Data Exchange (ETDEWEB)

    Cappelli, L. [Università di Cassino e del Lazio Meridionale (Italy); Creti, P.; Grancagnolo, F. [Istituto Nazionale di Fisica Nucleare, Lecce (Italy); Pepino, A., E-mail: Aurora.Pepino@le.infn.it [Istituto Nazionale di Fisica Nucleare, Lecce (Italy); Tassielli, G. [Istituto Nazionale di Fisica Nucleare, Lecce (Italy); Fermilab, Batavia, IL (United States); Università Marconi, Roma (Italy)

    2013-08-01

    A fast readout algorithm for Cluster Counting and Timing purposes has been implemented and tested on a Virtex 6 core FPGA board. The algorithm analyses and stores data coming from a Helium based drift tube instrumented by 1 GSPS fADC and represents the outcome of balancing between cluster identification efficiency and high speed performance. The algorithm can be implemented in electronics boards serving multiple fADC channels as an online preprocessing stage for drift chamber signals.

  3. Hybrid Tracking Algorithm Improvements and Cluster Analysis Methods.

    Science.gov (United States)

    1982-02-26

    UPGMA ), and Ward’s method. Ling’s papers describe a (k,r) clustering method. Each of these methods have individual characteristics which make them...Reference 7), UPGMA is probably the most frequently used clustering strategy. UPGMA tries to group new points into an existing cluster by using an

  4. Genetic algorithm based two-mode clustering of metabolomics data

    NARCIS (Netherlands)

    Hageman, J.A.; van den Berg, R.A.; Westerhuis, J.A.; van der Werf, M.J.; Smilde, A.K.

    2008-01-01

    Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods and more specifically two-mode clustering methods are excellent tools for analyzing this type of data. Two-mode clustering

  5. A Tabu Search Algorithm for application placement in computer clustering

    NARCIS (Netherlands)

    van der Gaast, Jelmer; Rietveld, Cornelieus A.; Gabor, Adriana; Zhang, Yingqian

    2014-01-01

    This paper presents and analyzes a model for the problem of placing applications on computer clusters (APP). In this problem, organizations requesting a set of software applications have to be assigned to computer clusters such that the costs of opening clusters and installing the necessary

  6. Clustering performance comparison using K-means and expectation maximization algorithms.

    Science.gov (United States)

    Jung, Yong Gyu; Kang, Min Soo; Heo, Jun

    2014-11-14

    Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.

  7. A Trajectory Regression Clustering Technique Combining a Novel Fuzzy C-Means Clustering Algorithm with the Least Squares Method

    Directory of Open Access Journals (Sweden)

    Xiangbing Zhou

    2018-04-01

    Full Text Available Rapidly growing GPS (Global Positioning System trajectories hide much valuable information, such as city road planning, urban travel demand, and population migration. In order to mine the hidden information and to capture better clustering results, a trajectory regression clustering method (an unsupervised trajectory clustering method is proposed to reduce local information loss of the trajectory and to avoid getting stuck in the local optimum. Using this method, we first define our new concept of trajectory clustering and construct a novel partitioning (angle-based partitioning method of line segments; second, the Lagrange-based method and Hausdorff-based K-means++ are integrated in fuzzy C-means (FCM clustering, which are used to maintain the stability and the robustness of the clustering process; finally, least squares regression model is employed to achieve regression clustering of the trajectory. In our experiment, the performance and effectiveness of our method is validated against real-world taxi GPS data. When comparing our clustering algorithm with the partition-based clustering algorithms (K-means, K-median, and FCM, our experimental results demonstrate that the presented method is more effective and generates a more reasonable trajectory.

  8. A novel artificial immune algorithm for spatial clustering with obstacle constraint and its applications.

    Science.gov (United States)

    Sun, Liping; Luo, Yonglong; Ding, Xintao; Zhang, Ji

    2014-01-01

    An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE) algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.

  9. A Novel Artificial Immune Algorithm for Spatial Clustering with Obstacle Constraint and Its Applications

    Directory of Open Access Journals (Sweden)

    Liping Sun

    2014-01-01

    Full Text Available An important component of a spatial clustering algorithm is the distance measure between sample points in object space. In this paper, the traditional Euclidean distance measure is replaced with innovative obstacle distance measure for spatial clustering under obstacle constraints. Firstly, we present a path searching algorithm to approximate the obstacle distance between two points for dealing with obstacles and facilitators. Taking obstacle distance as similarity metric, we subsequently propose the artificial immune clustering with obstacle entity (AICOE algorithm for clustering spatial point data in the presence of obstacles and facilitators. Finally, the paper presents a comparative analysis of AICOE algorithm and the classical clustering algorithms. Our clustering model based on artificial immune system is also applied to the case of public facility location problem in order to establish the practical applicability of our approach. By using the clone selection principle and updating the cluster centers based on the elite antibodies, the AICOE algorithm is able to achieve the global optimum and better clustering effect.

  10. A Scheduling Algorithm for Minimizing the Packet Error Probability in Clusterized TDMA Networks

    Directory of Open Access Journals (Sweden)

    Arash T. Toyserkani

    2009-01-01

    Full Text Available We consider clustered wireless networks, where transceivers in a cluster use a time-slotted mechanism (TDMA to access a wireless channel that is shared among several clusters. An approximate expression for the packet-loss probability is derived for networks with one or more mutually interfering clusters in Rayleigh fading environments, and the approximation is shown to be good for relevant scenarios. We then present a scheduling algorithm, based on Lagrangian duality, that exploits the derived packet-loss model in an attempt to minimize the average packet-loss probability in the network. Computer simulations of the proposed scheduling algorithm show that a significant increase in network throughput can be achieved compared to uncoordinated scheduling. Empirical trials also indicate that the proposed optimization algorithm almost always converges to an optimal schedule with a reasonable number of iterations. Thus, the proposed algorithm can also be used for bench-marking suboptimal scheduling algorithms.

  11. Hierarchical Sets: Analyzing Pangenome Structure through Scalable Set Visualizations

    DEFF Research Database (Denmark)

    Pedersen, Thomas Lin

    2017-01-01

    of hierarchical sets by applying it to a pangenome based on 113 Escherichia and Shigella genomes and find it provides a powerful addition to pangenome analysis. The described clustering algorithm and visualizations are implemented in the hierarchicalSets R package available from CRAN (https...

  12. An improved initialization center k-means clustering algorithm based on distance and density

    Science.gov (United States)

    Duan, Yanling; Liu, Qun; Xia, Shuyin

    2018-04-01

    Aiming at the problem of the random initial clustering center of k means algorithm that the clustering results are influenced by outlier data sample and are unstable in multiple clustering, a method of central point initialization method based on larger distance and higher density is proposed. The reciprocal of the weighted average of distance is used to represent the sample density, and the data sample with the larger distance and the higher density are selected as the initial clustering centers to optimize the clustering results. Then, a clustering evaluation method based on distance and density is designed to verify the feasibility of the algorithm and the practicality, the experimental results on UCI data sets show that the algorithm has a certain stability and practicality.

  13. A highly efficient multi-core algorithm for clustering extremely large datasets

    Directory of Open Access Journals (Sweden)

    Kraus Johann M

    2010-04-01

    Full Text Available Abstract Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer.

  14. An enhanced deterministic K-Means clustering algorithm for cancer subtype prediction from gene expression data.

    Science.gov (United States)

    Nidheesh, N; Abdul Nazeer, K A; Ameer, P M

    2017-12-01

    Clustering algorithms with steps involving randomness usually give different results on different executions for the same dataset. This non-deterministic nature of algorithms such as the K-Means clustering algorithm limits their applicability in areas such as cancer subtype prediction using gene expression data. It is hard to sensibly compare the results of such algorithms with those of other algorithms. The non-deterministic nature of K-Means is due to its random selection of data points as initial centroids. We propose an improved, density based version of K-Means, which involves a novel and systematic method for selecting initial centroids. The key idea of the algorithm is to select data points which belong to dense regions and which are adequately separated in feature space as the initial centroids. We compared the proposed algorithm to a set of eleven widely used single clustering algorithms and a prominent ensemble clustering algorithm which is being used for cancer data classification, based on the performances on a set of datasets comprising ten cancer gene expression datasets. The proposed algorithm has shown better overall performance than the others. There is a pressing need in the Biomedical domain for simple, easy-to-use and more accurate Machine Learning tools for cancer subtype prediction. The proposed algorithm is simple, easy-to-use and gives stable results. Moreover, it provides comparatively better predictions of cancer subtypes from gene expression data. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Towards Enhancement of Performance of K-Means Clustering Using Nature-Inspired Optimization Algorithms

    Directory of Open Access Journals (Sweden)

    Simon Fong

    2014-01-01

    Full Text Available Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario.

  16. Towards enhancement of performance of K-means clustering using nature-inspired optimization algorithms.

    Science.gov (United States)

    Fong, Simon; Deb, Suash; Yang, Xin-She; Zhuang, Yan

    2014-01-01

    Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario.

  17. Towards Enhancement of Performance of K-Means Clustering Using Nature-Inspired Optimization Algorithms

    Science.gov (United States)

    Deb, Suash; Yang, Xin-She

    2014-01-01

    Traditional K-means clustering algorithms have the drawback of getting stuck at local optima that depend on the random values of initial centroids. Optimization algorithms have their advantages in guiding iterative computation to search for global optima while avoiding local optima. The algorithms help speed up the clustering process by converging into a global optimum early with multiple search agents in action. Inspired by nature, some contemporary optimization algorithms which include Ant, Bat, Cuckoo, Firefly, and Wolf search algorithms mimic the swarming behavior allowing them to cooperatively steer towards an optimal objective within a reasonable time. It is known that these so-called nature-inspired optimization algorithms have their own characteristics as well as pros and cons in different applications. When these algorithms are combined with K-means clustering mechanism for the sake of enhancing its clustering quality by avoiding local optima and finding global optima, the new hybrids are anticipated to produce unprecedented performance. In this paper, we report the results of our evaluation experiments on the integration of nature-inspired optimization methods into K-means algorithms. In addition to the standard evaluation metrics in evaluating clustering quality, the extended K-means algorithms that are empowered by nature-inspired optimization methods are applied on image segmentation as a case study of application scenario. PMID:25202730

  18. K-Nearest Neighbor Intervals Based AP Clustering Algorithm for Large Incomplete Data

    Directory of Open Access Journals (Sweden)

    Cheng Lu

    2015-01-01

    Full Text Available The Affinity Propagation (AP algorithm is an effective algorithm for clustering analysis, but it can not be directly applicable to the case of incomplete data. In view of the prevalence of missing data and the uncertainty of missing attributes, we put forward a modified AP clustering algorithm based on K-nearest neighbor intervals (KNNI for incomplete data. Based on an Improved Partial Data Strategy, the proposed algorithm estimates the KNNI representation of missing attributes by using the attribute distribution information of the available data. The similarity function can be changed by dealing with the interval data. Then the improved AP algorithm can be applicable to the case of incomplete data. Experiments on several UCI datasets show that the proposed algorithm achieves impressive clustering results.

  19. Manual hierarchical clustering of regional geochemical data using a Bayesian finite mixture model

    International Nuclear Information System (INIS)

    Ellefsen, Karl J.; Smith, David B.

    2016-01-01

    Interpretation of regional scale, multivariate geochemical data is aided by a statistical technique called “clustering.” We investigate a particular clustering procedure by applying it to geochemical data collected in the State of Colorado, United States of America. The clustering procedure partitions the field samples for the entire survey area into two clusters. The field samples in each cluster are partitioned again to create two subclusters, and so on. This manual procedure generates a hierarchy of clusters, and the different levels of the hierarchy show geochemical and geological processes occurring at different spatial scales. Although there are many different clustering methods, we use Bayesian finite mixture modeling with two probability distributions, which yields two clusters. The model parameters are estimated with Hamiltonian Monte Carlo sampling of the posterior probability density function, which usually has multiple modes. Each mode has its own set of model parameters; each set is checked to ensure that it is consistent both with the data and with independent geologic knowledge. The set of model parameters that is most consistent with the independent geologic knowledge is selected for detailed interpretation and partitioning of the field samples. - Highlights: • We evaluate a clustering procedure by applying it to geochemical data. • The procedure generates a hierarchy of clusters. • Different levels of the hierarchy show geochemical processes at different spatial scales. • The clustering method is Bayesian finite mixture modeling. • Model parameters are estimated with Hamiltonian Monte Carlo sampling.

  20. Non-Hierarchical Clustering as a method to analyse an open-ended ...

    African Journals Online (AJOL)

    We show that the use of non-hierarchical analysis allows us to interpret the reasoning of students solving different mathematical problems using Algebra, and to separate them into different groups, that can be recognised and characterised by common traits in their answers, without any prior knowledge on the part of the ...

  1. A scalable and practical one-pass clustering algorithm for recommender system

    Science.gov (United States)

    Khalid, Asra; Ghazanfar, Mustansar Ali; Azam, Awais; Alahmari, Saad Ali

    2015-12-01

    KMeans clustering-based recommendation algorithms have been proposed claiming to increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate the incremental updates with the arrival of new data, making them unsuitable for the dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, which is a simple, fast, and accurate. We show empirically that the proposed algorithm outperforms K-Means in terms of recommendation and training time while maintaining a good level of accuracy.

  2. A Branch-and-Price algorithm for stable workforce assignments with hierarchical skills

    NARCIS (Netherlands)

    Firat, M.; Briskorn, D.; Laugier, A.

    2016-01-01

    This paper deals with assigning hierarchically skilled technicians to jobs by considering preferences. We investigate stability definitions in multi-skill workforce assignments stemming from the notion of blocking pairs as stated in the Marriage model of Gale–Shapley. We propose a Branch-and-Price

  3. Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions

    KAUST Repository

    AbdulJabbar, Mustafa Abdulmajeed; Markomanolis, George S.; Ibeid, Huda; Yokota, Rio; Keyes, David E.

    2017-01-01

    -synchronous collective communication of the local essential tree and an RDMA per task per cell. Finally, we take the dynamic sparse data exchange proposed by Hoefler et al. [1] and extend it to a hierarchical sparse data exchange, which is demonstrated at scale

  4. Prediction of the fate of Hg and other contaminants in soil around a former chlor-alkali plant using Fuzzy Hierarchical Cross-Clustering approach.

    Science.gov (United States)

    Frenţiu, Tiberiu; Ponta, Michaela; Sârbu, Costel

    2015-11-01

    An associative simultaneous fuzzy divisive hierarchical algorithm was used to predict the fate of Hg and other contaminants in soil around a former chlor-alkali plant. The algorithm was applied on several natural and anthropogenic characteristics of soil including water leachable, mobile, semi-mobile, non-mobile fractions and total Hg, Al, Ba, Ca, Cr, Cu, Fe, K, Li, Mg, Mn, Na, Sr, Zn, water leachable fraction of Cl(-), NO3(-) and SO4(2)(-), pH and total organic carbon. The cross-classification algorithm provided a divisive fuzzy partition of the soil samples and associated characteristics. Soils outside the perimeter of the former chlor-alkali plant were clustered based on the natural characteristics and total Hg. In contaminated zones Hg speciation becomes relevant and the assessment of species distribution is necessary. The descending order of concentration of Hg species in the test site was semi-mobile>mobile>non-mobile>water-leachable. Physico-chemical features responsible for similarities or differences between uncontaminated soil samples or contaminated with Hg, Cu, Zn, Ba and NO3(-) were also highlighted. Other characteristics of the contaminated soil were found to be Ca, sulfate, Na and chloride, some of which with influence on Hg fate. The presence of Ca and sulfate in soil induced a higher water leachability of Hg, while Cu had an opposite effect by forming amalgam. The used algorithm provided an in-deep understanding of processes involving Hg species and allowed to make prediction of the fate of Hg and contaminants linked to chlor-alkali-industry. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Investigating the effects of climate variations on bacillary dysentery incidence in northeast China using ridge regression and hierarchical cluster analysis

    Directory of Open Access Journals (Sweden)

    Guo Junqiao

    2008-09-01

    Full Text Available Abstract Background The effects of climate variations on bacillary dysentery incidence have gained more recent concern. However, the multi-collinearity among meteorological factors affects the accuracy of correlation with bacillary dysentery incidence. Methods As a remedy, a modified method to combine ridge regression and hierarchical cluster analysis was proposed for investigating the effects of climate variations on bacillary dysentery incidence in northeast China. Results All weather indicators, temperatures, precipitation, evaporation and relative humidity have shown positive correlation with the monthly incidence of bacillary dysentery, while air pressure had a negative correlation with the incidence. Ridge regression and hierarchical cluster analysis showed that during 1987–1996, relative humidity, temperatures and air pressure affected the transmission of the bacillary dysentery. During this period, all meteorological factors were divided into three categories. Relative humidity and precipitation belonged to one class, temperature indexes and evaporation belonged to another class, and air pressure was the third class. Conclusion Meteorological factors have affected the transmission of bacillary dysentery in northeast China. Bacillary dysentery prevention and control would benefit from by giving more consideration to local climate variations.

  6. Implementation of hierarchical clustering using k-mer sparse matrix to analyze MERS-CoV genetic relationship

    Science.gov (United States)

    Bustamam, A.; Ulul, E. D.; Hura, H. F. A.; Siswantining, T.

    2017-07-01

    Hierarchical clustering is one of effective methods in creating a phylogenetic tree based on the distance matrix between DNA (deoxyribonucleic acid) sequences. One of the well-known methods to calculate the distance matrix is k-mer method. Generally, k-mer is more efficient than some distance matrix calculation techniques. The steps of k-mer method are started from creating k-mer sparse matrix, and followed by creating k-mer singular value vectors. The last step is computing the distance amongst vectors. In this paper, we analyze the sequences of MERS-CoV (Middle East Respiratory Syndrome - Coronavirus) DNA by implementing hierarchical clustering using k-mer sparse matrix in order to perform the phylogenetic analysis. Our results show that the ancestor of our MERS-CoV is coming from Egypt. Moreover, we found that the MERS-CoV infection that occurs in one country may not necessarily come from the same country of origin. This suggests that the process of MERS-CoV mutation might not only be influenced by geographical factor.

  7. Medical Image Retrieval Based On the Parallelization of the Cluster Sampling Algorithm

    OpenAIRE

    Ali, Hesham Arafat; Attiya, Salah; El-henawy, Ibrahim

    2017-01-01

    In this paper we develop parallel cluster sampling algorithms and show that a multi-chain version is embarrassingly parallel and can be used efficiently for medical image retrieval among other applications.

  8. A novel artificial bee colony based clustering algorithm for categorical data.

    Science.gov (United States)

    Ji, Jinchao; Pang, Wei; Zheng, Yanlin; Wang, Zhe; Ma, Zhiqiang

    2015-01-01

    Data with categorical attributes are ubiquitous in the real world. However, existing partitional clustering algorithms for categorical data are prone to fall into local optima. To address this issue, in this paper we propose a novel clustering algorithm, ABC-K-Modes (Artificial Bee Colony clustering based on K-Modes), based on the traditional k-modes clustering algorithm and the artificial bee colony approach. In our approach, we first introduce a one-step k-modes procedure, and then integrate this procedure with the artificial bee colony approach to deal with categorical data. In the search process performed by scout bees, we adopt the multi-source search inspired by the idea of batch processing to accelerate the convergence of ABC-K-Modes. The performance of ABC-K-Modes is evaluated by a series of experiments in comparison with that of the other popular algorithms for categorical data.

  9. A Novel Automatic Detection System for ECG Arrhythmias Using Maximum Margin Clustering with Immune Evolutionary Algorithm

    Directory of Open Access Journals (Sweden)

    Bohui Zhu

    2013-01-01

    Full Text Available This paper presents a novel maximum margin clustering method with immune evolution (IEMMC for automatic diagnosis of electrocardiogram (ECG arrhythmias. This diagnostic system consists of signal processing, feature extraction, and the IEMMC algorithm for clustering of ECG arrhythmias. First, raw ECG signal is processed by an adaptive ECG filter based on wavelet transforms, and waveform of the ECG signal is detected; then, features are extracted from ECG signal to cluster different types of arrhythmias by the IEMMC algorithm. Three types of performance evaluation indicators are used to assess the effect of the IEMMC method for ECG arrhythmias, such as sensitivity, specificity, and accuracy. Compared with K-means and iterSVR algorithms, the IEMMC algorithm reflects better performance not only in clustering result but also in terms of global search ability and convergence ability, which proves its effectiveness for the detection of ECG arrhythmias.

  10. Kernel Clustering with a Differential Harmony Search Algorithm for Scheme Classification

    Directory of Open Access Journals (Sweden)

    Yu Feng

    2017-01-01

    Full Text Available This paper presents a kernel fuzzy clustering with a novel differential harmony search algorithm to coordinate with the diversion scheduling scheme classification. First, we employed a self-adaptive solution generation strategy and differential evolution-based population update strategy to improve the classical harmony search. Second, we applied the differential harmony search algorithm to the kernel fuzzy clustering to help the clustering method obtain better solutions. Finally, the combination of the kernel fuzzy clustering and the differential harmony search is applied for water diversion scheduling in East Lake. A comparison of the proposed method with other methods has been carried out. The results show that the kernel clustering with the differential harmony search algorithm has good performance to cooperate with the water diversion scheduling problems.

  11. An Adaptive Sweep-Circle Spatial Clustering Algorithm Based on Gestalt

    Directory of Open Access Journals (Sweden)

    Qingming Zhan

    2017-08-01

    Full Text Available An adaptive spatial clustering (ASC algorithm is proposed in this present study, which employs sweep-circle techniques and a dynamic threshold setting based on the Gestalt theory to detect spatial clusters. The proposed algorithm can automatically discover clusters in one pass, rather than through the modification of the initial model (for example, a minimal spanning tree, Delaunay triangulation, or Voronoi diagram. It can quickly identify arbitrarily-shaped clusters while adapting efficiently to non-homogeneous density characteristics of spatial data, without the need for prior knowledge or parameters. The proposed algorithm is also ideal for use in data streaming technology with dynamic characteristics flowing in the form of spatial clustering in large data sets.

  12. PARTIAL TRAINING METHOD FOR HEURISTIC ALGORITHM OF POSSIBLE CLUSTERIZATION UNDER UNKNOWN NUMBER OF CLASSES

    Directory of Open Access Journals (Sweden)

    D. A. Viattchenin

    2009-01-01

    Full Text Available A method for constructing a subset of labeled objects which is used in a heuristic algorithm of possible  clusterization with partial  training is proposed in the  paper.  The  method  is  based  on  data preprocessing by the heuristic algorithm of possible clusterization using a transitive closure of a fuzzy tolerance. Method efficiency is demonstrated by way of an illustrative example.

  13. Ant colony algorithm for clustering in portfolio optimization

    Science.gov (United States)

    Subekti, R.; Sari, E. R.; Kusumawati, R.

    2018-03-01

    This research aims to describe portfolio optimization using clustering methods with ant colony approach. Two stock portfolios of LQ45 Indonesia is proposed based on the cluster results obtained from ant colony optimization (ACO). The first portfolio consists of assets with ant colony displacement opportunities beyond the defined probability limits of the researcher, where the weight of each asset is determined by mean-variance method. The second portfolio consists of two assets with the assumption that each asset is a cluster formed from ACO. The first portfolio has a better performance compared to the second portfolio seen from the Sharpe index.

  14. Clustered K nearest neighbor algorithm for daily inflow forecasting

    NARCIS (Netherlands)

    Akbari, M.; Van Overloop, P.J.A.T.M.; Afshar, A.

    2010-01-01

    Instance based learning (IBL) algorithms are a common choice among data driven algorithms for inflow forecasting. They are based on the similarity principle and prediction is made by the finite number of similar neighbors. In this sense, the similarity of a query instance is estimated according to

  15. A Coupled User Clustering Algorithm Based on Mixed Data for Web-Based Learning Systems

    Directory of Open Access Journals (Sweden)

    Ke Niu

    2015-01-01

    Full Text Available In traditional Web-based learning systems, due to insufficient learning behaviors analysis and personalized study guides, a few user clustering algorithms are introduced. While analyzing the behaviors with these algorithms, researchers generally focus on continuous data but easily neglect discrete data, each of which is generated from online learning actions. Moreover, there are implicit coupled interactions among the data but are frequently ignored in the introduced algorithms. Therefore, a mass of significant information which can positively affect clustering accuracy is neglected. To solve the above issues, we proposed a coupled user clustering algorithm for Wed-based learning systems by taking into account both discrete and continuous data, as well as intracoupled and intercoupled interactions of the data. The experiment result in this paper demonstrates the outperformance of the proposed algorithm.

  16. Non-Hierarchical Clustering as a method to analyse an open-ended ...

    African Journals Online (AJOL)

    Apple

    Keywords: algebraic thinking; cluster analysis; mathematics education; quantitative analysis. Introduction. Extensive ..... C1, C2 and C3 represent the three centroids of the three clusters formed. .... 6ALd. All these strategies are algebraic and 'high- ... 1995), of the didactical aspects related to teaching .... Brazil, 18-23 July.

  17. Clustering Algorithm As A Planning Support Tool For Rural Electrification Optimization

    Directory of Open Access Journals (Sweden)

    Ronaldo Pornillosa Parreno Jr

    2015-08-01

    Full Text Available Abstract In this study clustering algorithm was developed to optimize electrification plans by screening and grouping potential customers to be supplied with electricity. The algorithm provided adifferent approach in clustering problem which combines conceptual and distance-based clustering algorithmsto analyze potential clusters using spanning tree with the shortest possible edge weight and creating final cluster trees based on the test of inconsistency for the edges. The clustering criteria consists of commonly used distance measure with the addition of household information as basis for the ability to pay ATP value. The combination of these two parameters resulted to a more significant and realistic clusters since distance measure alone could not take the effect of the household characteristics in screening the most sensible groupings of households. In addition the implications of varying geographical features were incorporated in the algorithm by using routing index across the locations of the households. This new approach of connecting the households in an area was applied in an actual case study of one village or barangay that was not yet energized. The results of clustering algorithm generated cluster trees which could becomethetheoretical basis for power utilities to plan the initial network arrangement of electrification. Scenario analysis conducted on the two strategies of clustering the households provideddifferent alternatives for the optimization of the cost of electrification. Futhermorethe benefits associated with the two strategies formulated from the two scenarios was evaluated using benefit cost ratio BC to determine which is more economically advantageous. The results of the study showed that clustering algorithm proved to be effective in solving electrification optimization problem and serves its purpose as a planning support tool which can facilitate electrification in rural areas and achieve cost-effectiveness.

  18. Study on text mining algorithm for ultrasound examination of chronic liver diseases based on spectral clustering

    Science.gov (United States)

    Chang, Bingguo; Chen, Xiaofei

    2018-05-01

    Ultrasonography is an important examination for the diagnosis of chronic liver disease. The doctor gives the liver indicators and suggests the patient's condition according to the description of ultrasound report. With the rapid increase in the amount of data of ultrasound report, the workload of professional physician to manually distinguish ultrasound results significantly increases. In this paper, we use the spectral clustering method to cluster analysis of the description of the ultrasound report, and automatically generate the ultrasonic diagnostic diagnosis by machine learning. 110 groups ultrasound examination report of chronic liver disease were selected as test samples in this experiment, and the results were validated by spectral clustering and compared with k-means clustering algorithm. The results show that the accuracy of spectral clustering is 92.73%, which is higher than that of k-means clustering algorithm, which provides a powerful ultrasound-assisted diagnosis for patients with chronic liver disease.

  19. Unsupervised Cryo-EM Data Clustering through Adaptively Constrained K-Means Algorithm.

    Science.gov (United States)

    Xu, Yaofang; Wu, Jiayi; Yin, Chang-Cheng; Mao, Youdong

    2016-01-01

    In single-particle cryo-electron microscopy (cryo-EM), K-means clustering algorithm is widely used in unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty of molecular orientations, traditional K-means clustering algorithm may classify images into wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development on clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method building upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alterative to the traditional K-means algorithm in single-particle cryo-EM analysis.

  20. Application of Fuzzy Algorithm in Optimizing Hierarchical Sliding Mode Control for Pendubot System

    Directory of Open Access Journals (Sweden)

    Xuan Dung Huynh

    2017-12-01

    Full Text Available Pendubot is a classical under-actuated SIMO model for control algorithm testing in laboratory of universities. In this paper, authors design a fuzzy-sliding control for this system. The controller is designed from a new idea of application of fuzzy algorithm for optioning control parameters. The response of system on TOP position under fuzzysliding control algorithm is proved to be better than under sliding controller through Matlab/Simulink simulation.

  1. An adaptive immune optimization algorithm with dynamic lattice searching operation for fast optimization of atomic clusters

    International Nuclear Information System (INIS)

    Wu, Xia; Wu, Genhua

    2014-01-01

    Highlights: • A high efficient method for optimization of atomic clusters is developed. • Its performance is studied by optimizing Lennard-Jones clusters and Ag clusters. • The method is proved to be quite efficient. • A new Ag 61 cluster with stacking-fault face-centered cubic motif is found. - Abstract: Geometrical optimization of atomic clusters is performed by a development of adaptive immune optimization algorithm (AIOA) with dynamic lattice searching (DLS) operation (AIOA-DLS method). By a cycle of construction and searching of the dynamic lattice (DL), DLS algorithm rapidly makes the clusters more regular and greatly reduces the potential energy. DLS can thus be used as an operation acting on the new individuals after mutation operation in AIOA to improve the performance of the AIOA. The AIOA-DLS method combines the merit of evolutionary algorithm and idea of dynamic lattice. The performance of the proposed method is investigated in the optimization of Lennard-Jones clusters within 250 atoms and silver clusters described by many-body Gupta potential within 150 atoms. Results reported in the literature are reproduced, and the motif of Ag 61 cluster is found to be stacking-fault face-centered cubic, whose energy is lower than that of previously obtained icosahedron

  2. Clinical assessment using an algorithm based on clustering Fuzzy c-means

    NARCIS (Netherlands)

    Guijarro-Rodriguez, A.; Cevallos-Torres, L.; Yepez-Holguin, J.; Botto-Tobar, M.; Valencia-García, R.; Lagos-Ortiz, K.; Alcaraz-Mármol, G.; Del Cioppo, J.; Vera-Lucio, N.; Bucaram-Leverone, M.

    2017-01-01

    The Fuzzy c-means (FCM) algorithms dene a grouping criterion from a function, which seeks to minimize iteratively the function up to an optimal fuzzy partition is obtained. In the execution of this algorithm relates each element to the clusters that were determined in the same n-dimensional space,

  3. A Framework for Evaluation and Exploration of Clustering Algorithms in Subspaces of High Dimensional Databases

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Günnemann, Stephan

    2011-01-01

    comparative studies on the advantages and disadvantages of the different algorithms exist. Part of the underlying problem is the lack of available open source implementations that could be used by researchers to understand, compare, and extend subspace and projected clustering algorithms. In this work, we...

  4. Optimization Route of Food Logistics Distribution Based on Genetic and Graph Cluster Scheme Algorithm

    OpenAIRE

    Jing Chen

    2015-01-01

    This study takes the concept of food logistics distribution as the breakthrough point, by means of the aim of optimization of food logistics distribution routes and analysis of the optimization model of food logistics route, as well as the interpretation of the genetic algorithm, it discusses the optimization of food logistics distribution route based on genetic and cluster scheme algorithm.

  5. GenClust: A genetic algorithm for clustering gene expression data

    Directory of Open Access Journals (Sweden)

    Raimondi Alessandra

    2005-12-01

    Full Text Available Abstract Background Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a a novel coding of the search space that is simple, compact and easy to update; (b it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.

  6. Hierarchical video summarization

    Science.gov (United States)

    Ratakonda, Krishna; Sezan, M. Ibrahim; Crinon, Regis J.

    1998-12-01

    We address the problem of key-frame summarization of vide in the absence of any a priori information about its content. This is a common problem that is encountered in home videos. We propose a hierarchical key-frame summarization algorithm where a coarse-to-fine key-frame summary is generated. A hierarchical key-frame summary facilitates multi-level browsing where the user can quickly discover the content of the video by accessing its coarsest but most compact summary and then view a desired segment of the video with increasingly more detail. At the finest level, the summary is generated on the basis of color features of video frames, using an extension of a recently proposed key-frame extraction algorithm. The finest level key-frames are recursively clustered using a novel pairwise K-means clustering approach with temporal consecutiveness constraint. We also address summarization of MPEG-2 compressed video without fully decoding the bitstream. We also propose efficient mechanisms that facilitate decoding the video when the hierarchical summary is utilized in browsing and playback of video segments starting at selected key-frames.

  7. Soil data clustering by using K-means and fuzzy K-means algorithm

    Directory of Open Access Journals (Sweden)

    E. Hot

    2016-06-01

    Full Text Available A problem of soil clustering based on the chemical characteristics of soil, and proper visual representation of the obtained results, is analysed in the paper. To that aim, K-means and fuzzy K-means algorithms are adapted for soil data clustering. A database of soil characteristics sampled in Montenegro is used for a comparative analysis of implemented algorithms. The procedure of setting proper values for control parameters of fuzzy K-means is illustrated on the used database. In addition, validation of clustering is made through visualisation. Classified soil data are presented on the static Google map and dynamic Open Street Map.

  8. Load balancing prediction method of cloud storage based on analytic hierarchy process and hybrid hierarchical genetic algorithm.

    Science.gov (United States)

    Zhou, Xiuze; Lin, Fan; Yang, Lvqing; Nie, Jing; Tan, Qian; Zeng, Wenhua; Zhang, Nian

    2016-01-01

    With the continuous expansion of the cloud computing platform scale and rapid growth of users and applications, how to efficiently use system resources to improve the overall performance of cloud computing has become a crucial issue. To address this issue, this paper proposes a method that uses an analytic hierarchy process group decision (AHPGD) to evaluate the load state of server nodes. Training was carried out by using a hybrid hierarchical genetic algorithm (HHGA) for optimizing a radial basis function neural network (RBFNN). The AHPGD makes the aggregative indicator of virtual machines in cloud, and become input parameters of predicted RBFNN. Also, this paper proposes a new dynamic load balancing scheduling algorithm combined with a weighted round-robin algorithm, which uses the predictive periodical load value of nodes based on AHPPGD and RBFNN optimized by HHGA, then calculates the corresponding weight values of nodes and makes constant updates. Meanwhile, it keeps the advantages and avoids the shortcomings of static weighted round-robin algorithm.

  9. PERFORMANCE ANALYSIS OF SET PARTITIONING IN HIERARCHICAL TREES (SPIHT ALGORITHM FOR A FAMILY OF WAVELETS USED IN COLOR IMAGE COMPRESSION

    Directory of Open Access Journals (Sweden)

    A. Sreenivasa Murthy

    2014-11-01

    Full Text Available With the spurt in the amount of data (Image, video, audio, speech, & text available on the net, there is a huge demand for memory & bandwidth savings. One has to achieve this, by maintaining the quality & fidelity of the data acceptable to the end user. Wavelet transform is an important and practical tool for data compression. Set partitioning in hierarchal trees (SPIHT is a widely used compression algorithm for wavelet transformed images. Among all wavelet transform and zero-tree quantization based image compression algorithms SPIHT has become the benchmark state-of-the-art algorithm because it is simple to implement & yields good results. In this paper we present a comparative study of various wavelet families for image compression with SPIHT algorithm. We have conducted experiments with Daubechies, Coiflet, Symlet, Bi-orthogonal, Reverse Bi-orthogonal and Demeyer wavelet types. The resulting image quality is measured objectively, using peak signal-to-noise ratio (PSNR, and subjectively, using perceived image quality (human visual perception, HVP for short. The resulting reduction in the image size is quantified by compression ratio (CR.

  10. Stress reaction process-based hierarchical recognition algorithm for continuous intrusion events in optical fiber prewarning system

    Science.gov (United States)

    Qu, Hongquan; Yuan, Shijiao; Wang, Yanping; Yang, Dan

    2018-04-01

    To improve the recognition performance of optical fiber prewarning system (OFPS), this study proposed a hierarchical recognition algorithm (HRA). Compared with traditional methods, which employ only a complex algorithm that includes multiple extracted features and complex classifiers to increase the recognition rate with a considerable decrease in recognition speed, HRA takes advantage of the continuity of intrusion events, thereby creating a staged recognition flow inspired by stress reaction. HRA is expected to achieve high-level recognition accuracy with less time consumption. First, this work analyzed the continuity of intrusion events and then presented the algorithm based on the mechanism of stress reaction. Finally, it verified the time consumption through theoretical analysis and experiments, and the recognition accuracy was obtained through experiments. Experiment results show that the processing speed of HRA is 3.3 times faster than that of a traditional complicated algorithm and has a similar recognition rate of 98%. The study is of great significance to fast intrusion event recognition in OFPS.

  11. Improved Density Based Spatial Clustering of Applications of Noise Clustering Algorithm for Knowledge Discovery in Spatial Data

    Directory of Open Access Journals (Sweden)

    Arvind Sharma

    2016-01-01

    Full Text Available There are many techniques available in the field of data mining and its subfield spatial data mining is to understand relationships between data objects. Data objects related with spatial features are called spatial databases. These relationships can be used for prediction and trend detection between spatial and nonspatial objects for social and scientific reasons. A huge data set may be collected from different sources as satellite images, X-rays, medical images, traffic cameras, and GIS system. To handle this large amount of data and set relationship between them in a certain manner with certain results is our primary purpose of this paper. This paper gives a complete process to understand how spatial data is different from other kinds of data sets and how it is refined to apply to get useful results and set trends to predict geographic information system and spatial data mining process. In this paper a new improved algorithm for clustering is designed because role of clustering is very indispensable in spatial data mining process. Clustering methods are useful in various fields of human life such as GIS (Geographic Information System, GPS (Global Positioning System, weather forecasting, air traffic controller, water treatment, area selection, cost estimation, planning of rural and urban areas, remote sensing, and VLSI designing. This paper presents study of various clustering methods and algorithms and an improved algorithm of DBSCAN as IDBSCAN (Improved Density Based Spatial Clustering of Application of Noise. The algorithm is designed by addition of some important attributes which are responsible for generation of better clusters from existing data sets in comparison of other methods.

  12. Reconstruction of a digital core containing clay minerals based on a clustering algorithm

    Science.gov (United States)

    He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling

    2017-10-01

    It is difficult to obtain a core sample and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for an unconsolidated sandstone reservoir. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of the digital cores, although the two-dimensional data-based reconstruction methods are specifically applicable as the microstructure reservoir simulation methods for the sandstone reservoir. However, reconstruction of clay minerals is still challenging from a research viewpoint for the better reconstruction of various clay minerals in the digital cores. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the digital core containing clay clusters without the labels of the clusters' number, size, and texture were the output. The statistics and geometry of the reconstruction model were similar to the reference model. In addition, the Hoshen-Kopelman algorithm was used to label various connected unclassified clay clusters in the initial model and then the number and size of clay clusters were recorded. At the same time, the K -means clustering algorithm was applied to divide the labeled, large connecting clusters into smaller clusters on the basis of difference in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for the simulation of different clay minerals in the digital cores.

  13. Reconstruction of a digital core containing clay minerals based on a clustering algorithm.

    Science.gov (United States)

    He, Yanlong; Pu, Chunsheng; Jing, Cheng; Gu, Xiaoyu; Chen, Qingdong; Liu, Hongzhi; Khan, Nasir; Dong, Qiaoling

    2017-10-01

    It is difficult to obtain a core sample and information for digital core reconstruction of mature sandstone reservoirs around the world, especially for an unconsolidated sandstone reservoir. Meanwhile, reconstruction and division of clay minerals play a vital role in the reconstruction of the digital cores, although the two-dimensional data-based reconstruction methods are specifically applicable as the microstructure reservoir simulation methods for the sandstone reservoir. However, reconstruction of clay minerals is still challenging from a research viewpoint for the better reconstruction of various clay minerals in the digital cores. In the present work, the content of clay minerals was considered on the basis of two-dimensional information about the reservoir. After application of the hybrid method, and compared with the model reconstructed by the process-based method, the digital core containing clay clusters without the labels of the clusters' number, size, and texture were the output. The statistics and geometry of the reconstruction model were similar to the reference model. In addition, the Hoshen-Kopelman algorithm was used to label various connected unclassified clay clusters in the initial model and then the number and size of clay clusters were recorded. At the same time, the K-means clustering algorithm was applied to divide the labeled, large connecting clusters into smaller clusters on the basis of difference in the clusters' characteristics. According to the clay minerals' characteristics, such as types, textures, and distributions, the digital core containing clay minerals was reconstructed by means of the clustering algorithm and the clay clusters' structure judgment. The distributions and textures of the clay minerals of the digital core were reasonable. The clustering algorithm improved the digital core reconstruction and provided an alternative method for the simulation of different clay minerals in the digital cores.

  14. Graph-based clustering and data visualization algorithms

    CERN Document Server

    Vathy-Fogarassy, Ágnes

    2013-01-01

    This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space. The application of graphs in clustering and visualization has several advantages. A graph of important edges (where edges characterize relations and weights represent similarities or distances) provides a compact representation of the entire complex data set. This text describes clustering and visualization methods that are able to utilize information hidden in these graphs, based on

  15. An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network.

    Science.gov (United States)

    Vimalarani, C; Subramanian, R; Sivanandam, S N

    2016-01-01

    Wireless Sensor Network (WSN) is a network which formed with a maximum number of sensor nodes which are positioned in an application environment to monitor the physical entities in a target area, for example, temperature monitoring environment, water level, monitoring pressure, and health care, and various military applications. Mostly sensor nodes are equipped with self-supported battery power through which they can perform adequate operations and communication among neighboring nodes. Maximizing the lifetime of the Wireless Sensor networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO) algorithm for Wireless Sensor Network in which clustering and clustering head selection are done by using Particle Swarm Optimization (PSO) algorithm with respect to minimizing the power consumption in WSN. The performance metrics are evaluated and results are compared with competitive clustering algorithm to validate the reduction in energy consumption.

  16. An Enhanced PSO-Based Clustering Energy Optimization Algorithm for Wireless Sensor Network

    Directory of Open Access Journals (Sweden)

    C. Vimalarani

    2016-01-01

    Full Text Available Wireless Sensor Network (WSN is a network which formed with a maximum number of sensor nodes which are positioned in an application environment to monitor the physical entities in a target area, for example, temperature monitoring environment, water level, monitoring pressure, and health care, and various military applications. Mostly sensor nodes are equipped with self-supported battery power through which they can perform adequate operations and communication among neighboring nodes. Maximizing the lifetime of the Wireless Sensor networks, energy conservation measures are essential for improving the performance of WSNs. This paper proposes an Enhanced PSO-Based Clustering Energy Optimization (EPSO-CEO algorithm for Wireless Sensor Network in which clustering and clustering head selection are done by using Particle Swarm Optimization (PSO algorithm with respect to minimizing the power consumption in WSN. The performance metrics are evaluated and results are compared with competitive clustering algorithm to validate the reduction in energy consumption.

  17. A new collaborative recommendation approach based on users clustering using artificial bee colony algorithm.

    Science.gov (United States)

    Ju, Chunhua; Xu, Chonghuan

    2013-01-01

    Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users' preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC) algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods.

  18. A New Collaborative Recommendation Approach Based on Users Clustering Using Artificial Bee Colony Algorithm

    Directory of Open Access Journals (Sweden)

    Chunhua Ju

    2013-01-01

    Full Text Available Although there are many good collaborative recommendation methods, it is still a challenge to increase the accuracy and diversity of these methods to fulfill users’ preferences. In this paper, we propose a novel collaborative filtering recommendation approach based on K-means clustering algorithm. In the process of clustering, we use artificial bee colony (ABC algorithm to overcome the local optimal problem caused by K-means. After that we adopt the modified cosine similarity to compute the similarity between users in the same clusters. Finally, we generate recommendation results for the corresponding target users. Detailed numerical analysis on a benchmark dataset MovieLens and a real-world dataset indicates that our new collaborative filtering approach based on users clustering algorithm outperforms many other recommendation methods.

  19. Data Clustering on Breast Cancer Data Using Firefly Algorithm with Golden Ratio Method

    Directory of Open Access Journals (Sweden)

    DEMIR, M.

    2015-05-01

    Full Text Available Heuristic methods are problem solving methods. In general, they obtain near-optimal solutions, and they do not take the care of provability of this case. The heuristic methods do not guarantee to obtain the optimal results; however, they guarantee to obtain near-optimal solutions in considerable time. In this paper, an application was performed by using firefly algorithm - one of the heuristic methods. The golden ratio was applied to different steps of firefly algorithm and different parameters of firefly algorithm to develop a new algorithm - called Firefly Algorithm with Golden Ratio (FAGR. It was shown that the golden ratio made firefly algorithm be superior to the firefly algorithm without golden ratio. At this aim, the developed algorithm was applied to WBCD database (breast cancer database to cluster data obtained from breast cancer patients. The highest obtained success rate among all executions is 96% and the highest obtained average success rate in all executions is 94.5%.

  20. Fuzzy Weight Cluster-Based Routing Algorithm for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Teng Gao

    2015-01-01

    Full Text Available Cluster-based protocol is a kind of important routing in wireless sensor networks. However, due to the uneven distribution of cluster heads in classical clustering algorithm, some nodes may run out of energy too early, which is not suitable for large-scale wireless sensor networks. In this paper, a distributed clustering algorithm based on fuzzy weighted attributes is put forward to ensure both energy efficiency and extensibility. On the premise of a comprehensive consideration of all attributes, the corresponding weight of each parameter is assigned by using the direct method of fuzzy engineering theory. Then, each node works out property value. These property values will be mapped to the time axis and be triggered by a timer to broadcast cluster headers. At the same time, the radio coverage method is adopted, in order to avoid collisions and to ensure the symmetrical distribution of cluster heads. The aggregated data are forwarded to the sink node in the form of multihop. The simulation results demonstrate that clustering algorithm based on fuzzy weighted attributes has a longer life expectancy and better extensibility than LEACH-like algorithms.

  1. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    Science.gov (United States)

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to the advancement in sensor technology, the growing large medical image data have the ability to visualize the anatomical changes in biological tissues. As a consequence, the medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes and the characterization of disease progression. But in the meantime, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend the functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the space variation of image the signals. The image signals contain a large number of redundant features which provide no additional information for clustering analysis. The widely used methods for removing the irrelevant features are sparse clustering algorithms using a lasso-type penalty to select the features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value. In practice, they are difficult to determine. Recently, randomized algorithms have received a great deal of attentions in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both the liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms the current sparse clustering algorithms in image cluster analysis. PMID:26196383

  2. A Fast General-Purpose Clustering Algorithm Based on FPGAs for High-Throughput Data Processing

    CERN Document Server

    Annovi, A; The ATLAS collaboration; Castegnaro, A; Gatta, M

    2012-01-01

    We present a fast general-purpose algorithm for high-throughput clustering of data ”with a two dimensional organization”. The algorithm is designed to be implemented with FPGAs or custom electronics. The key feature is a processing time that scales linearly with the amount of data to be processed. This means that clustering can be performed in pipeline with the readout, without suffering from combinatorial delays due to looping multiple times through all the data. This feature makes this algorithm especially well suited for problems where the data has high density, e.g. in the case of tracking devices working under high-luminosity condition such as those of LHC or Super-LHC. The algorithm is organized in two steps: the first step (core) clusters the data; the second step analyzes each cluster of data to extract the desired information. The current algorithm is developed as a clustering device for modern high-energy physics pixel detectors. However, the algorithm has much broader field of applications. In ...

  3. Genome-wide decoding of hierarchical modular structure of transcriptional regulation by cis-element and expression clustering.

    Science.gov (United States)

    Leyfer, Dmitriy; Weng, Zhiping

    2005-09-01

    A holistic approach to the study of cellular processes is identifying both gene-expression changes and regulatory elements promoting such changes. Cellular regulatory processes can be viewed as transcriptional modules (TMs), groups of coexpressed genes regulated by groups of transcription factors (TFs). We set out to devise a method that would identify TMs while avoiding arbitrary thresholds on TM sizes and number. Assuming that gene expression is determined by TFs that bind to the gene's promoter, clustering of genes based on TF binding sites (cis-elements) should create gene groups similar to those obtained by gene expression clustering. Intersections between the expression and cis-element-based gene clusters reveal TMs. Statistical significance assigned to each TM allows identification of regulatory units of any size. Our method correctly identifies the number and sizes of TMs on simulated datasets. We demonstrate that yeast experimental TMs are biologically relevant by comparing them with MIPS and GO categories. Our modules are in statistically significant agreement with TMs from other research groups. This work suggests that there is no preferential division of biological processes into regulatory units; each degree of partitioning exhibits a slice of biological network revealing hierarchical modular organization of transcriptional regulation.

  4. Comparison of Clustering Algorithms for the Identification of Topics on Twitter

    Directory of Open Access Journals (Sweden)

    Marjori N. M. Klinczak

    2016-05-01

    Full Text Available Topic Identification in Social Networks has become an important task when dealing with event detection, particularly when global communities are affected. In order to attack this problem, text processing techniques and machine learning algorithms have been extensively used. In this paper we compare four clustering algorithms – k-means, k-medoids, DBSCAN and NMF (Non-negative Matrix Factorization – in order to detect topics related to textual messages obtained from Twitter. The algorithms were applied to a database initially composed by tweets having hashtags related to the recent Nepal earthquake as initial context. Obtained results suggest that the NMF clustering algorithm presents superior results, providing simpler clusters that are also easier to interpret.

  5. Online cluster-finding algorithms for the PANDA electromagnetic calorimeter

    NARCIS (Netherlands)

    Tiemens, Marcel

    2017-01-01

    Om zeldzame processen zoals de vorming van exotische deeltjes te kunnen bestuderen, is het PANDA experiment opgezet. Om de grote hoeveelheden data te kunnen verwerken, verwerken de subsystemen de data voor. Een voorbeeld is het algoritme om online naar clusters te zoeken in de data van de

  6. A Class of Manifold Regularized Multiplicative Update Algorithms for Image Clustering.

    Science.gov (United States)

    Yang, Shangming; Yi, Zhang; He, Xiaofei; Li, Xuelong

    2015-12-01

    Multiplicative update algorithms are important tools for information retrieval, image processing, and pattern recognition. However, when the graph regularization is added to the cost function, different classes of sample data may be mapped to the same subspace, which leads to the increase of data clustering error rate. In this paper, an improved nonnegative matrix factorization (NMF) cost function is introduced. Based on the cost function, a class of novel graph regularized NMF algorithms is developed, which results in a class of extended multiplicative update algorithms with manifold structure regularization. Analysis shows that in the learning, the proposed algorithms can efficiently minimize the rank of the data representation matrix. Theoretical results presented in this paper are confirmed by simulations. For different initializations and data sets, variation curves of cost functions and decomposition data are presented to show the convergence features of the proposed update rules. Basis images, reconstructed images, and clustering results are utilized to present the efficiency of the new algorithms. Last, the clustering accuracies of different algorithms are also investigated, which shows that the proposed algorithms can achieve state-of-the-art performance in applications of image clustering.

  7. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    Science.gov (United States)

    Ying Wah, Teh

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets. PMID:25110753

  8. A fast density-based clustering algorithm for real-time Internet of Things stream.

    Science.gov (United States)

    Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based method is a prominent class in clustering data streams. It has the ability to detect arbitrary shape clusters, to handle outlier, and it does not need the number of clusters in advance. Therefore, density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has fast processing time to be applicable in real-time application of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.

  9. MINING ON CAR DATABASE EMPLOYING LEARNING AND CLUSTERING ALGORITHMS

    OpenAIRE

    Muhammad Rukunuddin Ghalib; Shivam Vohra; Sunish Vohra; Akash Juneja

    2013-01-01

    In data mining, classification is a form of data analysis that can be used to extract models describing important data classes. Two of the known learning algorithms used are Naïve Bayesian (NB) and SMO (Self-Minimal-Optimisation) .Thus the following two learning algorithms are used on a Car review database and thus a model is hence created which predicts the characteristic of a review comment after getting trained. It was found that model successfully predicted correctly about the review comm...

  10. SHARPEN-Systematic Hierarchical Algorithms for Rotamers and Proteins on an Extended Network

    KAUST Repository

    Loksha, Ilya V.

    2009-04-30

    Algorithms for discrete optimization of proteins play a central role in recent advances in protein structure prediction and design. We wish to improve the resources available for computational biologists to rapidly prototype such algorithms and to easily scale these algorithms to many processors. To that end, we describe the implementation and use of two new open source resources, citing potential benefits over existing software. We discuss CHOMP, a new object-oriented library for macromolecular optimization, and SHARPEN, a framework for scaling CHOMP scripts to many computers. These tools allow users to develop new algorithms for a variety of applications including protein repacking, protein-protein docking, loop rebuilding, or homology model remediation. Particular care was taken to allow modular energy function design; protein conformations may currently be scored using either the OPLSaa molecular mechanical energy function or an all-atom semiempirical energy function employed by Rosetta. © 2009 Wiley Periodicals, Inc.

  11. NOVEL CONTEXT-AWARE CLUSTERING WITH HIERARCHICAL ADDRESSING (CCHA) FOR THE INTERNET OF THINGS (IoT)

    DEFF Research Database (Denmark)

    Mahalle, Parikshit N.; Prasad, Neeli R.; Prasad, Ramjee

    2013-01-01

    As computing technology becomes more tightly coupled into dynamic and mobile world of the Internet of Things (IoT), security mechanism becomes more stringent, less flexible and intrusive. Scalability issue in the IoT makes Identity Management (IdM) of ubiquitous things more challenging. Forming ad......-hoc network, interaction between these nomadic devices to provide seamless service extend the need of new identi-ties to the things, addressing and IdM in the IoT. New identities and identifier format to alleviate the perfor-mance issue is introduced in this paper. This paper pre-sents novel Context......-aware Clustering with Hierarchical Addressing (CCHA) scheme for the things with new identifier format. Simulation results shows that CCHA achieves better performance with less energy expendi-ture, less end-to-end delay and more throughput. Results also show that CCHA significantly reduces the failure probability...

  12. A Heuristic Task Scheduling Algorithm for Heterogeneous Virtual Clusters

    OpenAIRE

    Weiwei Lin; Wentai Wu; James Z. Wang

    2016-01-01

    Cloud computing provides on-demand computing and storage services with high performance and high scalability. However, the rising energy consumption of cloud data centers has become a prominent problem. In this paper, we first introduce an energy-aware framework for task scheduling in virtual clusters. The framework consists of a task resource requirements prediction module, an energy estimate module, and a scheduler with a task buffer. Secondly, based on this framework, we propose a virtual ...

  13. A Novel Hierarchical Model to Locate Health Care Facilities with Fuzzy Demand Solved by Harmony Search Algorithm

    Directory of Open Access Journals (Sweden)

    Mehdi Alinaghian

    2014-08-01

    Full Text Available In the field of health losses resulting from failure to establish the facilities in a suitable location and the required number, beyond the cost and quality of service will result in an increase in mortality and the spread of diseases. So the facility location models have special importance in this area. In this paper, a successively inclusive hierarchical model for location of health centers in term of the transfer of patients from a lower level to a higher level of health centers has been developed. Since determination the exact number of demand for health care in the future is difficult and in order to make the model close to the real conditions of demand uncertainty, a fuzzy programming model based on credibility theory is considered. To evaluate the proposed model, several numerical examples are solved in small size. In order to solve large scale problems, a meta-heuristic algorithm based on harmony search algorithm was developed in conjunction with the GAMS software which indicants the performance of the proposed algorithm.

  14. Kendall’s tau and agglomerative clustering for structure determination of hierarchical Archimedean copulas

    Czech Academy of Sciences Publication Activity Database

    Górecki, J.; Hofert, M.; Holeňa, Martin

    2017-01-01

    Roč. 5, č. 1 (2017), s. 75-87 ISSN 2300-2298 R&D Projects: GA ČR GA17-01251S Institutional support: RVO:67985807 Keywords : structure determination * agglomerative clustering * Kendall’s tau * Archimedean copula Subject RIV: IN - Informatics, Computer Science OBOR OECD: Statistics and probability

  15. Clustering Batik Images using Fuzzy C-Means Algorithm Based on Log-Average Luminance

    Directory of Open Access Journals (Sweden)

    Ahmad Sanmorino

    2012-06-01

    Full Text Available Batik is a fabric or clothes that are made ​​with a special staining technique called wax-resist dyeing and is one of the cultural heritage which has high artistic value. In order to improve the efficiency and give better semantic to the image, some researchers apply clustering algorithm for managing images before they can be retrieved. Image clustering is a process of grouping images based on their similarity. In this paper we attempt to provide an alternative method of grouping batik image using fuzzy c-means (FCM algorithm based on log-average luminance of the batik. FCM clustering algorithm is an algorithm that works using fuzzy models that allow all data from all cluster members are formed with different degrees of membership between 0 and 1. Log-average luminance (LAL is the average value of the lighting in an image. We can compare different image lighting from one image to another using LAL. From the experiments that have been made, it can be concluded that fuzzy c-means algorithm can be used for batik image clustering based on log-average luminance of each image possessed.

  16. CAMPAIGN: an open-source library of GPU-accelerated data clustering algorithms.

    Science.gov (United States)

    Kohlhoff, Kai J; Sosnick, Marc H; Hsu, William T; Pande, Vijay S; Altman, Russ B

    2011-08-15

    Data clustering techniques are an essential component of a good data analysis toolbox. Many current bioinformatics applications are inherently compute-intense and work with very large datasets. Sequential algorithms are inadequate for providing the necessary performance. For this reason, we have created Clustering Algorithms for Massively Parallel Architectures, Including GPU Nodes (CAMPAIGN), a central resource for data clustering algorithms and tools that are implemented specifically for execution on massively parallel processing architectures. CAMPAIGN is a library of data clustering algorithms and tools, written in 'C for CUDA' for Nvidia GPUs. The library provides up to two orders of magnitude speed-up over respective CPU-based clustering algorithms and is intended as an open-source resource. New modules from the community will be accepted into the library and the layout of it is such that it can easily be extended to promising future platforms such as OpenCL. Releases of the CAMPAIGN library are freely available for download under the LGPL from https://simtk.org/home/campaign. Source code can also be obtained through anonymous subversion access as described on https://simtk.org/scm/?group_id=453. kjk33@cantab.net.

  17. Modeling and Sensitivity Study of Consensus Algorithm-Based Distributed Hierarchical Control for DC Microgrids

    DEFF Research Database (Denmark)

    Meng, Lexuan; Dragicevic, Tomislav; Roldan Perez, Javier

    2016-01-01

    Distributed control methods based on consensus algorithms have become popular in recent years for microgrid (MG) systems. These kinds of algorithms can be applied to share information in order to coordinate multiple distributed generators within a MG. However, stability analysis becomes a challen......Distributed control methods based on consensus algorithms have become popular in recent years for microgrid (MG) systems. These kinds of algorithms can be applied to share information in order to coordinate multiple distributed generators within a MG. However, stability analysis becomes...... in the communication network, continuous-time methods can be inaccurate for this kind of dynamic study. Therefore, this paper aims at modeling a complete DC MG using a discrete-time approach in order to perform a sensitivity analysis taking into account the effects of the consensus algorithm. To this end......, a generalized modeling method is proposed and the influence of key control parameters, the communication topology and the communication speed are studied in detail. The theoretical results obtained with the proposed model are verified by comparing them with the results obtained with a detailed switching...

  18. A rapid ATR-FTIR spectroscopic method for detection of sibutramine adulteration in tea and coffee based on hierarchical cluster and principal component analyses.

    Science.gov (United States)

    Cebi, Nur; Yilmaz, Mustafa Tahsin; Sagdic, Osman

    2017-08-15

    Sibutramine may be illicitly included in herbal slimming foods and supplements marketed as "100% natural" to enhance weight loss. Considering public health and legal regulations, there is an urgent need for effective, rapid and reliable techniques to detect sibutramine in dietetic herbal foods, teas and dietary supplements. This research comprehensively explored, for the first time, detection of sibutramine in green tea, green coffee and mixed herbal tea using ATR-FTIR spectroscopic technique combined with chemometrics. Hierarchical cluster analysis and PCA principle component analysis techniques were employed in spectral range (2746-2656cm -1 ) for classification and discrimination through Euclidian distance and Ward's algorithm. Unadulterated and adulterated samples were classified and discriminated with respect to their sibutramine contents with perfect accuracy without any false prediction. The results suggest that existence of the active substance could be successfully determined at the levels in the range of 0.375-12mg in totally 1.75g of green tea, green coffee and mixed herbal tea by using FTIR-ATR technique combined with chemometrics. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Channel Parameter Estimation for Scatter Cluster Model Using Modified MUSIC Algorithm

    Directory of Open Access Journals (Sweden)

    Jinsheng Yang

    2012-01-01

    Full Text Available Recently, the scatter cluster models which precisely evaluate the performance of the wireless communication system have been proposed in the literature. However, the conventional SAGE algorithm does not work for these scatter cluster-based models because it performs poorly when the transmit signals are highly correlated. In this paper, we estimate the time of arrival (TOA, the direction of arrival (DOA, and Doppler frequency for scatter cluster model by the modified multiple signal classification (MUSIC algorithm. Using the space-time characteristics of the multiray channel, the proposed algorithm combines the temporal filtering techniques and the spatial smoothing techniques to isolate and estimate the incoming rays. The simulation results indicated that the proposed algorithm has lower complexity and is less time-consuming in the dense multipath environment than SAGE algorithm. Furthermore, the estimations’ performance increases with elements of receive array and samples length. Thus, the problem of the channel parameter estimation of the scatter cluster model can be effectively addressed with the proposed modified MUSIC algorithm.

  20. Proposed Fuzzy-NN Algorithm with LoRaCommunication Protocol for Clustered Irrigation Systems

    Directory of Open Access Journals (Sweden)

    Sotirios Kontogiannis

    2017-11-01

    Full Text Available Modern irrigation systems utilize sensors and actuators, interconnected together as a single entity. In such entities, A.I. algorithms are implemented, which are responsible for the irrigation process. In this paper, the authors present an irrigation Open Watering System (OWS architecture that spatially clusters the irrigation process into autonomous irrigation sections. Authors’ OWS implementation includes a Neuro-Fuzzy decision algorithm called FITRA, which originates from the Greek word for seed. In this paper, the FITRA algorithm is described in detail, as are experimentation results that indicate significant water conservations from the use of the FITRA algorithm. Furthermore, the authors propose a new communication protocol over LoRa radio as an alternative low-energy and long-range OWS clusters communication mechanism. The experimental scenarios confirm that the FITRA algorithm provides more efficient irrigation on clustered areas than existing non-clustered, time scheduled or threshold adaptive algorithms. This is due to the FITRA algorithm’s frequent monitoring of environmental conditions, fuzzy and neural network adaptation as well as adherence to past irrigation preferences.

  1. Faster implementation of the hierarchical search algorithm for detection of gravitational waves from inspiraling compact binaries

    International Nuclear Information System (INIS)

    Sengupta, Anand S.; Dhurandhar, Sanjeev; Lazzarini, Albert

    2003-01-01

    The first scientific runs of kilometer scale laser interferometric detectors such as LIGO are under way. Data from these detectors will be used to look for signatures of gravitational waves from astrophysical objects such as inspiraling neutron-star-black-hole binaries using matched filtering. The computational resources required for online flat-search implementation of the matched filtering are large if searches are carried out for a small total mass. A flat search is implemented by constructing a single discrete grid of densely populated template waveforms spanning the dynamical parameters--masses, spins--which are correlated with the interferometer data. The correlations over the kinematical parameters can be maximized a priori without constructing a template bank over them. Mohanty and Dhurandhar showed that a significant reduction in computational resources can be accomplished by using a hierarchy of such template banks where candidate events triggered by a sparsely populated grid are followed up by the regular, dense flat-search grid. The estimated speedup in this method was a factor ∼25 over the flat search. In this paper we report an improved implementation of the hierarchical search, wherein we extend the domain of hierarchy to an extra dimension--namely, the time of arrival of the signal in the bandwidth of the interferometer. This is accomplished by lowering the Nyquist sampling rate of the signal in the trigger stage. We show that this leads to further improvement in the efficiency of data analysis and speeds up the online computation by a factor of ∼65-70 over the flat search. We also take into account and discuss issues related to template placement, trigger thresholds, and other peculiar problems that do not arise in earlier implementation schemes of the hierarchical search. We present simulation results for 2PN waveforms embedded in the noise expected for initial LIGO detectors

  2. A Cluster-Based Fuzzy Fusion Algorithm for Event Detection in Heterogeneous Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    ZiQi Hao

    2015-01-01

    Full Text Available As limited energy is one of the tough challenges in wireless sensor networks (WSN, energy saving becomes important in increasing the lifecycle of the network. Data fusion enables combining information from several sources thus to provide a unified scenario, which can significantly save sensor energy and enhance sensing data accuracy. In this paper, we propose a cluster-based data fusion algorithm for event detection. We use k-means algorithm to form the nodes into clusters, which can significantly reduce the energy consumption of intracluster communication. Distances between cluster heads and event and energy of clusters are fuzzified, thus to use a fuzzy logic to select the clusters that will participate in data uploading and fusion. Fuzzy logic method is also used by cluster heads for local decision, and then the local decision results are sent to the base station. Decision-level fusion for final decision of event is performed by base station according to the uploaded local decisions and fusion support degree of clusters calculated by fuzzy logic method. The effectiveness of this algorithm is demonstrated by simulation results.

  3. A comparison of hierarchical cluster analysis and league table rankings as methods for analysis and presentation of district health system performance data in Uganda.

    Science.gov (United States)

    Tashobya, Christine K; Dubourg, Dominique; Ssengooba, Freddie; Speybroeck, Niko; Macq, Jean; Criel, Bart

    2016-03-01

    In 2003, the Uganda Ministry of Health introduced the district league table for district health system performance assessment. The league table presents district performance against a number of input, process and output indicators and a composite index to rank districts. This study explores the use of hierarchical cluster analysis for analysing and presenting district health systems performance data and compares this approach with the use of the league table in Uganda. Ministry of Health and district plans and reports, and published documents were used to provide information on the development and utilization of the Uganda district league table. Quantitative data were accessed from the Ministry of Health databases. Statistical analysis using SPSS version 20 and hierarchical cluster analysis, utilizing Wards' method was used. The hierarchical cluster analysis was conducted on the basis of seven clusters determined for each year from 2003 to 2010, ranging from a cluster of good through moderate-to-poor performers. The characteristics and membership of clusters varied from year to year and were determined by the identity and magnitude of performance of the individual variables. Criticisms of the league table include: perceived unfairness, as it did not take into consideration district peculiarities; and being oversummarized and not adequately informative. Clustering organizes the many data points into clusters of similar entities according to an agreed set of indicators and can provide the beginning point for identifying factors behind the observed performance of districts. Although league table ranking emphasize summation and external control, clustering has the potential to encourage a formative, learning approach. More research is required to shed more light on factors behind observed performance of the different clusters. Other countries especially low-income countries that share many similarities with Uganda can learn from these experiences. © The Author 2015

  4. Hierarchical cluster-based partial least squares regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models.

    Science.gov (United States)

    Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald

    2011-06-01

    Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for

  5. Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR is an efficient tool for metamodelling of nonlinear dynamic models

    Directory of Open Access Journals (Sweden)

    Omholt Stig W

    2011-06-01

    Full Text Available Abstract Background Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs to variation in features of the trajectories of the state variables (outputs throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR, where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR and ordinary least squares (OLS regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback

  6. Hierarchical and Complex System Entropy Clustering Analysis Based Validation for Traditional Chinese Medicine Syndrome Patterns of Chronic Atrophic Gastritis.

    Science.gov (United States)

    Zhang, Yin; Liu, Yue; Li, Yannan; Zhao, Xia; Zhuo, Lin; Zhou, Ajian; Zhang, Li; Su, Zeqi; Chen, Cen; Du, Shiyu; Liu, Daming; Ding, Xia

    2018-03-22

    Chronic atrophic gastritis (CAG) is the precancerous stage of gastric carcinoma. Traditional Chinese Medicine (TCM) has been widely used in treating CAG. This study aimed to reveal core pathogenesis of CAG by validating the TCM syndrome patterns and provide evidence for optimization of treatment strategies. This is a cross-sectional study conducted in 4 hospitals in China. Hierarchical clustering analysis (HCA) and complex system entropy clustering analysis (CSECA) were performed, respectively, to achieve syndrome pattern validation. Based on HCA, 15 common factors were assigned to 6 syndrome patterns: liver depression and spleen deficiency and blood stasis in the stomach collateral, internal harassment of phlegm-heat and blood stasis in the stomach collateral, phlegm-turbidity internal obstruction, spleen yang deficiency, internal harassment of phlegm-heat and spleen deficiency, and spleen qi deficiency. By CSECA, 22 common factors were assigned to 7 syndrome patterns: qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, yang deficiency, and yin deficiency. Combination of qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, yang deficiency, and yin deficiency may play a crucial role in CAG pathogenesis. In accord with this, treatment strategies by TCM herbal prescriptions should be targeted to regulating qi, activating blood, resolving turbidity, clearing heat, removing toxin, nourishing yin, and warming yang. Further explorations are needed to verify and expand the current conclusions.

  7. Efficient visible light photocatalytic NO{sub x} removal with cationic Ag clusters-grafted (BiO){sub 2}CO{sub 3} hierarchical superstructures

    Energy Technology Data Exchange (ETDEWEB)

    Feng, Xin [Chongqing Key Laboratory of Catalysis and Functional Organic Molecules, College of Environment and Resources, Engineering Research Center for Waste Oil Recovery Technology and Equipment of Ministry of Education, College of Environment and Resources, Chongqing Technology and Business University, Chongqing 40067 (China); Zhang, Wendong [Department of Scientific Research Management, Chongqing Normal University, Chongqing 401331 (China); Deng, Hua [State Key Joint Laboratory of Environment Simulation and Pollution Control, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085 (China); Ni, Zilin [Department of Scientific Research Management, Chongqing Normal University, Chongqing 401331 (China); Dong, Fan, E-mail: dfctbu@126.com [Chongqing Key Laboratory of Catalysis and Functional Organic Molecules, College of Environment and Resources, Engineering Research Center for Waste Oil Recovery Technology and Equipment of Ministry of Education, College of Environment and Resources, Chongqing Technology and Business University, Chongqing 40067 (China); Zhang, Yuxin, E-mail: zhangyuxin@cqu.edu.cn [College of Materials Science and Engineering, National Key Laboratory of Fundamental Science of Micro/Nano-Devices and System Technology, Chongqing University, Chongqing 400044 (China)

    2017-01-15

    Graphical abstract: The cationic Ag clusters-grafted (BiO){sub 2}CO{sub 3} hierarchical superstructures exhibits highly enhanced visible light photocatalytic air purification through an interfacial charge transfer process induced by Ag clusters. - Highlights: • Microstructural optimization and surface cluster-grafting were firstly combined. • Cationic Ag clusters were grafted on the surface of (BiO){sub 2}CO{sub 3} superstructures. • The Ag clusters-grafted BHS displayed enhanced visible light photocatalysis. • Direct interfacial charge transfer (IFCT) from BHS to Ag clusters was proposed. • The charge transfer process and the dominant reactive species were revealed. - Abstract: A facile method was developed to graft cationic Ag clusters on (BiO){sub 2}CO{sub 3} hierarchical superstructures (BHS) surface to improve their visible light activity. Significantly, the resultant Ag clusters-grafted BHS displayed a highly enhanced visible light photocatalytic performance for NOx removal due to the direct interfacial charge transfer (IFCT) from BHS to Ag clusters. The chemical and coordination state of the cationic Ag clusters was determined with the extended X-ray absorption fine structure (EXAFS) and a theoretical structure model was proposed for this unique Ag clusters. The charge transfer process and the dominant reactive species (·OH) were revealed on the basis of electron spin resonance (ESR) trapping. A new photocatalysis mechanism of Ag clusters-grafted BHS under visible light involving IFCT process was uncovered. In addition, the cationic Ag clusters-grafted BHS also demonstrated high photochemical and structural stability under repeated photocatalysis runs. The perspective of enhancing photocatalysis through combination of microstructural optimization and IFCT could provide a new avenue for the developing efficient visible light photocatalysts.

  8. High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

    Directory of Open Access Journals (Sweden)

    Dieter Hendricks

    2016-02-01

    Full Text Available We implement a master-slave parallel genetic algorithm with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs to implement a parallel genetic algorithm and visualise the results using disjoint minimal spanning trees. We demonstrate that our GPU parallel genetic algorithm, implemented on a commercially available general purpose GPU, is able to recover stock clusters in sub-second speed, based on a subset of stocks in the South African market. This approach represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable because of compiler differences. Combined with fast online intraday correlation matrix estimation from high frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.

  9. Risk Assessment for Bridges Safety Management during Operation Based on Fuzzy Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Xia Hanyu

    2016-01-01

    Full Text Available In recent years, large span and large sea-crossing bridges are built, bridges accidents caused by improper operational management occur frequently. In order to explore the better methods for risk assessment of the bridges operation departments, the method based on fuzzy clustering algorithm is selected. Then, the implementation steps of fuzzy clustering algorithm are described, the risk evaluation system is built, and Taizhou Bridge is selected as an example, the quantitation of risk factors is described. After that, the clustering algorithm based on fuzzy equivalence is calculated on MATLAB 2010a. In the last, Taizhou Bridge operation management departments are classified and sorted according to the degree of risk, and the safety situation of operation departments is analyzed.

  10. Study on Data Clustering and Intelligent Decision Algorithm of Indoor Localization

    Science.gov (United States)

    Liu, Zexi

    2018-01-01

    Indoor positioning technology enables the human beings to have the ability of positional perception in architectural space, and there is a shortage of single network coverage and the problem of location data redundancy. So this article puts forward the indoor positioning data clustering algorithm and intelligent decision-making research, design the basic ideas of multi-source indoor positioning technology, analyzes the fingerprint localization algorithm based on distance measurement, position and orientation of inertial device integration. By optimizing the clustering processing of massive indoor location data, the data normalization pretreatment, multi-dimensional controllable clustering center and multi-factor clustering are realized, and the redundancy of locating data is reduced. In addition, the path is proposed based on neural network inference and decision, design the sparse data input layer, the dynamic feedback hidden layer and output layer, low dimensional results improve the intelligent navigation path planning.

  11. A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network.

    Science.gov (United States)

    Chen, Yuzhong; Weng, Shining; Guo, Wenzhong; Xiong, Naixue

    2016-02-19

    Vehicular ad hoc networks (VANETs) have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency.

  12. A Game Theory Algorithm for Intra-Cluster Data Aggregation in a Vehicular Ad Hoc Network

    Directory of Open Access Journals (Sweden)

    Yuzhong Chen

    2016-02-01

    Full Text Available Vehicular ad hoc networks (VANETs have an important role in urban management and planning. The effective integration of vehicle information in VANETs is critical to traffic analysis, large-scale vehicle route planning and intelligent transportation scheduling. However, given the limitations in the precision of the output information of a single sensor and the difficulty of information sharing among various sensors in a highly dynamic VANET, effectively performing data aggregation in VANETs remains a challenge. Moreover, current studies have mainly focused on data aggregation in large-scale environments but have rarely discussed the issue of intra-cluster data aggregation in VANETs. In this study, we propose a multi-player game theory algorithm for intra-cluster data aggregation in VANETs by analyzing the competitive and cooperative relationships among sensor nodes. Several sensor-centric metrics are proposed to measure the data redundancy and stability of a cluster. We then study the utility function to achieve efficient intra-cluster data aggregation by considering both data redundancy and cluster stability. In particular, we prove the existence of a unique Nash equilibrium in the game model, and conduct extensive experiments to validate the proposed algorithm. Results demonstrate that the proposed algorithm has advantages over typical data aggregation algorithms in both accuracy and efficiency.

  13. A harmony search algorithm for clustering with feature selection

    Directory of Open Access Journals (Sweden)

    Carlos Cobos

    2010-01-01

    Full Text Available En este artículo se presenta un nuevo algoritmo de clustering denominado IHSK, con la capacidad de seleccionar características en un orden de complejidad lineal. El algoritmo es inspirado en la combinación de los algoritmos de búsqueda armónica y K-means. Para la selección de las características se usó el concepto de variabilidad y un método heurístico que penaliza la presencia de dimensiones con baja probabilidad de aportar en la solución actual. El algoritmo fue probado con conjuntos de datos sintéticos y reales, obteniendo resultados prometedores.

  14. Load power device and system for real-time execution of hierarchical load identification algorithms

    Science.gov (United States)

    Yang, Yi; Madane, Mayura Arun; Zambare, Prachi Suresh

    2017-11-14

    A load power device includes a power input; at least one power output for at least one load; and a plurality of sensors structured to sense voltage and current at the at least one power output. A processor is structured to provide real-time execution of: (a) a plurality of load identification algorithms, and (b) event detection and operating mode detection for the at least one load.

  15. Segmentation of Mushroom and Cap width Measurement using Modified K-Means Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Eser Sert

    2014-01-01

    Full Text Available Mushroom is one of the commonly consumed foods. Image processing is one of the effective way for examination of visual features and detecting the size of a mushroom. We developed software for segmentation of a mushroom in a picture and also to measure the cap width of the mushroom. K-Means clustering method is used for the process. K-Means is one of the most successful clustering methods. In our study we customized the algorithm to get the best result and tested the algorithm. In the system, at first mushroom picture is filtered, histograms are balanced and after that segmentation is performed. Results provided that customized algorithm performed better segmentation than classical K-Means algorithm. Tests performed on the designed software showed that segmentation on complex background pictures is performed with high accuracy, and 20 mushrooms caps are measured with 2.281 % relative error.

  16. Optimal Selection of Clustering Algorithm via Multi-Criteria Decision Analysis (MCDA for Load Profiling Applications

    Directory of Open Access Journals (Sweden)

    Ioannis P. Panapakidis

    2018-02-01

    Full Text Available Due to high implementation rates of smart meter systems, considerable amount of research is placed in machine learning tools for data handling and information retrieval. A key tool in load data processing is clustering. In recent years, a number of researches have proposed different clustering algorithms in the load profiling field. The present paper provides a methodology for addressing the aforementioned problem through Multi-Criteria Decision Analysis (MCDA and namely, using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS. A comparison of the algorithms is employed. Next, a single test case on the selection of an algorithm is examined. User specific weights are applied and based on these weight values, the optimal algorithm is drawn.

  17. An adaptive clustering algorithm for image matching based on corner feature

    Science.gov (United States)

    Wang, Zhe; Dong, Min; Mu, Xiaomin; Wang, Song

    2018-04-01

    The traditional image matching algorithm always can not balance the real-time and accuracy better, to solve the problem, an adaptive clustering algorithm for image matching based on corner feature is proposed in this paper. The method is based on the similarity of the matching pairs of vector pairs, and the adaptive clustering is performed on the matching point pairs. Harris corner detection is carried out first, the feature points of the reference image and the perceived image are extracted, and the feature points of the two images are first matched by Normalized Cross Correlation (NCC) function. Then, using the improved algorithm proposed in this paper, the matching results are clustered to reduce the ineffective operation and improve the matching speed and robustness. Finally, the Random Sample Consensus (RANSAC) algorithm is used to match the matching points after clustering. The experimental results show that the proposed algorithm can effectively eliminate the most wrong matching points while the correct matching points are retained, and improve the accuracy of RANSAC matching, reduce the computation load of whole matching process at the same time.

  18. Self-similar hierarchical energetics in the ICM of massive galaxy clusters

    Science.gov (United States)

    Miniati, Francesco; Beresnyak, Andrey

    Massive galaxy clusters (GC) are filled with a hot, turbulent and magnetised intra-cluster medium (ICM). They are still forming under the action of gravitational instability, which drives supersonic mass accretion flows. These partially dissipate into heat through a complex network of large scale shocks, and partly excite giant turbulent eddies and cascade. Turbulence dissipation not only contributes to heating of the ICM but also amplifies magnetic energy by way of dynamo action. The pattern of gravitational energy turning into kinetic, thermal, turbulent and magnetic is a fundamental feature of GC hydrodynamics but quantitative modelling has remained a challenge. In this contribution we present results from a recent high resolution, fully cosmological numerical simulation of a massive Coma-like galaxy cluster in which the time dependent turbulent motions of the ICM are resolved (Miniati 2014) and their statistical properties are quantified for the first time (Miniati 2015, Beresnyak & Miniati 2015). We combine these results with independent state-of-the art numerical simulations of MHD turbulence (Beresnyak 2012), which shows that in the nonlinear regime of turbulent dynamo (for magnetic Prandtl numbers>~ 1) the growth rate of the magnetic energy corresponds to a fraction CE ~= 4 - 5 × 10-2 of the turbulent dissipation rate. We thus determine without adjustable parameters the thermal, turbulent and magnetic history of giant GC (Miniati & Beresnyak 2015). We find that the energy components of the ICM are ordered according to a permanent hierarchy, in which the sonic Mach number at the turbulent injection scale is of order unity, the beta of the plasma of order forty and the ratio of turbulent injection scale to Alfvén scale is of order one hundred. These dimensionless numbers remain virtually unaltered throughout the cluster's history, despite evolution of each individual component and the drive towards equipartition of the turbulent dynamo, thus revealing a new

  19. A novel vehicle dynamics stability control algorithm based on the hierarchical strategy with constrain of nonlinear tyre forces

    Science.gov (United States)

    Li, Liang; Jia, Gang; Chen, Jie; Zhu, Hongjun; Cao, Dongpu; Song, Jian

    2015-08-01

    Direct yaw moment control (DYC), which differentially brakes the wheels to produce a yaw moment for the vehicle stability in a steering process, is an important part of electric stability control system. In this field, most control methods utilise the active brake pressure with a feedback controller to adjust the braked wheel. However, the method might lead to a control delay or overshoot because of the lack of a quantitative project relationship between target values from the upper stability controller to the lower pressure controller. Meanwhile, the stability controller usually ignores the implementing ability of the tyre forces, which might be restrained by the combined-slip dynamics of the tyre. Therefore, a novel control algorithm of DYC based on the hierarchical control strategy is brought forward in this paper. As for the upper controller, a correctional linear quadratic regulator, which not only contains feedback control but also contains feed forward control, is introduced to deduce the object of the stability yaw moment in order to guarantee the yaw rate and side-slip angle stability. As for the medium and lower controller, the quantitative relationship between the vehicle stability object and the target tyre forces of controlled wheels is proposed to achieve smooth control performance based on a combined-slip tyre model. The simulations with the hardware-in-the-loop platform validate that the proposed algorithm can improve the stability of the vehicle effectively.

  20. ScreenBEAM: a novel meta-analysis algorithm for functional genomics screens via Bayesian hierarchical modeling.

    Science.gov (United States)

    Yu, Jiyang; Silva, Jose; Califano, Andrea

    2016-01-15

    Functional genomics (FG) screens, using RNAi or CRISPR technology, have become a standard tool for systematic, genome-wide loss-of-function studies for therapeutic target discovery. As in many large-scale assays, however, off-target effects, variable reagents' potency and experimental noise must be accounted for appropriately control for false positives. Indeed, rigorous statistical analysis of high-throughput FG screening data remains challenging, particularly when integrative analyses are used to combine multiple sh/sgRNAs targeting the same gene in the library. We use large RNAi and CRISPR repositories that are publicly available to evaluate a novel meta-analysis approach for FG screens via Bayesian hierarchical modeling, Screening Bayesian Evaluation and Analysis Method (ScreenBEAM). Results from our analysis show that the proposed strategy, which seamlessly combines all available data, robustly outperforms classical algorithms developed for microarray data sets as well as recent approaches designed for next generation sequencing technologies. Remarkably, the ScreenBEAM algorithm works well even when the quality of FG screens is relatively low, which accounts for about 80-95% of the public datasets. R package and source code are available at: https://github.com/jyyu/ScreenBEAM. ac2248@columbia.edu, jose.silva@mssm.edu, yujiyang@gmail.com Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Clustering of resting state networks.

    Directory of Open Access Journals (Sweden)

    Megan H Lee

    Full Text Available The goal of the study was to demonstrate a hierarchical structure of resting state activity in the healthy brain using a data-driven clustering algorithm.The fuzzy-c-means clustering algorithm was applied to resting state fMRI data in cortical and subcortical gray matter from two groups acquired separately, one of 17 healthy individuals and the second of 21 healthy individuals. Different numbers of clusters and different starting conditions were used. A cluster dispersion measure determined the optimal numbers of clusters. An inner product metric provided a measure of similarity between different clusters. The two cluster result found the task-negative and task-positive systems. The cluster dispersion measure was minimized with seven and eleven clusters. Each of the clusters in the seven and eleven cluster result was associated with either the task-negative or task-positive system. Applying the algorithm to find seven clusters recovered previously described resting state networks, including the default mode network, frontoparietal control network, ventral and dorsal attention networks, somatomotor, visual, and language networks. The language and ventral attention networks had significant subcortical involvement. This parcellation was consistently found in a large majority of algorithm runs under different conditions and was robust to different methods of initialization.The clustering of resting state activity using different optimal numbers of clusters identified resting state networks comparable to previously obtained results. This work reinforces the observation that resting state networks are hierarchically organized.

  2. An improved K-means clustering algorithm in agricultural image segmentation

    Science.gov (United States)

    Cheng, Huifeng; Peng, Hui; Liu, Shanmei

    Image segmentation is the first important step to image analysis and image processing. In this paper, according to color crops image characteristics, we firstly transform the color space of image from RGB to HIS, and then select proper initial clustering center and cluster number in application of mean-variance approach and rough set theory followed by clustering calculation in such a way as to automatically segment color component rapidly and extract target objects from background accurately, which provides a reliable basis for identification, analysis, follow-up calculation and process of crops images. Experimental results demonstrate that improved k-means clustering algorithm is able to reduce the computation amounts and enhance precision and accuracy of clustering.

  3. Clustering and Candidate Motif Detection in Exosomal miRNAs by Application of Machine Learning Algorithms.

    Science.gov (United States)

    Gaur, Pallavi; Chaturvedi, Anoop

    2017-07-22

    The clustering pattern and motifs give immense information about any biological data. An application of machine learning algorithms for clustering and candidate motif detection in miRNAs derived from exosomes is depicted in this paper. Recent progress in the field of exosome research and more particularly regarding exosomal miRNAs has led much bioinformatic-based research to come into existence. The information on clustering pattern and candidate motifs in miRNAs of exosomal origin would help in analyzing existing, as well as newly discovered miRNAs within exosomes. Along with obtaining clustering pattern and candidate motifs in exosomal miRNAs, this work also elaborates the usefulness of the machine learning algorithms that can be efficiently used and executed on various programming languages/platforms. Data were clustered and sequence candidate motifs were detected successfully. The results were compared and validated with some available web tools such as 'BLASTN' and 'MEME suite'. The machine learning algorithms for aforementioned objectives were applied successfully. This work elaborated utility of machine learning algorithms and language platforms to achieve the tasks of clustering and candidate motif detection in exosomal miRNAs. With the information on mentioned objectives, deeper insight would be gained for analyses of newly discovered miRNAs in exosomes which are considered to be circulating biomarkers. In addition, the execution of machine learning algorithms on various language platforms gives more flexibility to users to try multiple iterations according to their requirements. This approach can be applied to other biological data-mining tasks as well.

  4. An Effective Tri-Clustering Algorithm Combining Expression Data with Gene Regulation Information

    Directory of Open Access Journals (Sweden)

    Ao Li

    2009-04-01

    Full Text Available Motivation: Bi-clustering algorithms aim to identify sets of genes sharing similar expression patterns across a subset of conditions. However direct interpretation or prediction of gene regulatory mechanisms may be difficult as only gene expression data is used. Information about gene regulators may also be available, most commonly about which transcription factors may bind to the promoter region and thus control the expression level of a gene. Thus a method to integrate gene expression and gene regulation information is desirable for clustering and analyzing. Methods: By incorporating gene regulatory information with gene expression data, we define regulated expression values (REV as indicators of how a gene is regulated by a specific factor. Existing bi-clustering methods are extended to a three dimensional data space by developing a heuristic TRI-Clustering algorithm. An additional approach named Automatic Boundary Searching algorithm (ABS is introduced to automatically determine the boundary threshold. Results: Results based on incorporating ChIP-chip data representing transcription factor-gene interactions show that the algorithms are efficient and robust for detecting tri-clusters. Detailed analysis of the tri-cluster extracted from yeast sporulation REV data shows genes in this cluster exhibited significant differences during the middle and late stages. The implicated regulatory network was then reconstructed for further study of defined regulatory mechanisms. Topological and statistical analysis of this network demonstrated evidence of significant changes of TF activities during the different stages of yeast sporulation, and suggests this approach might be a general way to study regulatory networks undergoing transformations.

  5. Chaos control of ferroresonance system based on RBF-maximum entropy clustering algorithm

    International Nuclear Information System (INIS)

    Liu Fan; Sun Caixin; Sima Wenxia; Liao Ruijin; Guo Fei

    2006-01-01

    With regards to the ferroresonance overvoltage of neutral grounded power system, a maximum-entropy learning algorithm based on radial basis function neural networks is used to control the chaotic system. The algorithm optimizes the object function to derive learning rule of central vectors, and uses the clustering function of network hidden layers. It improves the regression and learning ability of neural networks. The numerical experiment of ferroresonance system testifies the effectiveness and feasibility of using the algorithm to control chaos in neutral grounded system

  6. Data Clustering

    Science.gov (United States)

    Wagstaff, Kiri L.

    2012-03-01

    clustering, in which some partial information about item assignments or other components of the resulting output are already known and must be accommodated by the solution. Some algorithms seek a partition of the data set into distinct clusters, while others build a hierarchy of nested clusters that can capture taxonomic relationships. Some produce a single optimal solution, while others construct a probabilistic model of cluster membership. More formally, clustering algorithms operate on a data set X composed of items represented by one or more features (dimensions). These could include physical location, such as right ascension and declination, as well as other properties such as brightness, color, temporal change, size, texture, and so on. Let D be the number of dimensions used to represent each item, xi ∈ RD. The clustering goal is to produce an organization P of the items in X that optimizes an objective function f : P -> R, which quantifies the quality of solution P. Often f is defined so as to maximize similarity within a cluster and minimize similarity between clusters. To that end, many algorithms make use of a measure d : X x X -> R of the distance between two items. A partitioning algorithm produces a set of clusters P = {c1, . . . , ck} such that the clusters are nonoverlapping (c_i intersected with c_j = empty set, i != j) subsets of the data set (Union_i c_i=X). Hierarchical algorithms produce a series of partitions P = {p1, . . . , pn }. For a complete hierarchy, the number of partitions n’= n, the number of items in the data set; the top partition is a single cluster containing all items, and the bottom partition contains n clusters, each containing a single item. For model-based clustering, each cluster c_j is represented by a model m_j , such as the cluster center or a Gaussian distribution. The wide array of available clustering algorithms may seem bewildering, and covering all of them is beyond the scope of this chapter. Choosing among them for a

  7. A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks

    Science.gov (United States)

    Hruz, Tomas; Lucas, Christoph; Laule, Oliver; Zimmermann, Philip

    2013-01-01

    Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855

  8. An effective trust-based recommendation method using a novel graph clustering algorithm

    Science.gov (United States)

    Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin

    2015-10-01

    Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.

  9. Performance of a Real-time Multipurpose 2-Dimensional Clustering Algorithm Developed for the ATLAS Experiment

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00372074; The ATLAS collaboration; Sotiropoulou, Calliope Louisa; Annovi, Alberto; Kordas, Kostantinos

    2016-01-01

    In this paper the performance of the 2D pixel clustering algorithm developed for the Input Mezzanine card of the ATLAS Fast TracKer system is presented. Fast TracKer is an approved ATLAS upgrade that has the goal to provide a complete list of tracks to the ATLAS High Level Trigger for each level-1 accepted event, at up to 100 kHz event rate with a very small latency, in the order of 100µs. The Input Mezzanine card is the input stage of the Fast TracKer system. Its role is to receive data from the silicon detector and perform real time clustering, thus to reduce the amount of data propagated to the subsequent processing levels with minimal information loss. We focus on the most challenging component on the Input Mezzanine card, the 2D clustering algorithm executed on the pixel data. We compare two different implementations of the algorithm. The first is one called the ideal one which searches clusters of pixels in the whole silicon module at once and calculates the cluster centroids exploiting the whole avail...

  10. Performance of a Real-time Multipurpose 2-Dimensional Clustering Algorithm Developed for the ATLAS Experiment

    CERN Document Server

    Gkaitatzis, Stamatios; The ATLAS collaboration

    2016-01-01

    In this paper the performance of the 2D pixel clustering algorithm developed for the Input Mezzanine card of the ATLAS Fast TracKer system is presented. Fast TracKer is an approved ATLAS upgrade that has the goal to provide a complete list of tracks to the ATLAS High Level Trigger for each level-1 accepted event, at up to 100 kHz event rate with a very small latency, in the order of 100 µs. The Input Mezzanine card is the input stage of the Fast TracKer system. Its role is to receive data from the silicon detector and perform real time clustering, thus to reduce the amount of data propagated to the subsequent processing levels with minimal information loss. We focus on the most challenging component on the Input Mezzanine card, the 2D clustering algorithm executed on the pixel data. We compare two different implementations of the algorithm. The first is one called the ideal one which searches clusters of pixels in the whole silicon module at once and calculates the cluster centroids exploiting the whole avai...

  11. An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

    OpenAIRE

    Khan, Fouad

    2016-01-01

    K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed to overcome this problem and has been shown to have better accuracy and computational efficiency than k-means. In many clustering problems though -such as when classifying georeferenced data for mapping applications- standardization of clustering methodolo...

  12. FRCA: A Fuzzy Relevance-Based Cluster Head Selection Algorithm for Wireless Mobile Ad-Hoc Sensor Networks

    Directory of Open Access Journals (Sweden)

    Taegwon Jeong

    2011-05-01

    Full Text Available Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA to solve problems found in existing wireless mobile ad hoc sensor networks, such as the node distribution found in dynamic properties due to mobility and flat structures and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP, the Weighted-based Adaptive Clustering Algorithm (WACA, and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM. The simulation results showed that the proposed FRCA achieves better performance than that of the other existing mechanisms.

  13. FRCA: a fuzzy relevance-based cluster head selection algorithm for wireless mobile ad-hoc sensor networks.

    Science.gov (United States)

    Lee, Chongdeuk; Jeong, Taegwon

    2011-01-01

    Clustering is an important mechanism that efficiently provides information for mobile nodes and improves the processing capacity of routing, bandwidth allocation, and resource management and sharing. Clustering algorithms can be based on such criteria as the battery power of nodes, mobility, network size, distance, speed and direction. Above all, in order to achieve good clustering performance, overhead should be minimized, allowing mobile nodes to join and leave without perturbing the membership of the cluster while preserving current cluster structure as much as possible. This paper proposes a Fuzzy Relevance-based Cluster head selection Algorithm (FRCA) to solve problems found in existing wireless mobile ad hoc sensor networks, such as the node distribution found in dynamic properties due to mobility and flat structures and disturbance of the cluster formation. The proposed mechanism uses fuzzy relevance to select the cluster head for clustering in wireless mobile ad hoc sensor networks. In the simulation implemented on the NS-2 simulator, the proposed FRCA is compared with algorithms such as the Cluster-based Routing Protocol (CBRP), the Weighted-based Adaptive Clustering Algorithm (WACA), and the Scenario-based Clustering Algorithm for Mobile ad hoc networks (SCAM). The simulation results showed that the proposed FRCA achieves better performance than that of the other existing mechanisms.

  14. Validity studies among hierarchical methods of cluster analysis using cophenetic correlation coefficient

    Energy Technology Data Exchange (ETDEWEB)

    Carvalho, Priscilla R.; Munita, Casimiro S.; Lapolli, André L., E-mail: prii.ramos@gmail.com, E-mail: camunita@ipen.br, E-mail: alapolli@ipen.br [Instituto de Pesquisas Energéticas e Nucleares (IPEN/CNEN-SP), São Paulo, SP (Brazil)

    2017-07-01

    The literature presents many methods for partitioning of data base, and is difficult choose which is the most suitable, since the various combinations of methods based on different measures of dissimilarity can lead to different patterns of grouping and false interpretations. Nevertheless, little effort has been expended in evaluating these methods empirically using an archaeological data base. In this way, the objective of this work is make a comparative study of the different cluster analysis methods and identify which is the most appropriate. For this, the study was carried out using a data base of the Archaeometric Studies Group from IPEN-CNEN/SP, in which 45 samples of ceramic fragments from three archaeological sites were analyzed by instrumental neutron activation analysis (INAA) which were determinate the mass fraction of 13 elements (As, Ce, Cr, Eu, Fe, Hf, La, Na, Nd, Sc, Sm, Th, U). The methods used for this study were: single linkage, complete linkage, average linkage, centroid and Ward. The validation was done using the cophenetic correlation coefficient and comparing these values the average linkage method obtained better results. A script of the statistical program R with some functions was created to obtain the cophenetic correlation. By means of these values was possible to choose the most appropriate method to be used in the data base. (author)

  15. Validity studies among hierarchical methods of cluster analysis using cophenetic correlation coefficient

    International Nuclear Information System (INIS)

    Carvalho, Priscilla R.; Munita, Casimiro S.; Lapolli, André L.

    2017-01-01

    The literature presents many methods for partitioning of data base, and is difficult choose which is the most suitable, since the various combinations of methods based on different measures of dissimilarity can lead to different patterns of grouping and false interpretations. Nevertheless, little effort has been expended in evaluating these methods empirically using an archaeological data base. In this way, the objective of this work is make a comparative study of the different cluster analysis methods and identify which is the most appropriate. For this, the study was carried out using a data base of the Archaeometric Studies Group from IPEN-CNEN/SP, in which 45 samples of ceramic fragments from three archaeological sites were analyzed by instrumental neutron activation analysis (INAA) which were determinate the mass fraction of 13 elements (As, Ce, Cr, Eu, Fe, Hf, La, Na, Nd, Sc, Sm, Th, U). The methods used for this study were: single linkage, complete linkage, average linkage, centroid and Ward. The validation was done using the cophenetic correlation coefficient and comparing these values the average linkage method obtained better results. A script of the statistical program R with some functions was created to obtain the cophenetic correlation. By means of these values was possible to choose the most appropriate method to be used in the data base. (author)

  16. MixSim : An R Package for Simulating Data to Study Performance of Clustering Algorithms

    Directory of Open Access Journals (Sweden)

    Volodymyr Melnykov

    2012-11-01

    Full Text Available The R package MixSim is a new tool that allows simulating mixtures of Gaussian distributions with different levels of overlap between mixture components. Pairwise overlap, defined as a sum of two misclassification probabilities, measures the degree of interaction between components and can be readily employed to control the clustering complexity of datasets simulated from mixtures. These datasets can then be used for systematic performance investigation of clustering and finite mixture modeling algorithms. Among other capabilities of MixSim, there are computing the exact overlap for Gaussian mixtures, simulating Gaussian and non-Gaussian data, simulating outliers and noise variables, calculating various measures of agreement between two partitionings, and constructing parallel distribution plots for the graphical display of finite mixture models. All features of the package are illustrated in great detail. The utility of the package is highlighted through a small comparison study of several popular clustering algorithms.

  17. An improved optimum-path forest clustering algorithm for remote sensing image segmentation

    Science.gov (United States)

    Chen, Siya; Sun, Tieli; Yang, Fengqin; Sun, Hongguang; Guan, Yu

    2018-03-01

    Remote sensing image segmentation is a key technology for processing remote sensing images. The image segmentation results can be used for feature extraction, target identification and object description. Thus, image segmentation directly affects the subsequent processing results. This paper proposes a novel Optimum-Path Forest (OPF) clustering algorithm that can be used for remote sensing segmentation. The method utilizes the principle that the cluster centres are characterized based on their densities and the distances between the centres and samples with higher densities. A new OPF clustering algorithm probability density function is defined based on this principle and applied to remote sensing image segmentation. Experiments are conducted using five remote sensing land cover images. The experimental results illustrate that the proposed method can outperform the original OPF approach.

  18. HIERARCHICAL FRAGMENTATION AND JET-LIKE OUTFLOWS IN IRDC G28.34+0.06: A GROWING MASSIVE PROTOSTAR CLUSTER

    International Nuclear Information System (INIS)

    Wang Ke; Wu Yuefang; Zhang Huawei; Zhang Qizhou

    2011-01-01

    We present Submillimeter Array (SMA) λ = 0.88 mm observations of an infrared dark cloud G28.34+0.06. Located in the quiescent southern part of the G28.34 cloud, the region of interest is a massive (>10 3 M sun ) molecular clump P1 with a luminosity of ∼10 3 L sun , where our previous SMA observations at 1.3 mm have revealed a string of five dust cores of 22-64 M sun along the 1 pc IR-dark filament. The cores are well aligned at a position angle (P.A.) of 48 deg. and regularly spaced at an average projected separation of 0.16 pc. The new high-resolution, high-sensitivity 0.88 mm image further resolves the five cores into 10 compact condensations of 1.4-10.6 M sun , with sizes of a few thousand AU. The spatial structure at clump (∼1 pc) and core (∼0.1 pc) scales indicates a hierarchical fragmentation. While the clump fragmentation is consistent with a cylindrical collapse, the observed fragment masses are much larger than the expected thermal Jeans masses. All the cores are driving CO (3-2) outflows up to 38 km s -1 , the majority of which are bipolar, jet-like outflows. The moderate luminosity of the P1 clump sets a limit on the mass of protostars of 3-7 M sun . Because of the large reservoir of dense molecular gas in the immediate medium and ongoing accretion as evident by the jet-like outflows, we speculate that P1 will grow and eventually form a massive star cluster. This study provides a first glimpse of massive, clustered star formation that currently undergoes through an intermediate-mass stage.

  19. Assessment of the quality of water by hierarchical cluster and variance analyses of the Koudiat Medouar Watershed, East Algeria

    Science.gov (United States)

    Tiri, Ammar; Lahbari, Noureddine; Boudoukha, Abderrahmane

    2017-12-01

    The assessment of surface water in Koudiat Medouar watershed is very important especially when it comes to pollution of the dam waters by discharges of wastewater from neighboring towns in Oued Timgad, who poured into the basin of the dam, and agricultural lands located along the Oued Reboa. To this end, the multivariable method was used to evaluate the spatial and temporal variation of the water surface quality of the Koudiat Medouar dam, eastern Algeria. The stiff diagram has identified two main hydrochemical facies. The first facies Mg-HCO3 is reflected in the first sampling station (Oued Reboa) and in the second one (Oued Timgad), while the second facies Mg-SO4 is reflected in the third station (Basin Dam). The results obtained by the analysis of variance show that in the three stations all parameters are significant, except for Na, K and HCO3 in the first station (Oued Reboa) and the EC in the second station (Oued Timgad) and at the end NO3 and pH in the third station (Basin Dam). Q-mode hierarchical cluster analysis showed that two main groups in each sampling station. The chemistry of major ions (Mg, Ca, HCO3 and SO4) within the three stations results from anthropogenic impacts and water-rock interaction sources.

  20. Typing of unknown microorganisms based on quantitative analysis of fatty acids by mass spectrometry and hierarchical clustering

    Energy Technology Data Exchange (ETDEWEB)

    Li Tingting; Dai Ling; Li Lun; Hu Xuejiao; Dong Linjie; Li Jianjian; Salim, Sule Khalfan; Fu Jieying [Key Laboratory of Pesticides and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei 430079 (China); Zhong Hongying, E-mail: hyzhong@mail.ccnu.edu.cn [Key Laboratory of Pesticides and Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan, Hubei 430079 (China)

    2011-01-17

    Rapid identification of unknown microorganisms of clinical and agricultural importance is not only critical for accurate diagnosis of infections but also essential for appropriate and prompt treatment. We describe here a rapid method for microorganisms typing based on quantitative analysis of fatty acids by iFAT approach (Isotope-coded Fatty Acid Transmethylation). In this work, lyophilized cell lysates were directly mixed with 0.5 M NaOH solution in d3-methanol and n-hexane. After 1 min of ultrasonication, the top n-hexane layer was combined with a mixture of standard d0-methanol derived fatty acid methylesters with known concentration. Measurement of intensity ratios of d3/d0 labeled fragment ion and molecular ion pairs at the corresponding target fatty acids provides a quantitative basis for hierarchical clustering. In the resultant dendrogram, the Euclidean distance between unknown species and known species quantitatively reveals their differences or shared similarities in fatty acid related pathways. It is of particular interest to apply this method for typing fungal species because fungi has distinguished lipid biosynthetic pathways that have been targeted for lots of drugs or fungicides compared with bacteria and animals. The proposed method has no dependence on the availability of genome or proteome databases. Therefore, it is can be applicable for a broad range of unknown microorganisms or mutant species.

  1. Peringkasan Tweet Berdasarkan Trending Topic Twitter Dengan Pembobotan TF-IDF dan Single Linkage AngglomerativeHierarchical Clustering

    Directory of Open Access Journals (Sweden)

    Annisa Annisa

    2016-10-01

    Full Text Available Trending topic is a feature provided by twitter that informs something widely discussed by users in a particular time. The form of a trending topic is a hashtag and can be selected by clicking. However, the number of tweets for each trending topics can be very large, so it will be difficult if we want to know all the contents. So, in order to make easy when reading the topic, a small number of tweets can be selected as the main idea of the topic. In this study, we applied the Agglomerative Single Linkage Hierarchical Clustering by calculating the TF-IDF value for each word in advance. We used 100 trending topics, where each topic consists of 50 tweets in Indonesian. For testing, we provided 30 trending topics which consist of 2 until 9 sub-topics. The result is that each trending topics can be summarized into shorter text contains 2 until 9 tweets. We were able to summarize 1 trending topics exactly same as the topic summarized by human expert. However, the rest of topics corresponded partially with human expert.

  2. Hierarchical clustering of Alzheimer and 'normal' brains using elemental concentrations and glucose metabolism determined by PIXE, INAA and PET

    International Nuclear Information System (INIS)

    Cutts, D.A.; Spyrou, N.M.

    2001-01-01

    Brain tissue samples, obtained from the Alzheimer Disease Brain Bank, Institute of Psychiatry, London, were taken from both left and right hemispheres of three regions of the cerebrum, namely the frontal, parietal and occipital lobes for both Alzheimer and 'normal' subjects. Trace element concentrations in the frontal lobe were determined for twenty six Alzheimer (15 male, 11 female) and twenty six 'normal' (8 male, 18 female) brain tissue samples. In the parietal lobe ten Alzheimer (2 male, 8 female) and ten 'normal' (8 male, 2 female) samples were taken along with ten Alzheimer (4 male, 6 female) and ten 'normal' (6 male, 4 female) from the occipital lobe. For the frontal lobe trace element concentrations were determined using proton induced X-ray emission (PIXE) analysis while in parietal and occipital regions instrumental neutron activation analysis (INAA) was used. Additionally eighteen Alzheimer (9 male, 9 female) and eighteen age matched 'normal' (8 male, 10 female) living subjects were examined using positron emission tomography (PET) in order to determine regional cerebral metabolic rates of glucose (rCMRGlu). The rCMRGlu of 36 regions of the brain was investigated including frontal, occipital and parietal lobes as in the trace element study. Hierarchical cluster analysis was applied to the trace element and glucose metabolism data to discover which variables in the resulting dendrograms displayed the most significant separation between Alzheimer and 'normal' subjects. (author)

  3. Extended hierarchical search (EHS) algorithm for detection of gravitational waves from inspiralling compact binaries

    CERN Document Server

    Sengupta, A S; Lazzarini, A; Prince, T

    2002-01-01

    Pattern matching techniques such as matched filtering will be used for online extraction of gravitational wave signals buried inside detector noise. This involves cross correlating the detector output with hundreds of thousands of templates spanning a multi-dimensional parameter space, which is very expensive computationally. A faster implementation algorithm was devised by Mohanty and Dhurandhar using a hierarchy of templates over the mass parameters, which speeded up the procedure by about 25-30 times. We show that a further reduction in computational cost is possible if we extend the hierarchy paradigm to an extra parameter, namely, the time of arrival of the signal. In the first stage, the chirp waveform is cut-off at a relatively low frequency allowing the data to be coarsely sampled leading to cost saving in performing the FFTs. This is possible because most of the signal power is at low frequencies, and therefore the advantage due to hierarchy over masses is not compromised. Results are obtained for sp...

  4. Low-Complexity Hierarchical Mode Decision Algorithms Targeting VLSI Architecture Design for the H.264/AVC Video Encoder

    Directory of Open Access Journals (Sweden)

    Guilherme Corrêa

    2012-01-01

    Full Text Available In H.264/AVC, the encoding process can occur according to one of the 13 intraframe coding modes or according to one of the 8 available interframes block sizes, besides the SKIP mode. In the Joint Model reference software, the choice of the best mode is performed through exhaustive executions of the entire encoding process, which significantly increases the encoder's computational complexity and sometimes even forbids its use in real-time applications. Considering this context, this work proposes a set of heuristic algorithms targeting hardware architectures that lead to earlier selection of one encoding mode. The amount of repetitions of the encoding process is reduced by 47 times, at the cost of a relatively small cost in compression performance. When compared to other works, the fast hierarchical mode decision results are expressively more satisfactory in terms of computational complexity reduction, quality, and bit rate. The low-complexity mode decision architecture proposed is thus a very good option for real-time coding of high-resolution videos. The solution is especially interesting for embedded and mobile applications with support to multimedia systems, since it yields good compression rates and image quality with a very high reduction in the encoder complexity.

  5. A priori data-driven multi-clustered reservoir generation algorithm for echo state network.

    Directory of Open Access Journals (Sweden)

    Xiumin Li

    Full Text Available Echo state networks (ESNs with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when ESN is used in environments with sufficient priori data available. Accordingly, a priori data-driven multi-cluster reservoir generation algorithm is proposed. The priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only the reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structure complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches the maximum when ESN gains the greatest precision.

  6. Fuzzy-Logic Based Distributed Energy-Efficient Clustering Algorithm for Wireless Sensor Networks.

    Science.gov (United States)

    Zhang, Ying; Wang, Jun; Han, Dezhi; Wu, Huafeng; Zhou, Rundong

    2017-07-03

    Due to the high-energy efficiency and scalability, the clustering routing algorithm has been widely used in wireless sensor networks (WSNs). In order to gather information more efficiently, each sensor node transmits data to its Cluster Head (CH) to which it belongs, by multi-hop communication. However, the multi-hop communication in the cluster brings the problem of excessive energy consumption of the relay nodes which are closer to the CH. These nodes' energy will be consumed more quickly than the farther nodes, which brings the negative influence on load balance for the whole networks. Therefore, we propose an energy-efficient distributed clustering algorithm based on fuzzy approach with non-uniform distribution (EEDCF). During CHs' election, we take nodes' energies, nodes' degree and neighbor nodes' residual energies into consideration as the input parameters. In addition, we take advantage of Takagi, Sugeno and Kang (TSK) fuzzy model instead of traditional method as our inference system to guarantee the quantitative analysis more reasonable. In our scheme, each sensor node calculates the probability of being as CH with the help of fuzzy inference system in a distributed way. The experimental results indicate EEDCF algorithm is better than some current representative methods in aspects of data transmission, energy consumption and lifetime of networks.

  7. Artificial Bee Colony Algorithm Based on K-Means Clustering for Multiobjective Optimal Power Flow Problem

    Directory of Open Access Journals (Sweden)

    Liling Sun

    2015-01-01

    Full Text Available An improved multiobjective ABC algorithm based on K-means clustering, called CMOABC, is proposed. To fasten the convergence rate of the canonical MOABC, the way of information communication in the employed bees’ phase is modified. For keeping the population diversity, the multiswarm technology based on K-means clustering is employed to decompose the population into many clusters. Due to each subcomponent evolving separately, after every specific iteration, the population will be reclustered to facilitate information exchange among different clusters. Application of the new CMOABC on several multiobjective benchmark functions shows a marked improvement in performance over the fast nondominated sorting genetic algorithm (NSGA-II, the multiobjective particle swarm optimizer (MOPSO, and the multiobjective ABC (MOABC. Finally, the CMOABC is applied to solve the real-world optimal power flow (OPF problem that considers the cost, loss, and emission impacts as the objective functions. The 30-bus IEEE test system is presented to illustrate the application of the proposed algorithm. The simulation results demonstrate that, compared to NSGA-II, MOPSO, and MOABC, the proposed CMOABC is superior for solving OPF problem, in terms of optimization accuracy.

  8. A clustering algorithm for sample data based on environmental pollution characteristics

    Science.gov (United States)

    Chen, Mei; Wang, Pengfei; Chen, Qiang; Wu, Jiadong; Chen, Xiaoyun

    2015-04-01

    Environmental pollution has become an issue of serious international concern in recent years. Among the receptor-oriented pollution models, CMB, PMF, UNMIX, and PCA are widely used as source apportionment models. To improve the accuracy of source apportionment and classify the sample data for these models, this study proposes an easy-to-use, high-dimensional EPC algorithm that not only organizes all of the sample data into different groups according to the similarities in pollution characteristics such as pollution sources and concentrations but also simultaneously detects outliers. The main clustering process consists of selecting the first unlabelled point as the cluster centre, then assigning each data point in the sample dataset to its most similar cluster centre according to both the user-defined threshold and the value of similarity function in each iteration, and finally modifying the clusters using a method similar to k-Means. The validity and accuracy of the algorithm are tested using both real and synthetic datasets, which makes the EPC algorithm practical and effective for appropriately classifying sample data for source apportionment models and helpful for better understanding and interpreting the sources of pollution.

  9. Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

    Science.gov (United States)

    Choi, Woo-Yong; Chatterjee, Mainak

    2015-03-01

    With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that extent, we propose a real-time cluster-based multipolling sequencing algorithm that drastically eliminates more than 90% of the polling overhead, particularly so when the dynamic search algorithm fails to derive the multipolling sequence in real time.

  10. Channel processor in 2D cluster finding algorithm for high energy physics application

    International Nuclear Information System (INIS)

    Paul, Rourab; Chakrabarti, Amlan; Mitra, Jubin; Khan, Shuaib A.; Nayak, Tapan; Mukherjee, Sanjoy

    2016-01-01

    In a Large Ion Collider Experiment (ALICE) at CERN 1 TB/s (approximately) data comes from front end electronics. Previously, we had 1 GBT link operated with a cluster clock frequencies of 133 MHz and 320 MHz in Run 1 and Run 2 respectively. The cluster algorithm proposed in Run 1 and 2 could not work in Run 3 as the data speed increased almost 20 times. Older version cluster algorithm receives data sequentially as a stream. It has 2 main sub processes - Channel Processor, Merging process. The initial step of channel processor finds a peak Q max and sums up pads (sensors) data from -2 time bin to +2 time bin in the time direction. The computed value stores in a register named cluster fragment data (cfd o ). The merging process merges cfd o in pad direction. The data streams in Run 2 comes sequentially, which processed by the channel processor and merging block in a sequential manner with very less resource over head. In Run 3 data comes parallely, 1600 data from 1600 pads of a single time instant comes at each 200 ns interval (5 MHz) which is very challenging to process in the budgeted resource platform of Arria 10 FPGA hardware with 250 to 320 MHz cluster clock

  11. Energy Efficient and Safe Weighted Clustering Algorithm for Mobile Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Amine Dahane

    2015-01-01

    Full Text Available The main concern of clustering approaches for mobile wireless sensor networks (WSNs is to prolong the battery life of the individual sensors and the network lifetime. For a successful clustering approach the need of a powerful mechanism to safely elect a cluster head remains a challenging task in many research works that take into account the mobility of the network. The approach based on the computing of the weight of each node in the network is one of the proposed techniques to deal with this problem. In this paper, we propose an energy efficient and safe weighted clustering algorithm (ES-WCA for mobile WSNs using a combination of five metrics. Among these metrics lies the behavioral level metric which promotes a safe choice of a cluster head in the sense where this last one will never be a malicious node. Moreover, the highlight of our work is summarized in a comprehensive strategy for monitoring the network, in order to detect and remove the malicious nodes. We use simulation study to demonstrate the performance of the proposed algorithm.

  12. An algorithm of discovering signatures from DNA databases on a computer cluster.

    Science.gov (United States)

    Lee, Hsiao Ping; Sheu, Tzu-Fang

    2014-10-05

    Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.

  13. Clustering methods and visualization algorithms to aid nuclear reactor operative diagnostics

    International Nuclear Information System (INIS)

    Pepelyshev, Yu.N.; Dzwinel, W.

    1990-01-01

    The software system developed plays the role of the aid to an operator for nuclear reactor diagnostics. The noise analysis of the reactor parameters such as power, temperature and coolant flow rate constitutes the basis of the system. Combination of data acquisition, data preprocessing, clustering and cluster visualization algorithms with heuristic techniques of results analysis, determine the way of its implementation. Two regimes are available. The first one - extended - is recommended for a long term investigations and the second - suppressed for the aid to the reactor operation monitoring. The system has been tested and developed at the JINR IBR-2 pulsed reactor. 13 refs.; 4 figs.; 2 tabs

  14. K-mean clustering algorithm for processing signals from compound semiconductor detectors

    International Nuclear Information System (INIS)

    Tada, Tsutomu; Hitomi, Keitaro; Wu, Yan; Kim, Seong-Yun; Yamazaki, Hiromichi; Ishii, Keizo

    2011-01-01

    The K-mean clustering algorithm was employed for processing signal waveforms from TlBr detectors. The signal waveforms were classified based on its shape reflecting the charge collection process in the detector. The classified signal waveforms were processed individually to suppress the pulse height variation of signals due to the charge collection loss. The obtained energy resolution of a 137 Cs spectrum measured with a 0.5 mm thick TlBr detector was 1.3% FWHM by employing 500 clusters.

  15. What to Do When K-Means Clustering Fails: A Simple yet Principled Alternative Algorithm.

    Science.gov (United States)

    Raykov, Yordan P; Boukouvalas, Alexis; Baig, Fahd; Little, Max A

    The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.

  16. Enhancement of RWSN Lifetime via Firework Clustering Algorithm Validated by ANN

    Directory of Open Access Journals (Sweden)

    Ahmad Ali

    2018-03-01

    Full Text Available Nowadays, wireless power transfer is ubiquitously used in wireless rechargeable sensor networks (WSNs. Currently, the energy limitation is a grave concern issue for WSNs. However, lifetime enhancement of sensor networks is a challenging task need to be resolved. For addressing this issue, a wireless charging vehicle is an emerging technology to expand the overall network efficiency. The present study focuses on the enhancement of overall network lifetime of the rechargeable wireless sensor network. To resolve the issues mentioned above, we propose swarm intelligence based hard clustering approach using fireworks algorithm with the adaptive transfer function (FWA-ATF. In this work, the virtual clustering method has been applied in the routing process which utilizes the firework optimization algorithm. Still now, an FWA-ATF algorithm yet not applied by any researcher for RWSN. Furthermore, the validation study of the proposed method using the artificial neural network (ANN backpropagation algorithm incorporated in the present study. Different algorithms are applied to evaluate the performance of proposed technique that gives the best results in this mechanism. Numerical results indicate that our method outperforms existing methods and yield performance up to 80% regarding energy consumption and vacation time of wireless charging vehicle.

  17. [Automatic Sleep Stage Classification Based on an Improved K-means Clustering Algorithm].

    Science.gov (United States)

    Xiao, Shuyuan; Wang, Bei; Zhang, Jian; Zhang, Qunfeng; Zou, Junzhong

    2016-10-01

    Sleep stage scoring is a hotspot in the field of medicine and neuroscience.Visual inspection of sleep is laborious and the results may be subjective to different clinicians.Automatic sleep stage classification algorithm can be used to reduce the manual workload.However,there are still limitations when it encounters complicated and changeable clinical cases.The purpose of this paper is to develop an automatic sleep staging algorithm based on the characteristics of actual sleep data.In the proposed improved K-means clustering algorithm,points were selected as the initial centers by using a concept of density to avoid the randomness of the original K-means algorithm.Meanwhile,the cluster centers were updated according to the‘Three-Sigma Rule’during the iteration to abate the influence of the outliers.The proposed method was tested and analyzed on the overnight sleep data of the healthy persons and patients with sleep disorders after continuous positive airway pressure(CPAP)treatment.The automatic sleep stage classification results were compared with the visual inspection by qualified clinicians and the averaged accuracy reached 76%.With the analysis of morphological diversity of sleep data,it was proved that the proposed improved K-means algorithm was feasible and valid for clinical practice.

  18. Forecasting Jakarta composite index (IHSG) based on chen fuzzy time series and firefly clustering algorithm

    Science.gov (United States)

    Ningrum, R. W.; Surarso, B.; Farikhin; Safarudin, Y. M.

    2018-03-01

    This paper proposes the combination of Firefly Algorithm (FA) and Chen Fuzzy Time Series Forecasting. Most of the existing fuzzy forecasting methods based on fuzzy time series use the static length of intervals. Therefore, we apply an artificial intelligence, i.e., Firefly Algorithm (FA) to set non-stationary length of intervals for each cluster on Chen Method. The method is evaluated by applying on the Jakarta Composite Index (IHSG) and compare with classical Chen Fuzzy Time Series Forecasting. Its performance verified through simulation using Matlab.

  19. Hierarchical Network Design

    DEFF Research Database (Denmark)

    Thomadsen, Tommy

    2005-01-01

    Communication networks are immensely important today, since both companies and individuals use numerous services that rely on them. This thesis considers the design of hierarchical (communication) networks. Hierarchical networks consist of layers of networks and are well-suited for coping...... with changing and increasing demands. Two-layer networks consist of one backbone network, which interconnects cluster networks. The clusters consist of nodes and links, which connect the nodes. One node in each cluster is a hub node, and the backbone interconnects the hub nodes of each cluster and thus...... the clusters. The design of hierarchical networks involves clustering of nodes, hub selection, and network design, i.e. selection of links and routing of ows. Hierarchical networks have been in use for decades, but integrated design of these networks has only been considered for very special types of networks...

  20. Hierarchical clustering of gene expression patterns in the Eomes + lineage of excitatory neurons during early neocortical development

    Directory of Open Access Journals (Sweden)

    Cameron David A

    2012-08-01

    Full Text Available Abstract Background Cortical neurons display dynamic patterns of gene expression during the coincident processes of differentiation and migration through the developing cerebrum. To identify genes selectively expressed by the Eomes + (Tbr2 lineage of excitatory cortical neurons, GFP-expressing cells from Tg(Eomes::eGFP Gsat embryos were isolated to > 99% purity and profiled. Results We report the identification, validation and spatial grouping of genes selectively expressed within the Eomes + cortical excitatory neuron lineage during early cortical development. In these neurons 475 genes were expressed ≥ 3-fold, and 534 genes ≤ 3-fold, compared to the reference population of neuronal precursors. Of the up-regulated genes, 328 were represented at the Genepaint in situ hybridization database and 317 (97% were validated as having spatial expression patterns consistent with the lineage of differentiating excitatory neurons. A novel approach for quantifying in situ hybridization patterns (QISP across the cerebral wall was developed that allowed the hierarchical clustering of genes into putative co-regulated groups. Forty four candidate genes were identified that show spatial expression with Intermediate Precursor Cells, 49 candidate genes show spatial expression with Multipolar Neurons, while the remaining 224 genes achieved peak expression in the developing cortical plate. Conclusions This analysis of differentiating excitatory neurons revealed the expression patterns of 37 transcription factors, many chemotropic signaling molecules (including the Semaphorin, Netrin and Slit signaling pathways, and unexpected evidence for non-canonical neurotransmitter signaling and changes in mechanisms of glucose metabolism. Over half of the 317 identified genes are associated with neuronal disease making these findings a valuable resource for studies of neurological development and disease.

  1. An Automatic K-Means Clustering Algorithm of GPS Data Combining a Novel Niche Genetic Algorithm with Noise and Density

    Directory of Open Access Journals (Sweden)

    Xiangbing Zhou

    2017-12-01

    Full Text Available Rapidly growing Global Positioning System (GPS data plays an important role in trajectory and their applications (e.g., GPS-enabled smart devices. In order to employ K-means to mine the better origins and destinations (OD behind the GPS data and overcome its shortcomings including slowness of convergence, sensitivity to initial seeds selection, and getting stuck in a local optimum, this paper proposes and focuses on a novel niche genetic algorithm (NGA with density and noise for K-means clustering (NoiseClust. In NoiseClust, an improved noise method and K-means++ are proposed to produce the initial population and capture higher quality seeds that can automatically determine the proper number of clusters, and also handle the different sizes and shapes of genes. A density-based method is presented to divide the number of niches, with its aim to maintain population diversity. Adaptive probabilities of crossover and mutation are also employed to prevent the convergence to a local optimum. Finally, the centers (the best chromosome are obtained and then fed into the K-means as initial seeds to generate even higher quality clustering results by allowing the initial seeds to readjust as needed. Experimental results based on taxi GPS data sets demonstrate that NoiseClust has high performance and effectiveness, and easily mine the city’s situations in four taxi GPS data sets.

  2. Dynamic connectivity algorithms for Monte Carlo simulations of the random-cluster model

    International Nuclear Information System (INIS)

    Elçi, Eren Metin; Weigel, Martin

    2014-01-01

    We review Sweeny's algorithm for Monte Carlo simulations of the random cluster model. Straightforward implementations suffer from the problem of computational critical slowing down, where the computational effort per edge operation scales with a power of the system size. By using a tailored dynamic connectivity algorithm we are able to perform all operations with a poly-logarithmic computational effort. This approach is shown to be efficient in keeping online connectivity information and is of use for a number of applications also beyond cluster-update simulations, for instance in monitoring droplet shape transitions. As the handling of the relevant data structures is non-trivial, we provide a Python module with a full implementation for future reference.

  3. Application of a clustering-based peak alignment algorithm to analyze various DNA fingerprinting data.

    Science.gov (United States)

    Ishii, Satoshi; Kadota, Koji; Senoo, Keishi

    2009-09-01

    DNA fingerprinting analysis such as amplified ribosomal DNA restriction analysis (ARDRA), repetitive extragenic palindromic PCR (rep-PCR), ribosomal intergenic spacer analysis (RISA), and denaturing gradient gel electrophoresis (DGGE) are frequently used in various fields of microbiology. The major difficulty in DNA fingerprinting data analysis is the alignment of multiple peak sets. We report here an R program for a clustering-based peak alignment algorithm, and its application to analyze various DNA fingerprinting data, such as ARDRA, rep-PCR, RISA, and DGGE data. The results obtained by our clustering algorithm and by BioNumerics software showed high similarity. Since several R packages have been established to statistically analyze various biological data, the distance matrix obtained by our R program can be used for subsequent statistical analyses, some of which were not previously performed but are useful in DNA fingerprinting studies.

  4. A New Waveform Signal Processing Method Based on Adaptive Clustering-Genetic Algorithms

    International Nuclear Information System (INIS)

    Noha Shaaban; Fukuzo Masuda; Hidetsugu Morota

    2006-01-01

    We present a fast digital signal processing method for numerical analysis of individual pulses from CdZnTe compound semiconductor detectors. Using Maxi-Mini Distance Algorithm and Genetic Algorithms based discrimination technique. A parametric approach has been used for classifying the discriminated waveforms into a set of clusters each has a similar signal shape with a corresponding pulse height spectrum. A corrected total pulse height spectrum was obtained by applying a normalization factor for the full energy peak for each cluster with a highly improvements in the energy spectrum characteristics. This method applied successfully for both simulated and real measured data, it can be applied to any detector suffers from signal shape variation. (authors)

  5. Clustering for Different Scales of Measurement - the Gap-Ratio Weighted K-means Algorithm

    OpenAIRE

    Guérin, Joris; Gibaru, Olivier; Thiery, Stéphane; Nyiri, Eric

    2017-01-01

    This paper describes a method for clustering data that are spread out over large regions and which dimensions are on different scales of measurement. Such an algorithm was developed to implement a robotics application consisting in sorting and storing objects in an unsupervised way. The toy dataset used to validate such application consists of Lego bricks of different shapes and colors. The uncontrolled lighting conditions together with the use of RGB color features, respectively involve data...

  6. User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm.

    Science.gov (United States)

    Bourobou, Serge Thomas Mickala; Yoo, Younghwan

    2015-05-21

    This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things) based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen's temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.

  7. User Activity Recognition in Smart Homes Using Pattern Clustering Applied to Temporal ANN Algorithm

    Directory of Open Access Journals (Sweden)

    Serge Thomas Mickala Bourobou

    2015-05-01

    Full Text Available This paper discusses the possibility of recognizing and predicting user activities in the IoT (Internet of Things based smart environment. The activity recognition is usually done through two steps: activity pattern clustering and activity type decision. Although many related works have been suggested, they had some limited performance because they focused only on one part between the two steps. This paper tries to find the best combination of a pattern clustering method and an activity decision algorithm among various existing works. For the first step, in order to classify so varied and complex user activities, we use a relevant and efficient unsupervised learning method called the K-pattern clustering algorithm. In the second step, the training of smart environment for recognizing and predicting user activities inside his/her personal space is done by utilizing the artificial neural network based on the Allen’s temporal relations. The experimental results show that our combined method provides the higher recognition accuracy for various activities, as compared with other data mining classification algorithms. Furthermore, it is more appropriate for a dynamic environment like an IoT based smart home.

  8. Segmentation of dermatoscopic images by frequency domain filtering and k-means clustering algorithms.

    Science.gov (United States)

    Rajab, Maher I

    2011-11-01

    Since the introduction of epiluminescence microscopy (ELM), image analysis tools have been extended to the field of dermatology, in an attempt to algorithmically reproduce clinical evaluation. Accurate image segmentation of skin lesions is one of the key steps for useful, early and non-invasive diagnosis of coetaneous melanomas. This paper proposes two image segmentation algorithms based on frequency domain processing and k-means clustering/fuzzy k-means clustering. The two methods are capable of segmenting and extracting the true border that reveals the global structure irregularity (indentations and protrusions), which may suggest excessive cell growth or regression of a melanoma. As a pre-processing step, Fourier low-pass filtering is applied to reduce the surrounding noise in a skin lesion image. A quantitative comparison of the techniques is enabled by the use of synthetic skin lesion images that model lesions covered with hair to which Gaussian noise is added. The proposed techniques are also compared with an established optimal-based thresholding skin-segmentation method. It is demonstrated that for lesions with a range of different border irregularity properties, the k-means clustering and fuzzy k-means clustering segmentation methods provide the best performance over a range of signal to noise ratios. The proposed segmentation techniques are also demonstrated to have similar performance when tested on real skin lesions representing high-resolution ELM images. This study suggests that the segmentation results obtained using a combination of low-pass frequency filtering and k-means or fuzzy k-means clustering are superior to the result that would be obtained by using k-means or fuzzy k-means clustering segmentation methods alone. © 2011 John Wiley & Sons A/S.

  9. Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

    Science.gov (United States)

    Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

    2016-04-01

    Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another can be performed in parallel narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are the: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a selves-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and

  10. A Spectrum Sensing Method Based on Signal Feature and Clustering Algorithm in Cognitive Wireless Multimedia Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yongwei Zhang

    2017-01-01

    Full Text Available In order to solve the problem of difficulty in determining the threshold in spectrum sensing technologies based on the random matrix theory, a spectrum sensing method based on clustering algorithm and signal feature is proposed for Cognitive Wireless Multimedia Sensor Networks. Firstly, the wireless communication signal features are obtained according to the sampling signal covariance matrix. Then, the clustering algorithm is used to classify and test the signal features. Different signal features and clustering algorithms are compared in this paper. The experimental results show that the proposed method has better sensing performance.

  11. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks.

    Science.gov (United States)

    Li, Min; Li, Dongyan; Tang, Yu; Wu, Fangxiang; Wang, Jianxin

    2017-08-31

    Nowadays, cluster analysis of biological networks has become one of the most important approaches to identifying functional modules as well as predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial to display the structure of biological networks. Here we present CytoCluster, a cytoscape plugin integrating six clustering algorithms, HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), and BinGO (the Biological networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of these six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to this plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times in the Cytoscape App store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.

  12. Examination of Clustering in Eutectic Microstrcture

    Directory of Open Access Journals (Sweden)

    Bortnyik K.

    2017-06-01

    Full Text Available The eutectic microstructures are complex microstructures and a hard work to describe it with few numbers. The eutectics builds up eutectic cells. In the cells the phases are clustered. With the development of big databases the data mining also develops, and produces a lot of method to handling the large datasets, and earns information from the sets. One typical method is the clustering, which finds the groups in the datasets. In this article a partitioning and a hierarchical clustering is applied to eutectic structures to find the clusters. In the case of AlMn alloy the K-means algorithm work well, and find the eutectic cells. In the case of ductile cast iron the hierarchical clustering works better. With the combination of the partitioning and hierarchical clustering with the image transformation, an effective method is developed for clustering the objects in the microstructures.

  13. A Smartphone Indoor Localization Algorithm Based on WLAN Location Fingerprinting with Feature Extraction and Clustering.

    Science.gov (United States)

    Luo, Junhai; Fu, Liang

    2017-06-09

    With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS), which is collected from Access Points (APs). The proposed localization algorithm contains the offline information acquisition phase and online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless AP; secondly, Kernel Principal Component Analysis (KPCA) is analyzed and used to remove the data redundancy and maintain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC) algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data will be matched with the testing data to determine the position area, and the Maximum Likelihood (ML) estimate will be employed for precise positioning. Eventually, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves the accuracy and computational complexity.

  14. A Smartphone Indoor Localization Algorithm Based on WLAN Location Fingerprinting with Feature Extraction and Clustering

    Directory of Open Access Journals (Sweden)

    Junhai Luo

    2017-06-01

    Full Text Available With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS, which is collected from Access Points (APs. The proposed localization algorithm contains the offline information acquisition phase and online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless AP; secondly, Kernel Principal Component Analysis (KPCA is analyzed and used to remove the data redundancy and maintain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data will be matched with the testing data to determine the position area, and the Maximum Likelihood (ML estimate will be employed for precise positioning. Eventually, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves the accuracy and computational complexity.

  15. An image segmentation method based on fuzzy C-means clustering and Cuckoo search algorithm

    Science.gov (United States)

    Wang, Mingwei; Wan, Youchuan; Gao, Xianjun; Ye, Zhiwei; Chen, Maolin

    2018-04-01

    Image segmentation is a significant step in image analysis and machine vision. Many approaches have been presented in this topic; among them, fuzzy C-means (FCM) clustering is one of the most widely used methods for its high efficiency and ambiguity of images. However, the success of FCM could not be guaranteed because it easily traps into local optimal solution. Cuckoo search (CS) is a novel evolutionary algorithm, which has been tested on some optimization problems and proved to be high-efficiency. Therefore, a new segmentation technique using FCM and blending of CS algorithm is put forward in the paper. Further, the proposed method has been measured on several images and compared with other existing FCM techniques such as genetic algorithm (GA) based FCM and particle swarm optimization (PSO) based FCM in terms of fitness value. Experimental results indicate that the proposed method is robust, adaptive and exhibits the better performance than other methods involved in the paper.

  16. Hybrid employment recommendation algorithm based on Spark

    Science.gov (United States)

    Li, Zuoquan; Lin, Yubei; Zhang, Xingming

    2017-08-01

    Aiming at the real-time application of collaborative filtering employment recommendation algorithm (CF), a clustering collaborative filtering recommendation algorithm (CCF) is developed, which applies hierarchical clustering to CF and narrows the query range of neighbour items. In addition, to solve the cold-start problem of content-based recommendation algorithm (CB), a content-based algorithm with users’ information (CBUI) is introduced for job recommendation. Furthermore, a hybrid recommendation algorithm (HRA) which combines CCF and CBUI algorithms is proposed, and implemented on Spark platform. The experimental results show that HRA can overcome the problems of cold start and data sparsity, and achieve good recommendation accuracy and scalability for employment recommendation.

  17. Comparison of multianalyte proficiency test results by sum of ranking differences, principal component analysis, and hierarchical cluster analysis.

    Science.gov (United States)

    Škrbić, Biljana; Héberger, Károly; Durišić-Mladenović, Nataša

    2013-10-01

    Sum of ranking differences (SRD) was applied for comparing multianalyte results obtained by several analytical methods used in one or in different laboratories, i.e., for ranking the overall performances of the methods (or laboratories) in simultaneous determination of the same set of analytes. The data sets for testing of the SRD applicability contained the results reported during one of the proficiency tests (PTs) organized by EU Reference Laboratory for Polycyclic Aromatic Hydrocarbons (EU-RL-PAH). In this way, the SRD was also tested as a discriminant method alternative to existing average performance scores used to compare mutlianalyte PT results. SRD should be used along with the z scores--the most commonly used PT performance statistics. SRD was further developed to handle the same rankings (ties) among laboratories. Two benchmark concentration series were selected as reference: (a) the assigned PAH concentrations (determined precisely beforehand by the EU-RL-PAH) and (b) the averages of all individual PAH concentrations determined by each laboratory. Ranking relative to the assigned values and also to the average (or median) values pointed to the laboratories with the most extreme results, as well as revealed groups of laboratories with similar overall performances. SRD reveals differences between methods or laboratories even if classical test(s) cannot. The ranking was validated using comparison of ranks by random numbers (a randomization test) and using seven folds cross-validation, which highlighted the similarities among the (methods used in) laboratories. Principal component analysis and hierarchical cluster analysis justified the findings based on SRD ranking/grouping. If the PAH-concentrations are row-scaled, (i.e., z scores are analyzed as input for ranking) SRD can still be used for checking the normality of errors. Moreover, cross-validation of SRD on z scores groups the laboratories similarly. The SRD technique is general in nature, i.e., it can

  18. A Performance-Prediction Model for PIC Applications on Clusters of Symmetric MultiProcessors: Validation with Hierarchical HPF+OpenMP Implementation

    Directory of Open Access Journals (Sweden)

    Sergio Briguglio

    2003-01-01

    Full Text Available A performance-prediction model is presented, which describes different hierarchical workload decomposition strategies for particle in cell (PIC codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model, with respect to the memory occupancy, the parallelization efficiency and the required programming effort. Such strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage and OpenMP (at the intra-node one. The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.

  19. Clustering of tethered satellite system simulation data by an adaptive neuro-fuzzy algorithm

    Science.gov (United States)

    Mitra, Sunanda; Pemmaraju, Surya

    1992-01-01

    Recent developments in neuro-fuzzy systems indicate that the concepts of adaptive pattern recognition, when used to identify appropriate control actions corresponding to clusters of patterns representing system states in dynamic nonlinear control systems, may result in innovative designs. A modular, unsupervised neural network architecture, in which fuzzy learning rules have been embedded is used for on-line identification of similar states. The architecture and control rules involved in Adaptive Fuzzy Leader Clustering (AFLC) allow this system to be incorporated in control systems for identification of system states corresponding to specific control actions. We have used this algorithm to cluster the simulation data of Tethered Satellite System (TSS) to estimate the range of delta voltages necessary to maintain the desired length rate of the tether. The AFLC algorithm is capable of on-line estimation of the appropriate control voltages from the corresponding length error and length rate error without a priori knowledge of their membership functions and familarity with the behavior of the Tethered Satellite System.

  20. An Efficient MapReduce-Based Parallel Clustering Algorithm for Distributed Traffic Subarea Division

    Directory of Open Access Journals (Sweden)

    Dawen Xia

    2015-01-01

    Full Text Available Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs. Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K-Means (Par3PKM algorithm for solving traffic subarea division problem on a widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K-Means and then employ a MapReduce paradigm to redesign the optimized K-Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of clustering results for each cluster. Finally, we divide traffic subarea of Beijing based on real-world trajectory data sets generated by 12,000 taxis in a period of one month using the proposed approach. Experimental evaluation results indicate that when compared with K-Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, more accuracy, and better scalability and can effectively divide traffic subarea with big taxi trajectory data.

  1. A Network-Based Algorithm for Clustering Multivariate Repeated Measures Data

    Science.gov (United States)

    Koslovsky, Matthew; Arellano, John; Schaefer, Caroline; Feiveson, Alan; Young, Millennia; Lee, Stuart

    2017-01-01

    The National Aeronautics and Space Administration (NASA) Astronaut Corps is a unique occupational cohort for which vast amounts of measures data have been collected repeatedly in research or operational studies pre-, in-, and post-flight, as well as during multiple clinical care visits. In exploratory analyses aimed at generating hypotheses regarding physiological changes associated with spaceflight exposure, such as impaired vision, it is of interest to identify anomalies and trends across these expansive datasets. Multivariate clustering algorithms for repeated measures data may help parse the data to identify homogeneous groups of astronauts that have higher risks for a particular physiological change. However, available clustering methods may not be able to accommodate the complex data structures found in NASA data, since the methods often rely on strict model assumptions, require equally-spaced and balanced assessment times, cannot accommodate missing data or differing time scales across variables, and cannot process continuous and discrete data simultaneously. To fill this gap, we propose a network-based, multivariate clustering algorithm for repeated measures data that can be tailored to fit various research settings. Using simulated data, we demonstrate how our method can be used to identify patterns in complex data structures found in practice.

  2. Clustering and Genetic Algorithm Based Hybrid Flowshop Scheduling with Multiple Operations

    Directory of Open Access Journals (Sweden)

    Yingfeng Zhang

    2014-01-01

    Full Text Available This research is motivated by a flowshop scheduling problem of our collaborative manufacturing company for aeronautic products. The heat-treatment stage (HTS and precision forging stage (PFS of the case are selected as a two-stage hybrid flowshop system. In HTS, there are four parallel machines and each machine can process a batch of jobs simultaneously. In PFS, there are two machines. Each machine can install any module of the four modules for processing the workpeices with different sizes. The problem is characterized by many constraints, such as batching operation, blocking environment, and setup time and working time limitations of modules, and so forth. In order to deal with the above special characteristics, the clustering and genetic algorithm is used to calculate the good solution for the two-stage hybrid flowshop problem. The clustering is used to group the jobs according to the processing ranges of the different modules of PFS. The genetic algorithm is used to schedule the optimal sequence of the grouped jobs for the HTS and PFS. Finally, a case study is used to demonstrate the efficiency and effectiveness of the designed genetic algorithm.

  3. Clustering Educational Digital Library Usage Data: A Comparison of Latent Class Analysis and K-Means Algorithms

    Science.gov (United States)

    Xu, Beijie; Recker, Mimi; Qi, Xiaojun; Flann, Nicholas; Ye, Lei

    2013-01-01

    This article examines clustering as an educational data mining method. In particular, two clustering algorithms, the widely used K-means and the model-based Latent Class Analysis, are compared, using usage data from an educational digital library service, the Instructional Architect (IA.usu.edu). Using a multi-faceted approach and multiple data…

  4. A Distributed Algorithm for the Cluster-Based Outlier Detection Using Unsupervised Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Xite Wang

    2017-01-01

    Full Text Available Outlier detection is an important data mining task, whose target is to find the abnormal or atypical objects from a given dataset. The techniques for detecting outliers have a lot of applications, such as credit card fraud detection and environment monitoring. Our previous work proposed the Cluster-Based (CB outlier and gave a centralized method using unsupervised extreme learning machines to compute CB outliers. In this paper, we propose a new distributed algorithm for the CB outlier detection (DACB. On the master node, we collect a small number of points from the slave nodes to obtain a threshold. On each slave node, we design a new filtering method that can use the threshold to efficiently speed up the computation. Furthermore, we also propose a ranking method to optimize the order of cluster scanning. At last, the effectiveness and efficiency of the proposed approaches are verified through a plenty of simulation experiments.

  5. Development of a Genetic Algorithm to Automate Clustering of a Dependency Structure Matrix

    Science.gov (United States)

    Rogers, James L.; Korte, John J.; Bilardo, Vincent J.

    2006-01-01

    Much technology assessment and organization design data exists in Microsoft Excel spreadsheets. Tools are needed to put this data into a form that can be used by design managers to make design decisions. One need is to cluster data that is highly coupled. Tools such as the Dependency Structure Matrix (DSM) and a Genetic Algorithm (GA) can be of great benefit. However, no tool currently combines the DSM and a GA to solve the clustering problem. This paper describes a new software tool that interfaces a GA written as an Excel macro with a DSM in spreadsheet format. The results of several test cases are included to demonstrate how well this new tool works.

  6. Implementation of the ALICE HLT hardware cluster finder algorithm in Vivado HLS

    Energy Technology Data Exchange (ETDEWEB)

    Gruell, Frederik; Engel, Heiko; Kebschull, Udo [Infrastructure and Computer Systems in Data Processing, Goethe University Frankfurt (Germany); Collaboration: ALICE-Collaboration

    2016-07-01

    The FastClusterFinder algorithm running in the ALICE High-Level Trigger (HLT) read-out boards extracts clusters from raw data from the Time Projection Chamber (TPC) detector and forwards them to the HLT data processing framework for tracking, event reconstruction and compression. It serves as an early stage of feature extraction in the FPGA of the board. Past and current implementations are written in VHDL on reconfigurable hardware for high throughput and low latency. We examine Vivado HLS, a high-level language that promises an increased developer productivity, as an alternative. The implementation of the application is compared to descriptions in VHDL and MaxJ in terms of productivity, resource usage and maximum clock frequency.

  7. Implementation of Automatic Clustering Algorithm and Fuzzy Time Series in Motorcycle Sales Forecasting

    Science.gov (United States)

    Rasim; Junaeti, E.; Wirantika, R.

    2018-01-01

    Accurate forecasting for the sale of a product depends on the forecasting method used. The purpose of this research is to build motorcycle sales forecasting application using Fuzzy Time Series method combined with interval determination using automatic clustering algorithm. Forecasting is done using the sales data of motorcycle sales in the last ten years. Then the error rate of forecasting is measured using Means Percentage Error (MPE) and Means Absolute Percentage Error (MAPE). The results of forecasting in the one-year period obtained in this study are included in good accuracy.

  8. Clustering Multiple Sclerosis Subgroups with Multifractal Methods and Self-Organizing Map Algorithm

    Science.gov (United States)

    Karaca, Yeliz; Cattani, Carlo

    Magnetic resonance imaging (MRI) is the most sensitive method to detect chronic nervous system diseases such as multiple sclerosis (MS). In this paper, Brownian motion Hölder regularity functions (polynomial, periodic (sine), exponential) for 2D image, such as multifractal methods were applied to MR brain images, aiming to easily identify distressed regions, in MS patients. With these regions, we have proposed an MS classification based on the multifractal method by using the Self-Organizing Map (SOM) algorithm. Thus, we obtained a cluster analysis by identifying pixels from distressed regions in MR images through multifractal methods and by diagnosing subgroups of MS patients through artificial neural networks.

  9. Mobility-Aware and Load Balancing Based Clustering Algorithm for Energy Conservation in MANET

    Institute of Scientific and Technical Information of China (English)

    XU Li; ZHENG Bao-yu; GUO Gong-de

    2005-01-01

    Mobile ad hoc network (MANET) is one of wireless communication network architecture that has received a lot of attention. MANET is characterized by dynamic network topology and limited energy. With mobility-aware and load balancing based clustering algorithm (MLCA), this paper proposes a new topology management strategy to conserve energy. Performance simulation results show that the proposed MLCA strategy can balances the traffic load inside the whole network, so as to prolong the network lifetime, meanly, at the same time, achieve higher throughput ratio and network stability.

  10. Parallel implementation of D-Phylo algorithm for maximum likelihood clusters.

    Science.gov (United States)

    Malik, Shamita; Sharma, Dolly; Khatri, Sunil Kumar

    2017-03-01

    This study explains a newly developed parallel algorithm for phylogenetic analysis of DNA sequences. The newly designed D-Phylo is a more advanced algorithm for phylogenetic analysis using maximum likelihood approach. The D-Phylo while misusing the seeking capacity of k -means keeps away from its real constraint of getting stuck at privately conserved motifs. The authors have tested the behaviour of D-Phylo on Amazon Linux Amazon Machine Image(Hardware Virtual Machine)i2.4xlarge, six central processing unit, 122 GiB memory, 8  ×  800 Solid-state drive Elastic Block Store volume, high network performance up to 15 processors for several real-life datasets. Distributing the clusters evenly on all the processors provides us the capacity to accomplish a near direct speed if there should arise an occurrence of huge number of processors.

  11. [A cloud detection algorithm for MODIS images combining Kmeans clustering and multi-spectral threshold method].

    Science.gov (United States)

    Wang, Wei; Song, Wei-Guo; Liu, Shi-Xing; Zhang, Yong-Ming; Zheng, Hong-Yang; Tian, Wei

    2011-04-01

    An improved method for detecting cloud combining Kmeans clustering and the multi-spectral threshold approach is described. On the basis of landmark spectrum analysis, MODIS data is categorized into two major types initially by Kmeans method. The first class includes clouds, smoke and snow, and the second class includes vegetation, water and land. Then a multi-spectral threshold detection is applied to eliminate interference such as smoke and snow for the first class. The method is tested with MODIS data at different time under different underlying surface conditions. By visual method to test the performance of the algorithm, it was found that the algorithm can effectively detect smaller area of cloud pixels and exclude the interference of underlying surface, which provides a good foundation for the next fire detection approach.

  12. Genetic algorithm with fuzzy clustering for optimization of nuclear reactor problems

    International Nuclear Information System (INIS)

    Machado, Marcelo Dornellas; Sacco, Wagner Figueiredo; Schirru, Roberto

    2000-01-01

    Genetic Algorithms (GAs) are biologically motivated adaptive systems which have been used, with good results, in function optimization. However, traditional GAs rapidly push an artificial population toward convergence. That is, all individuals in the population soon become nearly identical. Niching Methods allow genetic algorithms to maintain a population of diverse individuals. GAs that incorporate these methods are capable of locating multiple, optimal solutions within a single population. The purpose of this study is to introduce a new niching technique based on the fuzzy clustering method FCM, bearing in mind its eventual application in nuclear reactor related problems, specially the nuclear reactor core reload one, which has multiple solutions. tests are performed using widely known test functions and their results show that the new method is quite promising, specially to a future application in real world problems like the nuclear reactor core reload. (author)

  13. Performance Analysis of Combined Methods of Genetic Algorithm and K-Means Clustering in Determining the Value of Centroid

    Science.gov (United States)

    Adya Zizwan, Putra; Zarlis, Muhammad; Budhiarti Nababan, Erna

    2017-12-01

    The determination of Centroid on K-Means Algorithm directly affects the quality of the clustering results. Determination of centroid by using random numbers has many weaknesses. The GenClust algorithm that combines the use of Genetic Algorithms and K-Means uses a genetic algorithm to determine the centroid of each cluster. The use of the GenClust algorithm uses 50% chromosomes obtained through deterministic calculations and 50% is obtained from the generation of random numbers. This study will modify the use of the GenClust algorithm in which the chromosomes used are 100% obtained through deterministic calculations. The results of this study resulted in performance comparisons expressed in Mean Square Error influenced by centroid determination on K-Means method by using GenClust method, modified GenClust method and also classic K-Means.

  14. Automated segmentation of white matter fiber bundles using diffusion tensor imaging data and a new density based clustering algorithm.

    Science.gov (United States)

    Kamali, Tahereh; Stashuk, Daniel

    2016-10-01

    Robust and accurate segmentation of brain white matter (WM) fiber bundles assists in diagnosing and assessing progression or remission of neuropsychiatric diseases such as schizophrenia, autism and depression. Supervised segmentation methods are infeasible in most applications since generating gold standards is too costly. Hence, there is a growing interest in designing unsupervised methods. However, most conventional unsupervised methods require the number of clusters be known in advance which is not possible in most applications. The purpose of this study is to design an unsupervised segmentation algorithm for brain white matter fiber bundles which can automatically segment fiber bundles using intrinsic diffusion tensor imaging data information without considering any prior information or assumption about data distributions. Here, a new density based clustering algorithm called neighborhood distance entropy consistency (NDEC), is proposed which discovers natural clusters within data by simultaneously utilizing both local and global density information. The performance of NDEC is compared with other state of the art clustering algorithms including chameleon, spectral clustering, DBSCAN and k-means using Johns Hopkins University publicly available diffusion tensor imaging data. The performance of NDEC and other employed clustering algorithms were evaluated using dice ratio as an external evaluation criteria and density based clustering validation (DBCV) index as an internal evaluation metric. Across all employed clustering algorithms, NDEC obtained the highest average dice ratio (0.94) and DBCV value (0.71). NDEC can find clusters with arbitrary shapes and densities and consequently can be used for WM fiber bundle segmentation where there is no distinct boundary between various bundles. NDEC may also be used as an effective tool in other pattern recognition and medical diagnostic systems in which discovering natural clusters within data is a necessity. Copyright

  15. Automated spike sorting algorithm based on Laplacian eigenmaps and k-means clustering.

    Science.gov (United States)

    Chah, E; Hok, V; Della-Chiesa, A; Miller, J J H; O'Mara, S M; Reilly, R B

    2011-02-01

    This study presents a new automatic spike sorting method based on feature extraction by Laplacian eigenmaps combined with k-means clustering. The performance of the proposed method was compared against previously reported algorithms such as principal component analysis (PCA) and amplitude-based feature extraction. Two types of classifier (namely k-means and classification expectation-maximization) were incorporated within the spike sorting algorithms, in order to find a suitable classifier for the feature sets. Simulated data sets and in-vivo tetrode multichannel recordings were employed to assess the performance of the spike sorting algorithms. The results show that the proposed algorithm yields significantly improved performance with mean sorting accuracy of 73% and sorting error of 10% compared to PCA which combined with k-means had a sorting accuracy of 58% and sorting error of 10%.A correction was made to this article on 22 February 2011. The spacing of the title was amended on the abstract page. No changes were made to the article PDF and the print version was unaffected.

  16. An Energy-Efficient Spectrum-Aware Reinforcement Learning-Based Clustering Algorithm for Cognitive Radio Sensor Networks.

    Science.gov (United States)

    Mustapha, Ibrahim; Mohd Ali, Borhanuddin; Rasid, Mohd Fadlee A; Sali, Aduwati; Mohamad, Hafizal

    2015-08-13

    It is well-known that clustering partitions network into logical groups of nodes in order to achieve energy efficiency and to enhance dynamic channel access in cognitive radio through cooperative sensing. While the topic of energy efficiency has been well investigated in conventional wireless sensor networks, the latter has not been extensively explored. In this paper, we propose a reinforcement learning-based spectrum-aware clustering algorithm that allows a member node to learn the energy and cooperative sensing costs for neighboring clusters to achieve an optimal solution. Each member node selects an optimal cluster that satisfies pairwise constraints, minimizes network energy consumption and enhances channel sensing performance through an exploration technique. We first model the network energy consumption and then determine the optimal number of clusters for the network. The problem of selecting an optimal cluster is formulated as a Markov Decision Process (MDP) in the algorithm and the obtained simulation results show convergence, learning and adaptability of the algorithm to dynamic environment towards achieving an optimal solution. Performance comparisons of our algorithm with the Groupwise Spectrum Aware (GWSA)-based algorithm in terms of Sum of Square Error (SSE), complexity, network energy consumption and probability of detection indicate improved performance from the proposed approach. The results further reveal that an energy savings of 9% and a significant Primary User (PU) detection improvement can be achieved with the proposed approach.

  17. A Novel Energy-Aware Distributed Clustering Algorithm for Heterogeneous Wireless Sensor Networks in the Mobile Environment.

    Science.gov (United States)

    Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong

    2015-12-10

    In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network's running and the degree of candidate nodes' effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime.

  18. Optimizing Energy Consumption in Vehicular Sensor Networks by Clustering Using Fuzzy C-Means and Fuzzy Subtractive Algorithms

    Science.gov (United States)

    Ebrahimi, A.; Pahlavani, P.; Masoumi, Z.

    2017-09-01

    Traffic monitoring and managing in urban intelligent transportation systems (ITS) can be carried out based on vehicular sensor networks. In a vehicular sensor network, vehicles equipped with sensors such as GPS, can act as mobile sensors for sensing the urban traffic and sending the reports to a traffic monitoring center (TMC) for traffic estimation. The energy consumption by the sensor nodes is a main problem in the wireless sensor networks (WSNs); moreover, it is the most important feature in designing these networks. Clustering the sensor nodes is considered as an effective solution to reduce the energy consumption of WSNs. Each cluster should have a Cluster Head (CH), and a number of nodes located within its supervision area. The cluster heads are responsible for gathering and aggregating the information of clusters. Then, it transmits the information to the data collection center. Hence, the use of clustering decreases the volume of transmitting information, and, consequently, reduces the energy consumption of network. In this paper, Fuzzy C-Means (FCM) and Fuzzy Subtractive algorithms are employed to cluster sensors and investigate their performance on the energy consumption of sensors. It can be seen that the FCM algorithm and Fuzzy Subtractive have been reduced energy consumption of vehicle sensors up to 90.68% and 92.18%, respectively. Comparing the performance of the algorithms implies the 1.5 percent improvement in Fuzzy Subtractive algorithm in comparison.

  19. OPTIMIZING ENERGY CONSUMPTION IN VEHICULAR SENSOR NETWORKS BY CLUSTERING USING FUZZY C-MEANS AND FUZZY SUBTRACTIVE ALGORITHMS

    Directory of Open Access Journals (Sweden)

    A. Ebrahimi

    2017-09-01

    Full Text Available Traffic monitoring and managing in urban intelligent transportation systems (ITS can be carried out based on vehicular sensor networks. In a vehicular sensor network, vehicles equipped with sensors such as GPS, can act as mobile sensors for sensing the urban traffic and sending the reports to a traffic monitoring center (TMC for traffic estimation. The energy consumption by the sensor nodes is a main problem in the wireless sensor networks (WSNs; moreover, it is the most important feature in designing these networks. Clustering the sensor nodes is considered as an effective solution to reduce the energy consumption of WSNs. Each cluster should have a Cluster Head (CH, and a number of nodes located within its supervision area. The cluster heads are responsible for gathering and aggregating the information of clusters. Then, it transmits the information to the data collection center. Hence, the use of clustering decreases the volume of transmitting information, and, consequently, reduces the energy consumption of network. In this paper, Fuzzy C-Means (FCM and Fuzzy Subtractive algorithms are employed to cluster sensors and investigate their performance on the energy consumption of sensors. It can be seen that the FCM algorithm and Fuzzy Subtractive have been reduced energy consumption of vehicle sensors up to 90.68% and 92.18%, respectively. Comparing the performance of the algorithms implies the 1.5 percent improvement in Fuzzy Subtractive algorithm in comparison.

  20. A DISTRIBUTED ENERGY EFFICIENT CLUSTERING ALGORITHM FOR DATA AGGREGATION IN WIRELESS SENSOR NETWORKS

    Directory of Open Access Journals (Sweden)

    Seyed Mohammad Bagher Musavi Shirazi

    2018-06-01

    Full Text Available Wireless sensor networks (WSNs are a new generation of networks typically consisting of a large number of inexpensive nodes with wireless communications. The main purpose of these networks is to collect information from the environment for further processing. Nodes in the network have been equipped with limited battery lifetime, so energy saving is one of the major issues in WSNs. If we balance the load among cluster heads and prevent having an extra load on just a few nodes in the network, we can reach longer network lifetime. One solution to control energy consumption and balance the load among nodes is to use clustering techniques. In this paper, we propose a new distributed energy-efficient clustering algorithm for data aggregation in wireless sensor networks, called Distributed Clustering for Data Aggregation (DCDA. In our new approach, an optimal transmission tree is constructed among sensor nodes with a new greedy method. Base station (BS is the root, cluster heads (CHs and relay nodes are intermediate nodes, and other nodes (cluster member nodes are the leaves of this transmission tree. DCDA balances load among CHs in intra-cluster and inter-cluster data communications using different cluster sizes. For efficient inter-cluster communications, some relay nodes will transfer data between CHs. Energy consumption, distance to the base station, and cluster heads’ centric metric are three main adjustment parameters for the cluster heads election. Simulation results show that the proposed protocol leads to the reduction of individual sensor nodes’ energy consumption and prolongs network lifetime, in comparison with other known methods. ABSTRAK: Rangkaian sensor wayarles (WSN adalah rangkaian generasi baru yang terdiri daripada nod-nod murah komunikasi wayarles. Tujuan rangkaian-rangkaian ini adalah bagi mengumpul maklumat sekeliling untuk proses seterusnya. Nod dalam rangkaian ini dilengkapi bateri kurang jangka hayat, jadi simpanan tenaga

  1. A new application of hierarchical cluster analysis to investigate organic peaks in bulk mass spectra obtained with an Aerodyne Aerosol Mass Spectrometer

    Science.gov (United States)

    Middlebrook, A. M.; Marcolli, C.; Canagaratna, M. R.; Worsnop, D. R.; Bahreini, R.; de Gouw, J. A.; Warneke, C.; Goldan, P. D.; Kuster, W. C.; Williams, E. J.; Lerner, B. M.; Roberts, J. M.; Meagher, J. F.; Fehsenfeld, F. C.; Marchewka, M. L.; Bertman, S. B.

    2006-12-01

    We applied hierarchical cluster analysis to an Aerodyne aerosol mass spectrometer (AMS) bulk mass spectral dataset collected aboard the NOAA research vessel Ronald H. Brown during the 2002 New England Air Quality Study off the east coast of the United States. Emphasizing the organic peaks, the cluster analysis yielded a series of categories that are distinguishable with respect to their mass spectra and their occurrence as a function of time. The differences between the categories mainly arise from relative intensity changes rather than from the presence or absence of specific peaks. The most frequent category exhibits a strong signal at m/z 44 and represents oxidized organic matter probably originating from both anthropogenic as well as biogenic sources. On the basis of spectral and trace gas correlations, the second most common category with strong signals at m/z 29, 43, and 44 contains contributions from isoprene oxidation products. The third through the fifth most common categories have peak patterns characteristic of monoterpene oxidation products and were most frequently observed when air masses from monoterpene rich regions were sampled. Taken together, the second through the fifth most common categories represent on average 17% of the total organic mass that stems likely from biogenic sources during the ship's cruise. These numbers have to be viewed as lower limits since the most common category was attributed to anthropogenic sources for this calculation. The cluster analysis was also very effective in identifying a few contaminated mass spectra that were not removed during pre-processing. This study demonstrates that hierarchical clustering is a useful tool to analyze the complex patterns of the organic peaks in bulk aerosol mass spectra from a field study.

  2. KANTS: a stigmergic ant algorithm for cluster analysis and swarm art.

    Science.gov (United States)

    Fernandes, Carlos M; Mora, Antonio M; Merelo, Juan J; Rosa, Agostinho C

    2014-06-01

    KANTS is a swarm intelligence clustering algorithm inspired by the behavior of social insects. It uses stigmergy as a strategy for clustering large datasets and, as a result, displays a typical behavior of complex systems: self-organization and global patterns emerging from the local interaction of simple units. This paper introduces a simplified version of KANTS and describes recent experiments with the algorithm in the context of a contemporary artistic and scientific trend called swarm art, a type of generative art in which swarm intelligence systems are used to create artwork or ornamental objects. KANTS is used here for generating color drawings from the input data that represent real-world phenomena, such as electroencephalogram sleep data. However, the main proposal of this paper is an art project based on well-known abstract paintings, from which the chromatic values are extracted and used as input. Colors and shapes are therefore reorganized by KANTS, which generates its own interpretation of the original artworks. The project won the 2012 Evolutionary Art, Design, and Creativity Competition.

  3. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.

    Directory of Open Access Journals (Sweden)

    Daniel M de Brito

    Full Text Available Genomic Islands (GIs are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP--Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me.

  4. A Hybrid Spectral Clustering and Deep Neural Network Ensemble Algorithm for Intrusion Detection in Sensor Networks.

    Science.gov (United States)

    Ma, Tao; Wang, Fen; Cheng, Jianjun; Yu, Yang; Chen, Xiaoyun

    2016-10-13

    The development of intrusion detection systems (IDS) that are adapted to allow routers and network defence systems to detect malicious network traffic disguised as network protocols or normal access is a critical challenge. This paper proposes a novel approach called SCDNN, which combines spectral clustering (SC) and deep neural network (DNN) algorithms. First, the dataset is divided into k subsets based on sample similarity using cluster centres, as in SC. Next, the distance between data points in a testing set and the training set is measured based on similarity features and is fed into the deep neural network algorithm for intrusion detection. Six KDD-Cup99 and NSL-KDD datasets and a sensor network dataset were employed to test the performance of the model. These experimental results indicate that the SCDNN classifier not only performs better than backpropagation neural network (BPNN), support vector machine (SVM), random forest (RF) and Bayes tree models in detection accuracy and the types of abnormal attacks found. It also provides an effective tool of study and analysis of intrusion detection in large networks.

  5. Are judgments a form of data clustering? Reexamining contrast effects with the k-means algorithm.

    Science.gov (United States)

    Boillaud, Eric; Molina, Guylaine

    2015-04-01

    A number of theories have been proposed to explain in precise mathematical terms how statistical parameters and sequential properties of stimulus distributions affect category ratings. Various contextual factors such as the mean, the midrange, and the median of the stimuli; the stimulus range; the percentile rank of each stimulus; and the order of appearance have been assumed to influence judgmental contrast. A data clustering reinterpretation of judgmental relativity is offered wherein the influence of the initial choice of centroids on judgmental contrast involves 2 combined frequency and consistency tendencies. Accounts of the k-means algorithm are provided, showing good agreement with effects observed on multiple distribution shapes and with a variety of interaction effects relating to the number of stimuli, the number of response categories, and the method of skewing. Experiment 1 demonstrates that centroid initialization accounts for contrast effects obtained with stretched distributions. Experiment 2 demonstrates that the iterative convergence inherent to the k-means algorithm accounts for the contrast reduction observed across repeated blocks of trials. The concept of within-cluster variance minimization is discussed, as is the applicability of a backward k-means calculation method for inferring, from empirical data, the values of the centroids that would serve as a representation of the judgmental context. (c) 2015 APA, all rights reserved.

  6. Consumers' Kansei Needs Clustering Method for Product Emotional Design Based on Numerical Design Structure Matrix and Genetic Algorithms.

    Science.gov (United States)

    Yang, Yan-Pu; Chen, Deng-Kai; Gu, Rong; Gu, Yu-Feng; Yu, Sui-Huai

    2016-01-01

    Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design.

  7. "Analyzing the Longitudinal K-12 Grading Histories of Entire Cohorts of Students: Grades, Data Driven Decision Making, Dropping out and Hierarchical Cluster Analysis"

    Directory of Open Access Journals (Sweden)

    Alex J. Bowers

    2010-05-01

    Full Text Available School personnel currently lack an effective method to pattern and visually interpret disaggregated achievement data collected on students as a means to help inform decision making. This study, through the examination of longitudinal K-12 teacher assigned grading histories for entire cohorts of students from a school district (n=188, demonstrates a novel application of hierarchical cluster analysis and pattern visualization in which all data points collected on every student in a cohort can be patterned, visualized and interpreted to aid in data driven decision making by teachers and administrators. Additionally, as a proof-of-concept study, overall schooling outcomes, such as student dropout or taking a college entrance exam, are identified from the data patterns and compared to past methods of dropout identification as one example of the usefulness of the method. Hierarchical cluster analysis correctly identified over 80% of the students who dropped out using the entire student grade history patterns from either K-12 or K-8.

  8. Chemical Fingerprint and Quantitative Analysis for the Quality Evaluation of Platycladi cacumen by Ultra-performance Liquid Chromatography Coupled with Hierarchical Cluster Analysis.

    Science.gov (United States)

    Shan, Mingqiu; Li, Sam Fong Yau; Yu, Sheng; Qian, Yan; Guo, Shuchen; Zhang, Li; Ding, Anwei

    2018-01-01

    Platycladi cacumen (dried twigs and leaves of Platycladus orientalis (L.) Franco) is a frequently utilized Chinese medicinal herb. To evaluate the quality of the phytomedcine, an ultra-performance liquid chromatographic method with diode array detection was established for chemical fingerprinting and quantitative analysis. In this study, 27 batches of P. cacumen from different regions were collected for analysis. A chemical fingerprint with 20 common peaks was obtained using Similarity Evaluation System for Chromatographic Fingerprint of Traditional Chinese Medicine (Version 2004A). Among these 20 components, seven flavonoids (myricitrin, isoquercitrin, quercitrin, afzelin, cupressuflavone, amentoflavone and hinokiflavone) were identified and determined simultaneously. In the method validation, the seven analytes showed good regressions (R ≥ 0.9995) within linear ranges and good recoveries from 96.4% to 103.3%. Furthermore, with the contents of these seven flavonoids, hierarchical clustering analysis was applied to distinguish the 27 batches into five groups. The chemometric results showed that these groups were almost consistent with geographical positions and climatic conditions of the production regions. Integrating fingerprint analysis, simultaneous determination and hierarchical clustering analysis, the established method is rapid, sensitive, accurate and readily applicable, and also provides a significant foundation for quality control of P. cacumen efficiently. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  9. Lord-Wingersky Algorithm Version 2.0 for Hierarchical Item Factor Models with Applications in Test Scoring, Scale Alignment, and Model Fit Testing.

    Science.gov (United States)

    Cai, Li

    2015-06-01

    Lord and Wingersky's (Appl Psychol Meas 8:453-461, 1984) recursive algorithm for creating summed score based likelihoods and posteriors has a proven track record in unidimensional item response theory (IRT) applications. Extending the recursive algorithm to handle multidimensionality is relatively simple, especially with fixed quadrature because the recursions can be defined on a grid formed by direct products of quadrature points. However, the increase in computational burden remains exponential in the number of dimensions, making the implementation of the recursive algorithm cumbersome for truly high-dimensional models. In this paper, a dimension reduction method that is specific to the Lord-Wingersky recursions is developed. This method can take advantage of the restrictions implied by hierarchical item factor models, e.g., the bifactor model, the testlet model, or the two-tier model, such that a version of the Lord-Wingersky recursive algorithm can operate on a dramatically reduced set of quadrature points. For instance, in a bifactor model, the dimension of integration is always equal to 2, regardless of the number of factors. The new algorithm not only provides an effective mechanism to produce summed score to IRT scaled score translation tables properly adjusted for residual dependence, but leads to new applications in test scoring, linking, and model fit checking as well. Simulated and empirical examples are used to illustrate the new applications.

  10. 2D evaluation of spectral LIBS data derived from heterogeneous materials using cluster algorithm

    Science.gov (United States)

    Gottlieb, C.; Millar, S.; Grothe, S.; Wilsch, G.

    2017-08-01

    Laser-induced Breakdown Spectroscopy (LIBS) is capable of providing spatially resolved element maps in regard to the chemical composition of the sample. The evaluation of heterogeneous materials is often a challenging task, especially in the case of phase boundaries. In order to determine information about a certain phase of a material, the need for a method that offers an objective evaluation is necessary. This paper will introduce a cluster algorithm in the case of heterogeneous building materials (concrete) to separate the spectral information of non-relevant aggregates and cement matrix. In civil engineering, the information about the quantitative ingress of harmful species like Cl-, Na+ and SO42- is of great interest in the evaluation of the remaining lifetime of structures (Millar et al., 2015; Wilsch et al., 2005). These species trigger different damage processes such as the alkali-silica reaction (ASR) or the chloride-induced corrosion of the reinforcement. Therefore, a discrimination between the different phases, mainly cement matrix and aggregates, is highly important (Weritz et al., 2006). For the 2D evaluation, the expectation-maximization-algorithm (EM algorithm; Ester and Sander, 2000) has been tested for the application presented in this work. The method has been introduced and different figures of merit have been presented according to recommendations given in Haddad et al. (2014). Advantages of this method will be highlighted. After phase separation, non-relevant information can be excluded and only the wanted phase displayed. Using a set of samples with known and unknown composition, the EM-clustering method has been validated regarding to Gustavo González and Ángeles Herrador (2007).

  11. Developing the fuzzy c-means clustering algorithm based on maximum entropy for multitarget tracking in a cluttered environment

    Science.gov (United States)

    Chen, Xiao; Li, Yaan; Yu, Jing; Li, Yuxing

    2018-01-01

    For fast and more effective implementation of tracking multiple targets in a cluttered environment, we propose a multiple targets tracking (MTT) algorithm called maximum entropy fuzzy c-means clustering joint probabilistic data association that combines fuzzy c-means clustering and the joint probabilistic data association (PDA) algorithm. The algorithm uses the membership value to express the probability of the target originating from measurement. The membership value is obtained through fuzzy c-means clustering objective function optimized by the maximum entropy principle. When considering the effect of the public measurement, we use a correction factor to adjust the association probability matrix to estimate the state of the target. As this algorithm avoids confirmation matrix splitting, it can solve the high computational load problem of the joint PDA algorithm. The results of simulations and analysis conducted for tracking neighbor parallel targets and cross targets in a different density cluttered environment show that the proposed algorithm can realize MTT quickly and efficiently in a cluttered environment. Further, the performance of the proposed algorithm remains constant with increasing process noise variance. The proposed algorithm has the advantages of efficiency and low computational load, which can ensure optimum performance when tracking multiple targets in a dense cluttered environment.

  12. Enhancements of LEACH Algorithm for Wireless Networks: A Review

    Directory of Open Access Journals (Sweden)

    M. Madheswaran

    2013-12-01

    Full Text Available Low Energy Adaptive Clustering Hierarchy (LEACH protocol is the first hierarchical cluster based routing protocol successfully used in the Wireless Sensor Networks (WSN. In this paper, various enhancements used in the original LEACH protocol are examined. The basic operations, advantages and limitations of the modified LEACH algorithms are compared to identify the research issues to be solved and to give the suggestions for the future proposed routing algorithms of wireless networks based on LEACH routing algorithm.

  13. Ultrathin mesoporous Co_3O_4 nanosheets-constructed hierarchical clusters as high rate capability and long life anode materials for lithium-ion batteries

    International Nuclear Information System (INIS)

    Wu, Shengming; Xia, Tian; Wang, Jingping; Lu, Feifei; Xu, Chunbo; Zhang, Xianfa; Huo, Lihua; Zhao, Hui

    2017-01-01

    Graphical abstract: Ultrathin mesoporous Co_3O_4 nanosheets-constructed hierarchical clusters (UMCN-HCs) have been successfully synthesized via a facile hydrothermal method followed by a subsequent thermolysis treatment. When tested as anode materials for LIBs, UMCN-HCs achieve high reversible capacity, good long cycling life, and rate capability. - Highlights: • UMCN-HCs show high capacity, excellent stability, and good rate capability. • UMCN-HCs retain a capacity of 1067 mAh g"−"1 after 100 cycles at 100 mA g"−"1. • UMCN-HCs deliver a capacity of 507 mAh g"−"1 after 500 cycles at 2 A g"−"1. - Abstract: Herein, Ultrathin mesoporous Co_3O_4 nanosheets-constructed hierarchical clusters (UMCN-HCs) have been successfully synthesized via a facile hydrothermal method followed by a subsequent thermolysis treatment at 600 °C in air. The products consist of cluster-like Co_3O_4 microarchitectures, which are assembled by numerous ultrathin mesoporous Co_3O_4 nanosheets. When tested as anode materials for lithium-ion batteries, UMCN-HCs deliver a high reversible capacity of 1067 mAh g"−"1 at a current density of 100 mA g"−"1 after 100 cycles. Even at 2 A g"−"1, a stable capacity as high as 507 mAh g"−"1 can be achieved after 500 cycles. The high reversible capacity, excellent cycling stability, and good rate capability of UMCN-HCs may be attributed to their mesoporous sheet-like nanostructure. The sheet-layered structure of UMCN-HCs may buffer the volume change during the lithiation-delithiation process, and the mesoporous characteristic make lithium-ion transfer more easily at the interface between the active electrode and the electrolyte.

  14. Informational and linguistic analysis of large genomic sequence collections via efficient Hadoop cluster algorithms.

    Science.gov (United States)

    Ferraro Petrillo, Umberto; Roscigno, Gianluca; Cattaneo, Giuseppe; Giancarlo, Raffaele

    2018-06-01

    Information theoretic and compositional/linguistic analysis of genomes have a central role in bioinformatics, even more so since the associated methodologies are becoming very valuable also for epigenomic and meta-genomic studies. The kernel of those methods is based on the collection of k-mer statistics, i.e. how many times each k-mer in {A,C,G,T}k occurs in a DNA sequence. Although this problem is computationally very simple and efficiently solvable on a conventional computer, the sheer amount of data available now in applications demands to resort to parallel and distributed computing. Indeed, those type of algorithms have been developed to collect k-mer statistics in the realm of genome assembly. However, they are so specialized to this domain that they do not extend easily to the computation of informational and linguistic indices, concurrently on sets of genomes. Following the well-established approach in many disciplines, and with a growing success also in bioinformatics, to resort to MapReduce and Hadoop to deal with 'Big Data' problems, we present KCH, the first set of MapReduce algorithms able to perform concurrently informational and linguistic analysis of large collections of genomic sequences on a Hadoop cluster. The benchmarking of KCH that we provide indicates that it is quite effective and versatile. It is also competitive with respect to the parallel and distributed algorithms highly specialized to k-mer statistics collection for genome assembly problems. In conclusion, KCH is a much needed addition to the growing number of algorithms and tools that use MapReduce for bioinformatics core applications. The software, including instructions for running it over Amazon AWS, as well as the datasets are available at http://www.di-srv.unisa.it/KCH. umberto.ferraro@uniroma1.it. Supplementary data are available at Bioinformatics online.

  15. Semi-supervised spectral algorithms for community detection in complex networks based on equivalence of clustering methods

    Science.gov (United States)

    Ma, Xiaoke; Wang, Bingbo; Yu, Liang

    2018-01-01

    Community detection is fundamental for revealing the structure-functionality relationship in complex networks, which involves two issues-the quantitative function for community as well as algorithms to discover communities. Despite significant research on either of them, few attempt has been made to establish the connection between the two issues. To attack this problem, a generalized quantification function is proposed for community in weighted networks, which provides a framework that unifies several well-known measures. Then, we prove that the trace optimization of the proposed measure is equivalent with the objective functions of algorithms such as nonnegative matrix factorization, kernel K-means as well as spectral clustering. It serves as the theoretical foundation for designing algorithms for community detection. On the second issue, a semi-supervised spectral clustering algorithm is developed by exploring the equivalence relation via combining the nonnegative matrix factorization and spectral clustering. Different from the traditional semi-supervised algorithms, the partial supervision is integrated into the objective of the spectral algorithm. Finally, through extensive experiments on both artificial and real world networks, we demonstrate that the proposed method improves the accuracy of the traditional spectral algorithms in community detection.

  16. Evaluation of the application of BIM technology based on PCA - Q Clustering Algorithm and Choquet Integral

    Directory of Open Access Journals (Sweden)

    Wei Xiaozhao

    2016-03-01

    Full Text Available For the development of the construction industry, the construction of data era is approaching, BIM (building information model with the actual needs of the construction industry has been widely used as a building information clan system software, different software for the practical application of different maturity, through the expert scoring method for the application of BIM technology maturity index mark, establish the evaluation index system, using PCA - Q clustering algorithm for the evaluation index system of classification, comprehensive evaluation in combination with the Choquet integral on the classification of evaluation index system, to achieve a reasonable assessment of the application of BIM technology maturity index. To lay a foundation for the future development of BIM Technology in various fields of construction, at the same time provides direction for the comprehensive application of BIM technology.

  17. Evaluation of Modified Categorical Data Fuzzy Clustering Algorithm on the Wisconsin Breast Cancer Dataset

    Directory of Open Access Journals (Sweden)

    Amir Ahmad

    2016-01-01

    Full Text Available The early diagnosis of breast cancer is an important step in a fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. As medical datasets consist of data points which cannot be precisely assigned to a class, fuzzy methods have been useful for studying of these datasets. Sometimes breast cancer datasets are described by categorical features. Many fuzzy clustering algorithms have been developed for categorical datasets. However, in most of these methods Hamming distance is used to define the distance between the two categorical feature values. In this paper, we use a probabilistic distance measure for the distance computation among a pair of categorical feature values. Experiments demonstrate that the distance measure performs better than Hamming distance for Wisconsin breast cancer data.

  18. Relationship between clustering and algorithmic phase transitions in the random k-XORSAT model and its NP-complete extensions

    International Nuclear Information System (INIS)

    Altarelli, F; Monasson, R; Zamponi, F

    2008-01-01

    We study the performances of stochastic heuristic search algorithms on Uniquely Extendible Constraint Satisfaction Problems with random inputs. We show that, for any heuristic preserving the Poissonian nature of the underlying instance, the (heuristic-dependent) largest ratio α a of constraints per variables for which a search algorithm is likely to find solutions is smaller than the critical ratio α d above which solutions are clustered and highly correlated. In addition we show that the clustering ratio can be reached when the number k of variables per constraints goes to infinity by the so-called Generalized Unit Clause heuristic

  19. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    Science.gov (United States)

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences are common operations frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .

  20. a Novel 3d Intelligent Fuzzy Algorithm Based on Minkowski-Clustering

    Science.gov (United States)

    Toori, S.; Esmaeily, A.

    2017-09-01

    Assessing and monitoring the state of the earth surface is a key requirement for global change research. In this paper, we propose a new consensus fuzzy clustering algorithm that is based on the Minkowski distance. This research concentrates on Tehran's vegetation mass and its changes during 29 years using remote sensing technology. The main purpose of this research is to evaluate the changes in vegetation mass using a new process by combination of intelligent NDVI fuzzy clustering and Minkowski distance operation. The dataset includes the images of Landsat8 and Landsat TM, from 1989 to 2016. For each year three images of three continuous days were used to identify vegetation impact and recovery. The result was a 3D NDVI image, with one dimension for each day NDVI. The next step was the classification procedure which is a complicated process of categorizing pixels into a finite number of separate classes, based on their data values. If a pixel satisfies a certain set of standards, the pixel is allocated to the class that corresponds to those criteria. This method is less sensitive to noise and can integrate solutions from multiple samples of data or attributes for processing data in the processing industry. The result was a fuzzy one dimensional image. This image was also computed for the next 28 years. The classification was done in both specified urban and natural park areas of Tehran. Experiments showed that our method worked better in classifying image pixels in comparison with the standard classification methods.

  1. A Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization.

    Science.gov (United States)

    He, Sheng; Samara, Petros; Burgers, Jan; Schomaker, Lambert

    2016-11-01

    It is of essential importance for historians to know the date and place of origin of the documents they study. It would be a huge advancement for historical scholars if it would be possible to automatically estimate the geographical and temporal provenance of a handwritten document by inferring them from the handwriting style of such a document. We propose a multiple-label guided clustering algorithm to discover the correlations between the concrete low-level visual elements in historical documents and abstract labels, such as date and location. First, a novel descriptor, called histogram of orientations of handwritten strokes, is proposed to extract and describe the visual elements, which is built on a scale-invariant polar-feature space. In addition, the multi-label self-organizing map (MLSOM) is proposed to discover the correlations between the low-level visual elements and their labels in a single framework. Our proposed MLSOM can be used to predict the labels directly. Moreover, the MLSOM can also be considered as a pre-structured clustering method to build a codebook, which contains more discriminative information on date and geography. The experimental results on the medieval paleographic scale data set demonstrate that our method achieves state-of-the-art results.

  2. Hybrid clustering based fuzzy structure for vibration control - Part 1: A novel algorithm for building neuro-fuzzy system

    Science.gov (United States)

    Nguyen, Sy Dzung; Nguyen, Quoc Hung; Choi, Seung-Bok

    2015-01-01

    This paper presents a new algorithm for building an adaptive neuro-fuzzy inference system (ANFIS) from a training data set called B-ANFIS. In order to increase accuracy of the model, the following issues are executed. Firstly, a data merging rule is proposed to build and perform a data-clustering strategy. Subsequently, a combination of clustering processes in the input data space and in the joint input-output data space is presented. Crucial reason of this task is to overcome problems related to initialization and contradictory fuzzy rules, which usually happen when building ANFIS. The clustering process in the input data space is accomplished based on a proposed merging-possibilistic clustering (MPC) algorithm. The effectiveness of this process is evaluated to resume a clustering process in the joint input-output data space. The optimal parameters obtained after completion of the clustering process are used to build ANFIS. Simulations based on a numerical data, 'Daily Data of Stock A', and measured data sets of a smart damper are performed to analyze and estimate accuracy. In addition, convergence and robustness of the proposed algorithm are investigated based on both theoretical and testing approaches.

  3. Utility of K-Means clustering algorithm in differentiating apparent diffusion coefficient values between benign and malignant neck pathologies

    Science.gov (United States)

    Srinivasan, A.; Galbán, C.J.; Johnson, T.D.; Chenevert, T.L.; Ross, B.D.; Mukherji, S.K.

    2014-01-01

    Purpose The objective of our study was to analyze the differences between apparent diffusion coefficient (ADC) partitions (created using the K-Means algorithm) between benign and malignant neck lesions and evaluate its benefit in distinguishing these entities. Material and methods MRI studies of 10 benign and 10 malignant proven neck pathologies were post-processed on a PC using in-house software developed in MATLAB (The MathWorks, Inc., Natick, MA). Lesions were manually contoured by two neuroradiologists with the ADC values within each lesion clustered into two (low ADC-ADCL, high ADC-ADCH) and three partitions (ADCL, intermediate ADC-ADCI, ADCH) using the K-Means clustering algorithm. An unpaired two-tailed Student’s t-test was performed for all metrics to determine statistical differences in the means between the benign and malignant pathologies. Results Statistically significant difference between the mean ADCL clusters in benign and malignant pathologies was seen in the 3 cluster models of both readers (p=0.03, 0.022 respectively) and the 2 cluster model of reader 2 (p=0.04) with the other metrics (ADCH, ADCI, whole lesion mean ADC) not revealing any significant differences. Receiver operating characteristics curves demonstrated the quantitative difference in mean ADCH and ADCL in both the 2 and 3 cluster models to be predictive of malignancy (2 clusters: p=0.008, area under curve=0.850, 3 clusters: p=0.01, area under curve=0.825). Conclusion The K-Means clustering algorithm that generates partitions of large datasets may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared to whole lesion mean ADC alone. PMID:20007723

  4. Utility of the k-means clustering algorithm in differentiating apparent diffusion coefficient values of benign and malignant neck pathologies.

    Science.gov (United States)

    Srinivasan, A; Galbán, C J; Johnson, T D; Chenevert, T L; Ross, B D; Mukherji, S K

    2010-04-01

    Does the K-means algorithm do a better job of differentiating benign and malignant neck pathologies compared to only mean ADC? The objective of our study was to analyze the differences between ADC partitions to evaluate whether the K-means technique can be of additional benefit to whole-lesion mean ADC alone in distinguishing benign and malignant neck pathologies. MR imaging studies of 10 benign and 10 malignant proved neck pathologies were postprocessed on a PC by using in-house software developed in Matlab. Two neuroradiologists manually contoured the lesions, with the ADC values within each lesion clustered into 2 (low, ADC-ADC(L); high, ADC-ADC(H)) and 3 partitions (ADC(L); intermediate, ADC-ADC(I); ADC(H)) by using the K-means clustering algorithm. An unpaired 2-tailed Student t test was performed for all metrics to determine statistical differences in the means of the benign and malignant pathologies. A statistically significant difference between the mean ADC(L) clusters in benign and malignant pathologies was seen in the 3-cluster models of both readers (P = .03 and .022, respectively) and the 2-cluster model of reader 2 (P = .04), with the other metrics (ADC(H), ADC(I); whole-lesion mean ADC) not revealing any significant differences. ROC curves demonstrated the quantitative differences in mean ADC(H) and ADC(L) in both the 2- and 3-cluster models to be predictive of malignancy (2 clusters: P = .008, area under curve = 0.850; 3 clusters: P = .01, area under curve = 0.825). The K-means clustering algorithm that generates partitions of large datasets may provide a better characterization of neck pathologies and may be of additional benefit in distinguishing benign and malignant neck pathologies compared with whole-lesion mean ADC alone.

  5. Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures

    Science.gov (United States)

    Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.

    2016-12-01

    The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which includes several new features, including an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.

  6. Open source clustering software.

    Science.gov (United States)

    de Hoon, M J L; Imoto, S; Nolan, J; Miyano, S

    2004-06-12

    We have implemented k-means clustering, hierarchical clustering and self-organizing maps in a single multipurpose open-source library of C routines, callable from other C and C++ programs. Using this library, we have created an improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix. In addition, we generated a Python and a Perl interface to the C Clustering Library, thereby combining the flexibility of a scripting language with the speed of C. The C Clustering Library and the corresponding Python C extension module Pycluster were released under the Python License, while the Perl module Algorithm::Cluster was released under the Artistic License. The GUI code Cluster 3.0 for Windows, Macintosh and Linux/Unix, as well as the corresponding command-line program, were released under the same license as the original Cluster code. The complete source code is available at http://bonsai.ims.u-tokyo.ac.jp/mdehoon/software/cluster. Alternatively, Algorithm::Cluster can be downloaded from CPAN, while Pycluster is also available as part of the Biopython distribution.

  7. Quantum annealing for combinatorial clustering

    Science.gov (United States)

    Kumar, Vaibhaw; Bass, Gideon; Tomlin, Casey; Dulny, Joseph

    2018-02-01

    Clustering is a powerful machine learning technique that groups "similar" data points based on their characteristics. Many clustering algorithms work by approximating the minimization of an objective function, namely the sum of within-the-cluster distances between points. The straightforward approach involves examining all the possible assignments of points to each of the clusters. This approach guarantees the solution will be a global minimum; however, the number of possible assignments scales quickly with the number of data points and becomes computationally intractable even for very small datasets. In order to circumvent this issue, cost function minima are found using popular local search-based heuristic approaches such as k-means and hierarchical clustering. Due to their greedy nature, such techniques do not guarantee that a global minimum will be found and can lead to sub-optimal clustering assignments. Other classes of global search-based techniques, such as simulated annealing, tabu search, and genetic algorithms, may offer better quality results but can be too time-consuming to implement. In this work, we describe how quantum annealing can be used to carry out clustering. We map the clustering objective to a quadratic binary optimization problem and discuss two clustering algorithms which are then implemented on commercially available quantum annealing hardware, as well as on a purely classical solver "qbsolv." The first algorithm assigns N data points to K clusters, and the second one can be used to perform binary clustering in a hierarchical manner. We present our results in the form of benchmarks against well-known k-means clustering and discuss the advantages and disadvantages of the proposed techniques.

  8. PRODUCT LIFECYCLE OPTIMISATION OF CAR CLIMATE CONTROLS USING ANALYTICAL HIERARCHICAL PROCESS (AHP ANALYSIS AND A MULTI-OBJECTIVE GROUPING GENETIC ALGORITHM (MOGGA

    Directory of Open Access Journals (Sweden)

    MICHAEL J. LEE

    2016-01-01

    Full Text Available A product’s lifecycle performance (e.g. assembly, outsourcing, maintenance and recycling can often be improved through modularity. However, modularisation under different and often conflicting lifecycle objectives is a complex problem that will ultimately require trade-offs. This paper presents a novel multi-objective modularity optimisation framework; the application of which is illustrated through the modularisation of a car climate control system. Central to the framework is a specially designed multi-objective grouping genetic algorithm (MOGGA that is able to generate a whole range of alternative product modularisations. Scenario analysis, using the principles of the analytical hierarchical process (AHP, is then carried out to explore the solution set and choose a suitable modular architecture that optimises the product lifecycle according to the company’s strategic vision.

  9. AHIMSA - Ad hoc histogram information measure sensing algorithm for feature selection in the context of histogram inspired clustering techniques

    Science.gov (United States)

    Dasarathy, B. V.

    1976-01-01

    An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.

  10. A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.

    Science.gov (United States)

    Wang, Xueyi

    2012-02-08

    The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10(6) records and 10(4) dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.

  11. A Hybrid Method for Image Segmentation Based on Artificial Fish Swarm Algorithm and Fuzzy c-Means Clustering

    Directory of Open Access Journals (Sweden)

    Li Ma

    2015-01-01

    Full Text Available Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM has the problems of depending on initial clustering centers, falling into local optimal solution easily, and sensitivity to noise disturbance. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA. The proposed algorithm combines artificial fish swarm algorithm (AFSA with FCM whose advantages of global optimization searching and parallel computing ability of AFSA are utilized to find a superior result. Meanwhile, Metropolis criterion and noise reduction mechanism are introduced to AFSA for enhancing the convergence rate and antinoise ability. The artificial grid graph and Magnetic Resonance Imaging (MRI are used in the experiments, and the experimental results show that the proposed algorithm has stronger antinoise ability and higher precision. A number of evaluation indicators also demonstrate that the effect of HAFSA is more excellent than FCM and suppressed FCM (SFCM.

  12. Assessment of Heavy Metal Pollution in Macrophytes, Water and Sediment of a Tropical Wetland System Using Hierarchical Cluster Analysis Technique

    OpenAIRE

    , N. Kumar J.I.; , M. Das; , R. Mukherji; , R.N. Kumar

    2011-01-01

    Heavy metal pollution in aquatic ecosystems is becoming a global phenomenon because these metals are indestructible and most of them have toxic effects on living organisms. Most of the fresh water bodies all over the world are getting contaminated thus declining their suitability. Therefore, monitoring and assessment of such freshwater systems has become an environmental concern. This study aims to elucidate the useful role of the cluster analysis to assess the relationship and interdependenc...

  13. The GSAM software: A global search algorithm of minima exploration for the investigation of low lying isomers of clusters

    Energy Technology Data Exchange (ETDEWEB)

    Marchal, Rémi; Carbonnière, Philippe; Pouchan, Claude [Université de Pau et des Pays de l' Adour, IPREM/ECP, UMR CNRS 5254 (France)

    2015-01-22

    The study of atomic clusters has become an increasingly active area of research in the recent years because of the fundamental interest in studying a completely new area that can bridge the gap between atomic and solid state physics. Due to their specific properties, such compounds are of great interest in the field of nanotechnology [1,2]. Here, we would present our GSAM algorithm based on a DFT exploration of the PES to find the low lying isomers of such compounds. This algorithm includes the generation of an intial set of structure from which the most relevant are selected. Moreover, an optimization process, called raking optimization, able to discard step by step all the non physically reasonnable configurations have been implemented to reduce the computational cost of this algorithm. Structural properties of Ga{sub n}Asm clusters will be presented as an illustration of the method.

  14. Investigating the provenance of iron artifacts of the Royal Iron Factory of Sao Joao de Ipanema by hierarchical cluster analysis of EDS microanalyses of slag inclusions

    Energy Technology Data Exchange (ETDEWEB)

    Mamani-Calcina, Elmer Antonio; Landgraf, Fernando Jose Gomes; Azevedo, Cesar Roberto de Farias, E-mail: c.azevedo@usp.br [Universidade de Sao Paulo (USP), Sao Paulo, SP (Brazil). Escola Politecnica. Departmento de Engenharia Metalurgica e de Materiais

    2017-01-15

    Microstructural characterization techniques, including EDX (Energy Dispersive X-ray Analysis) microanalyses, were used to investigate the slag inclusions in the microstructure of ferrous artifacts of the Royal Iron Factory of Sao Joao de Ipanema (first steel plant of Brazil, XIX century), the D. Pedro II Bridge (located in Bahia, assembled in XIX century and produced in Scotland) and the archaeological sites of Sao Miguel de Missoes (Rio Grande do Sul, Brazil, production site of iron artifacts, the XVIII century) and Afonso Sardinha (Sao Paulo, Brazil production site of iron artifacts, XVI century). The microanalyses results of the main micro constituents of the microstructure of the slag inclusions were investigated by hierarchical cluster analysis and the dendrogram with the microanalyses results of the wüstite phase (using as critical variables the contents of MnO, MgO, Al{sub 2}O{sub 3}, V{sub 2}O{sub 5} and TiO{sub 2}) allowed the identification of four clusters, which successfully represented the samples of the four investigated sites (Ipanema, Sardinha, Missoes and Bahia). Finally, the comparatively low volumetric fraction of slag inclusions in the samples of Ipanema (∼1%) suggested the existence of technological expertise at the iron making processing in the Royal Iron Factory of Sao Joao de Ipanema. (author)

  15. Critérios de formação de carteiras de ativos por meio de Hierarchical Clusters

    Directory of Open Access Journals (Sweden)

    Pierre Lucena

    2010-04-01

    Full Text Available Este artigo tem como objetivo principal apresentar e testar uma ferramenta de estatística multivariada em modelos financeiros. Essa metodologia, conhecida como análise de clusters, separa as observações em grupos com suas determinadas características, em contraste com a metodologia tradicional, que é somente a ordem com os quantis. Foi aplicada essa ferramenta em 213 ações negociadas na Bolsa de São Paulo (Bovespa, separando os grupos por tamanho e book-tomarket. Depois, as novas carteiras foram aplicadas no modelo de Fama e French (1996, comparando os resultados numa formação de carteira para quantil e análise de cluster. Foram encontrados melhores resultados na segunda metodologia. Os autores concluem que a análise de cluster pode ser mais adequada porque tende a formar grupos mais homogeneizados, sendo sua aplicação útil para a formação de carteiras e para a teoria financeira.

  16. Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS

    Science.gov (United States)

    Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

    2011-04-01

    To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single process, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are outputted. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.

  17. Algorithms

    Indian Academy of Sciences (India)

    polynomial) division have been found in Vedic Mathematics which are dated much before Euclid's algorithm. A programming language Is used to describe an algorithm for execution on a computer. An algorithm expressed using a programming.

  18. Chimera states in networks of logistic maps with hierarchical connectivities

    Science.gov (United States)

    zur Bonsen, Alexander; Omelchenko, Iryna; Zakharova, Anna; Schöll, Eckehard

    2018-04-01

    Chimera states are complex spatiotemporal patterns consisting of coexisting domains of coherence and incoherence. We study networks of nonlocally coupled logistic maps and analyze systematically how the dilution of the network links influences the appearance of chimera patterns. The network connectivities are constructed using an iterative Cantor algorithm to generate fractal (hierarchical) connectivities. Increasing the hierarchical level of iteration, we compare the resulting spatiotemporal patterns. We demonstrate that a high clustering coefficient and symmetry of the base pattern promotes chimera states, and asymmetric connectivities result in complex nested chimera patterns.

  19. Research on Energy-Saving Production Scheduling Based on a Clustering Algorithm for a Forging Enterprise

    Directory of Open Access Journals (Sweden)

    Yifei Tong

    2016-02-01

    Full Text Available Energy efficiency is a buzzword of the 21st century. With the ever growing need for energy efficient and low-carbon production, it is a big challenge for high energy-consumption enterprises to reduce their energy consumption. To this aim, a forging enterprise, DVR (the abbreviation of a forging enterprise, is researched. Firstly, an investigation into the production processes of DVR is given as well as an analysis of forging production. Then, the energy-saving forging scheduling is decomposed into two sub-problems. One is for cutting and machining scheduling, which is similar to traditional machining scheduling. The other one is for forging and heat treatment scheduling. Thirdly, former forging production scheduling is presented and solved based on an improved genetic algorithm. Fourthly, the latter is discussed in detail, followed by proposed dynamic clustering and stacking combination optimization. The proposed stacking optimization requires making the gross weight of forgings as close to the maximum batch capacity as possible. The above research can help reduce the heating times, and increase furnace utilization with high energy efficiency and low carbon emissions.

  20. CAF: Cluster algorithm and a-star with fuzzy approach for lifetime enhancement in wireless sensor networks

    KAUST Repository

    Yuan, Y.; Li, C.; Yang, Y.; Zhang, Xiangliang; Li, L.

    2014-01-01

    Energy is a major factor in designing wireless sensor networks (WSNs). In particular, in the real world, battery energy is limited; thus the effective improvement of the energy becomes the key of the routing protocols. Besides, the sensor nodes are always deployed far away from the base station and the transmission energy consumption is index times increasing with the increase of distance as well. This paper proposes a new routing method for WSNs to extend the network lifetime using a combination of a clustering algorithm, a fuzzy approach, and an A-star method. The proposal is divided into two steps. Firstly, WSNs are separated into clusters using the Stable Election Protocol (SEP) method. Secondly, the combined methods of fuzzy inference and A-star algorithm are adopted, taking into account the factors such as the remaining power, the minimum hops, and the traffic numbers of nodes. Simulation results demonstrate that the proposed method has significant effectiveness in terms of balancing energy consumption as well as maximizing the network lifetime by comparing the performance of the A-star and fuzzy (AF) approach, cluster and fuzzy (CF)method, cluster and A-star (CA)method, A-star method, and SEP algorithm under the same routing criteria. 2014 Yali Yuan et al.