WorldWideScience

Sample records for based clustering method

  1. Kernel method-based fuzzy clustering algorithm

    Institute of Scientific and Technical Information of China (English)

    Wu Zhongdong; Gao Xinbo; Xie Weixin; Yu Jianping

    2005-01-01

    The fuzzy C-means clustering algorithm (FCM) is extended to the fuzzy kernel C-means clustering algorithm (FKCM) to effectively perform cluster analysis on diverse structures, such as non-hyperspherical data, noisy data, data with a mixture of heterogeneous cluster prototypes, asymmetric data, etc. Based on the Mercer kernel, the FKCM clustering algorithm is derived by combining the FCM algorithm with the kernel method. Experiments with synthetic and real data show that the FKCM clustering algorithm is general and, in contrast to the FCM algorithm, can effectively analyze datasets with varied structures in an unsupervised manner. Kernel-based clustering is therefore expected to be an important research direction in fuzzy clustering analysis.
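
    A minimal sketch of the kernel-induced distance and fuzzy membership update commonly used in kernel fuzzy C-means variants, assuming a Gaussian (Mercer) kernel and prototypes kept in the input space. The abstract does not give FKCM's exact update rules, so the function names, `sigma`, and the fuzzifier `m` below are illustrative assumptions rather than the authors' formulation.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel, a standard Mercer kernel."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared feature-space distance ||phi(x) - phi(v)||^2.

    In general this is K(x,x) - 2K(x,v) + K(v,v); for the Gaussian
    kernel K(x,x) = K(v,v) = 1, so it reduces to 2 * (1 - K(x,v)).
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

def memberships(x, prototypes, m=2.0, sigma=1.0, eps=1e-12):
    """FCM-style memberships of x, using the kernel distance (assumed form)."""
    d = [kernel_distance_sq(x, v, sigma) + eps for v in prototypes]
    inv = [(1.0 / di) ** (1.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [w / total for w in inv]

if __name__ == "__main__":
    # A point close to the first prototype gets most of the membership.
    print(memberships((0.0, 0.0), [(0.1, 0.0), (3.0, 3.0)]))
```

    The memberships of a point sum to one, and the kernel only enters through the distance, which is why the same update shape as plain FCM can be reused.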

  2. Convex Decomposition Based Cluster Labeling Method for Support Vector Clustering

    Institute of Scientific and Technical Information of China (English)

    Yuan Ping; Ying-Jie Tian; Ya-Jian Zhou; Yi-Xian Yang

    2012-01-01

    Support vector clustering (SVC) is an important boundary-based clustering algorithm used in many applications for its capability of handling arbitrary cluster shapes. However, SVC's popularity is limited by its high time complexity and poor labeling performance. To overcome these problems, we present a novel, efficient and robust convex decomposition based cluster labeling (CDCL) method based on the topological properties of the dataset. CDCL decomposes each implicit cluster into convex hulls, each of which is composed of a subset of support vectors (SVs). Using a robust algorithm applied to the nearest neighboring convex hulls, the adjacency matrix of convex hulls is built to find the connected components; the remaining data points are then assigned the label of the nearest convex hull. The validity of the approach is guaranteed by geometric proofs. Time complexity analysis and comparative experiments suggest that CDCL significantly improves both efficiency and clustering quality.

  3. Fuzzy Clustering Method for Web User Based on Pages Classification

    Institute of Scientific and Technical Information of China (English)

    ZHAN Li-qiang; LIU Da-xin

    2004-01-01

    A new method for fuzzy clustering of Web users based on analysis of user interest characteristics is proposed in this article. The method first defines fuzzy page categories according to the links on the index page of the site, then computes the fuzzy degree of cross pages by aggregating Web log data. After that, using the fuzzy comprehensive evaluation method, it constructs user interest vectors according to page viewing times and hit frequency, and derives the fuzzy similarity matrix for the Web users from the interest vectors. Finally, it obtains the clustering result through the fuzzy clustering method. The experimental results show the effectiveness of the method.

  4. A Clustering Method Based on the Maximum Entropy Principle

    Directory of Open Access Journals (Sweden)

    Edwin Aldana-Bobadilla

    2015-01-01

    Clustering is an unsupervised process to determine which unlabeled objects in a set share interesting properties. The objects are grouped into k subsets (clusters) whose elements optimize a proximity measure. Methods based on information theory have proven to be feasible alternatives. They are based on the assumption that a cluster is one subset with the minimal possible degree of “disorder”. They attempt to minimize the entropy of each cluster. We propose a clustering method based on the maximum entropy principle. Such a method explores the space of all possible probability distributions of the data to find one that maximizes the entropy subject to extra conditions based on prior information about the clusters. The prior information is based on the assumption that the elements of a cluster are “similar” to each other in accordance with some statistical measure. As a consequence of such a principle, those distributions of high entropy that satisfy the conditions are favored over others. Searching the space to find the optimal distribution of objects in the clusters represents a hard combinatorial problem, which disallows the use of traditional optimization techniques. Genetic algorithms are a good alternative to solve this problem. We benchmark our method relative to the best theoretical performance, which is given by the Bayes classifier when data are normally distributed, and a multilayer perceptron network, which offers the best practical performance when data are not normal. In general, a supervised classification method will outperform a non-supervised one, since, in the first case, the elements of the classes are known a priori. In what follows, we show that our method’s effectiveness is comparable to a supervised one. This clearly exhibits the superiority of our method.

  5. A dynamic fuzzy clustering method based on genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    ZHENG Yan; ZHOU Chunguang; LIANG Yanchun; GUO Dongwei

    2003-01-01

    A dynamic fuzzy clustering method based on the genetic algorithm is presented. By calculating the fuzzy dissimilarity between samples, the essential associations among samples are modeled faithfully. The fuzzy dissimilarity between two samples is mapped into their Euclidean distance, that is, the high-dimensional samples are mapped into the two-dimensional plane. The mapping is optimized globally by the genetic algorithm, which adjusts the coordinates of each sample, and thus the Euclidean distances, so that they gradually approximate the fuzzy dissimilarities between samples. A key advantage of the proposed method is that the clustering is independent of the spatial distribution of the input samples, which improves flexibility and visualization. The method converges faster and clusters more accurately than some typical clustering algorithms. Simulated experiments show the feasibility and effectiveness of the proposed method.

  6. An improved unsupervised clustering-based intrusion detection method

    Science.gov (United States)

    Hai, Yong J.; Wu, Yu; Wang, Guo Y.

    2005-03-01

    Practical Intrusion Detection Systems (IDSs) based on data mining face two key problems: discovering intrusion knowledge from real-time network data, and automatically updating it when new intrusions appear. Most data mining algorithms work on labeled data. To set up a basic data set for mining, huge volumes of network data need to be collected and labeled manually. In fact, labeling intrusions is difficult and impractical, which has been a major restriction for current IDSs and has limited their ability to identify all kinds of intrusion types. An improved unsupervised clustering-based intrusion model working on unlabeled training data is introduced. In this model, the center of a cluster is defined and used as a substitute for the cluster. All cluster centers are then used to detect intrusions. Tests on the KDDCUP'99 data sets demonstrate that our method performs well in detection rate. Furthermore, an incremental-learning method is adopted to detect unknown-type intrusions, and it decreases the false positive rate.

  7. Super pixel density based clustering automatic image classification method

    Science.gov (United States)

    Xu, Mingxing; Zhang, Chuan; Zhang, Tianxu

    2015-12-01

    Image classification is an important means of image segmentation and data mining, and achieving rapid automated image classification has been a focus of research. This paper presents an algorithm, based on the super-pixel density of cluster centers, for automatic image classification and outlier identification. Pixel location coordinates and gray values are used to compute density and distance, from which automatic image classification and outlier extraction are achieved. Because computational complexity increases dramatically with the number of pixels, the image is first preprocessed into a small number of super-pixel sub-blocks, on which the density and distance calculations are then carried out. A normalized density and distance discrimination rule is also designed to achieve automatic classification and cluster center selection, so that the image is classified and outliers are identified automatically. Extensive experiments show that the method requires no human intervention, runs faster than the density clustering algorithm, and performs automated classification and outlier extraction effectively.
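
    The two quantities named in the abstract, density and distance, can be sketched as follows. This is a generic density-peaks style computation on a handful of feature points, assuming a cutoff-kernel density and Euclidean distance; the cutoff `d_c`, the function name, and the tie handling are illustrative assumptions, and the super-pixel preprocessing and normalized discrimination rule from the paper are not reproduced.

```python
import math

def density_and_distance(points, d_c):
    """Return (rho, delta) for each point.

    rho[i]   : number of other points closer than the cutoff d_c.
    delta[i] : distance to the nearest point with strictly higher rho;
               for points with no denser neighbour, the largest distance
               from that point to any other point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

if __name__ == "__main__":
    # Each point could be (x, y, gray value); 2-D points keep the demo short.
    pts = [(0, 0), (0, 1), (1, 0), (0.5, 0.5), (10, 10)]
    rho, delta = density_and_distance(pts, d_c=1.2)
    for p, r, d in zip(pts, rho, delta):
        print(p, r, round(d, 3))
```

    Candidate cluster centers are points with both high density and large distance, while a point with low density and large distance, such as `(10, 10)` here, is a candidate outlier.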

  8. Optimal sensor placement using FRFs-based clustering method

    Science.gov (United States)

    Li, Shiqi; Zhang, Heng; Liu, Shiping; Zhang, Zhe

    2016-12-01

    The purpose of this work is to develop an optimal sensor placement method by selecting the most relevant degrees of freedom as actual measurement positions. Based on the observation matrix of a structure's frequency response, two optimality criteria are used to avoid information redundancy among the candidate degrees of freedom. By using principal component analysis, the frequency response matrix can be decomposed into principal directions and their corresponding singular values. A relatively small number of principal directions maintains a system's dominant response information. According to the dynamic similarity of each degree of freedom, the k-means clustering algorithm is used to classify the degrees of freedom, and the effective independence method deletes the redundant sensors in each cluster. Finally, two numerical examples and a modal test are included to demonstrate the efficiency of the derived method. It is shown that the proposed method provides a way to extract sub-optimal sets and that the selected sensors are well distributed over the whole structure.

  9. Urban Fire Risk Clustering Method Based on Fire Statistics

    Institute of Scientific and Technical Information of China (English)

    WU Lizhi; REN Aizhu

    2008-01-01

    Fire statistics and fire analysis have become important ways for us to understand the laws of fire, prevent the occurrence of fire, and improve the ability to control fire. Based on existing fire statistics, a weighted fire risk calculation method characterized by the number of fire occurrences, direct economic losses, and fire casualties is put forward. On the basis of this method, and with an improved K-means clustering algorithm, this paper establishes a fire risk K-means clustering model, which better resolves the automatic classification of fire risk. Fire risk clusters should be classified by the absolute distance of the target instead of the relative distance used in traditional clustering algorithms. Finally, to apply the established model, this paper carries out fire risk clustering on fire statistics for Shenyang, China, from January 2000 to December 2004. This research provides technical support for urban fire management.

  10. Color Image Segmentation Method Based on Improved Spectral Clustering Algorithm

    OpenAIRE

    Dong Qin

    2014-01-01

    Addressing the high sparsity of image data and the problem of determining the number of clusters, we put forward a color image segmentation algorithm that combines semi-supervised machine learning with spectral graph theory. Building on related theories and methods of spectral clustering, we introduce the concept of information entropy to design a method that automatically optimizes the scale parameter value. So it avoids the unstab...

  11. Clustering method based on data division and partition

    Institute of Scientific and Technical Information of China (English)

    卢志茂; 刘晨; 张春祥; 王蕾

    2014-01-01

    Many classical clustering algorithms perform well under their prerequisites but do not scale well when applied to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) is proposed to solve this problem. DP cuts the source data set into data blocks and extracts the eigenvector of each data block to form the local feature set. The local feature set is used in a second round of the characteristics polymerization process for the source data to find the global eigenvector. Finally, according to the global eigenvector, the data set is assigned by the minimum distance criterion. The experimental results show that DP is more robust than conventional clustering methods. Its insensitivity to data dimensionality, distribution, and the number of natural clusters gives it a wide range of applications in clustering VLDS.

  12. A dynamic hierarchical clustering method for trajectory-based unusual video event detection.

    Science.gov (United States)

    Jiang, Fan; Wu, Ying; Katsaggelos, Aggelos K

    2009-04-01

    The proposed unusual video event detection method is based on unsupervised clustering of object trajectories, which are modeled by hidden Markov models (HMM). The novelty of the method includes a dynamic hierarchical process incorporated in the trajectory clustering algorithm to prevent model overfitting and a 2-depth greedy search strategy for efficient clustering.

  13. Spectral methods and cluster structure in correlation-based networks

    Science.gov (United States)

    Heimo, Tapio; Tibély, Gergely; Saramäki, Jari; Kaski, Kimmo; Kertész, János

    2008-10-01

    We investigate how in complex systems the eigenpairs of the matrices derived from the correlations of multichannel observations reflect the cluster structure of the underlying networks. For this we use daily return data from the NYSE and focus specifically on the spectral properties of weight $W_{ij} = |C|_{ij} - \delta_{ij}$ and diffusion matrices $D_{ij} = W_{ij}/s_j - \delta_{ij}$, where $C_{ij}$ is the correlation matrix and $s_i = \sum_j W_{ij}$ the strength of node $i$. The eigenvalues (and corresponding eigenvectors) of the weight matrix are ranked in descending order. As in the earlier observations, the first eigenvector stands for a measure of the market correlations. Its components are, to first approximation, equal to the strengths of the nodes and there is a second order, roughly linear, correction. The high ranking eigenvectors, excluding the highest ranking one, are usually assigned to market sectors and industrial branches. Our study shows that both for weight and diffusion matrices the eigenpair analysis is not capable of easily deducing the cluster structure of the network without a priori knowledge. In addition we have studied the clustering of stocks using the asset graph approach with and without spectrum based noise filtering. It turns out that asset graphs are quite insensitive to noise and there is no sharp percolation transition as a function of the ratio of bonds included, thus no natural threshold value for that ratio seems to exist. We suggest that these observations can be of use for other correlation based networks as well.
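
    A small sketch of how the weight and diffusion matrices could be built from a correlation matrix, reading the definitions as W_ij = |C_ij| - delta_ij, s_j = sum_i W_ij, and D_ij = W_ij / s_j - delta_ij. The function name and the toy 3 x 3 matrix are illustrative assumptions; the eigenpair analysis itself is not shown.

```python
def weight_and_diffusion(C):
    """Build W, node strengths s, and D from a symmetric correlation matrix C."""
    n = len(C)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Weight matrix: absolute correlations with the unit diagonal removed.
    W = [[abs(C[i][j]) - eye[i][j] for j in range(n)] for i in range(n)]
    # Node strength: column sum of W (equal to the row sum, since W is symmetric).
    s = [sum(W[i][j] for i in range(n)) for j in range(n)]
    # Diffusion matrix: column-normalised weights minus the identity.
    D = [[W[i][j] / s[j] - eye[i][j] for j in range(n)] for i in range(n)]
    return W, s, D

if __name__ == "__main__":
    C = [[1.0, 0.5, -0.2],
         [0.5, 1.0, 0.3],
         [-0.2, 0.3, 1.0]]
    W, s, D = weight_and_diffusion(C)
    print(s)
    print([sum(D[i][j] for i in range(3)) for j in range(3)])
```

    With this normalisation every column of D sums to zero, as expected for a diffusion operator that conserves the total quantity being spread over the network.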

  14. Segmentation of MRI Volume Data Based on Clustering Method

    Directory of Open Access Journals (Sweden)

    Ji Dongsheng

    2016-01-01

    Here we analyze the difficulties of segmenting left ventricle MR images without tag lines, and propose an algorithm for automatic segmentation of left ventricle (LV) internal and external profiles. Herein, we propose an Incomplete K-means and Category Optimization (IKCO) method. Initially, after using the Hough transformation to automatically locate the initial contour of the LV, the algorithm uses a simple approach to complete data subsampling and initial center determination. Next, according to the clustering rules, the proposed algorithm finishes MR image segmentation. Finally, the algorithm uses a category optimization method to improve segmentation results. Experiments show that the algorithm provides good segmentation results.

  15. Image Clustering Method Based on Density Maps Derived from Self-Organizing Mapping: SOM

    Directory of Open Access Journals (Sweden)

    Kohei Arai

    2012-07-01

    A new method for image clustering with density maps derived from Self-Organizing Maps (SOM) is proposed, together with a clarification of the learning processes during the construction of clusters. It is found that the proposed SOM-based image clustering method shows much better clustering results for both simulation and real satellite imagery data. It is also found that the separability among clusters of the proposed method is 16% greater than that of the existing k-means clustering. According to the experimental results with a Landsat-5 TM image, it takes more than 20,000 iterations for the SOM learning processes to converge.

  16. Clustering scientific publications based on citation relations: A systematic comparison of different methods

    CERN Document Server

    Šubelj, Lovro; Waltman, Ludo

    2015-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between di...

  17. A clustering method of Chinese medicine prescriptions based on modified firefly algorithm.

    Science.gov (United States)

    Yuan, Feng; Liu, Hong; Chen, Shou-Qiang; Xu, Liang

    2016-12-01

    This paper aims to study the clustering method for Chinese medicine (CM) medical cases. The traditional K-means clustering algorithm has shortcomings when processing prescriptions from CM medical cases, such as the dependence of results on the selection of initial values and trapping in local optima. Therefore, a new clustering method based on the collaboration of the firefly algorithm and the simulated annealing algorithm is proposed. This algorithm dynamically determines the iterations of the firefly algorithm and the sampling of the simulated annealing algorithm according to fitness changes, and increases the diversity of the swarm by expanding the scope of the sudden jump, thereby effectively avoiding premature convergence. The results of confirmatory experiments on CM medical cases suggest that, compared with traditional K-means clustering algorithms, this method greatly improves individual diversity and the clustering results obtained. The computed results have a certain reference value for cluster analysis of CM prescriptions.

  18. K2: A new method for the detection of galaxy clusters based on CFHTLS multicolor images

    CERN Document Server

    Thanjavur, Karun; Crampton, David

    2009-01-01

    We have developed a new method, K2, optimized for the detection of galaxy clusters in multicolor images. Based on the Red Sequence approach, K2 detects clusters using simultaneous enhancements in both colors and position. The detection significance is robustly determined through extensive Monte-Carlo simulations and through comparison with available cluster catalogs based on two different optical methods, and also on X-ray data. K2 also provides quantitative estimates of the candidate clusters' richness and photometric redshifts. Initially K2 was applied to 161 sq deg of two color gri images of the CFHTLS-Wide data. Our simulations show that the false detection rate, at our selected threshold, is only ~1%, and that the cluster catalogs are ~80% complete up to a redshift of 0.6 for Fornax-like and richer clusters and to z ~0.3 for poorer clusters. Based on Terapix T05 release gri photometric catalogs, 35 clusters/sq deg are detected, with 1-2 Fornax-like or richer clusters every two square degrees. Catalogs co...

  19. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods.

    Science.gov (United States)

    Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo

    2016-01-01

    Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.

  20. A method for context-based adaptive QRS clustering in real-time

    CERN Document Server

    Castro, Daniel; Presedo, Jesús

    2014-01-01

    Continuous follow-up of heart condition through long-term electrocardiogram monitoring is an invaluable tool for diagnosing some cardiac arrhythmias. In this context, providing tools for quickly locating alterations of normal conduction patterns is mandatory and remains an open issue. This work presents a real-time method for adaptive clustering of QRS complexes from multilead ECG signals that provides the set of QRS morphologies that appear during an ECG recording. The method processes the QRS complexes sequentially, grouping them into a dynamic set of clusters based on the information content of the temporal context. The clusters are represented by templates which evolve over time and adapt to the QRS morphology changes. Rules to create, merge and remove clusters are defined along with techniques for noise detection in order to avoid their proliferation. To cope with beat misalignment, Derivative Dynamic Time Warping is used. The proposed method has been validated against the MIT-BIH Arrhythmia Database and...

  1. An effective trust-based recommendation method using a novel graph clustering algorithm

    Science.gov (United States)

    Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin

    2015-10-01

    Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues such as data sparsity and cold start problems, caused by fewer ratings against the unknowns that need to be predicted. Incorporating trust information into the collaborative filtering systems is an attractive approach to resolve these problems. In this paper, we present a model-based collaborative filtering method by applying a novel graph clustering algorithm and also considering trust statements. In the proposed method first of all, the problem space is represented as a graph and then a sparsest subgraph finding algorithm is applied on the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate users/items clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.

  2. A semantics-based method for clustering of Chinese web search results

    Science.gov (United States)

    Zhang, Hui; Wang, Deqing; Wang, Li; Bi, Zhuming; Chen, Yong

    2014-01-01

    Information explosion is a critical challenge to the development of modern information systems. In particular, when an information system is applied over the Internet, the amount of information on the web has been increasing exponentially and rapidly. Search engines, such as Google and Baidu, are essential tools for people to find information on the Internet. Valuable information, however, is still likely to be submerged in the ocean of search results from those tools. By automatically clustering the results into different groups based on subject, a search engine with a clustering feature allows users to select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ the generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing the vocabulary chain. Third, we construct a vector of text features and calculate the snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster snippets. Extensive experimental results show that the proposed algorithm outperforms the suffix tree clustering method and other traditional clustering methods.
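
    To make the first step concrete, the sketch below extracts the longest common substring of two snippets. It uses a simple dynamic-programming table rather than the generalised suffix tree described in the abstract, so it illustrates the quantity being extracted, not the paper's data structure; the function name is an illustrative assumption.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous substring shared by a and b.

    Dynamic programming over suffix-match lengths, O(len(a) * len(b)) time.
    A generalised suffix tree gives the same answer in linear time.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]

if __name__ == "__main__":
    print(repr(longest_common_substring("fuzzy clustering method",
                                        "a clustering algorithm")))
```

    The function works on any Python strings, including Chinese snippets, because it compares characters rather than whitespace-delimited words.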

  3. An efficient method of key-frame extraction based on a cluster algorithm.

    Science.gov (United States)

    Zhang, Qiang; Yu, Shao-Pei; Zhou, Dong-Sheng; Wei, Xiao-Peng

    2013-12-18

    This paper proposes a novel method of key-frame extraction for use with motion capture data. This method is based on an unsupervised cluster algorithm. First, the motion sequence is clustered into two classes by the similarity distance of the adjacent frames so that the thresholds needed in the next step can be determined adaptively. Second, a dynamic cluster algorithm called ISODATA is used to cluster all the frames and the frames nearest to the center of each class are automatically extracted as key-frames of the sequence. Unlike many other clustering techniques, the present improved cluster algorithm can automatically address different motion types without any need for specified parameters from users. The proposed method is capable of summarizing motion capture data reliably and efficiently. The present work also provides a meaningful comparison between the results of the proposed key-frame extraction technique and other previous methods. These results are evaluated in terms of metrics that measure reconstructed motion and the mean absolute error value, which are derived from the reconstructed data and the original data.
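
    The final selection step, taking the frame nearest to each cluster center as a key-frame, can be sketched as below. The clustering itself (ISODATA in the paper) is assumed to have been run already and is represented only by a list of labels; the function name and the toy one-dimensional frame features are illustrative assumptions.

```python
import math

def key_frames(frames, labels):
    """Pick, for each cluster, the index of the frame closest to its centroid.

    frames : list of equal-length feature vectors, one per frame.
    labels : cluster label of each frame (from any clustering algorithm).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    keys = []
    for members in clusters.values():
        dim = len(frames[members[0]])
        # Centroid of the cluster in feature space.
        center = [sum(frames[i][d] for i in members) / len(members)
                  for d in range(dim)]
        keys.append(min(members, key=lambda i: math.dist(frames[i], center)))
    return sorted(keys)

if __name__ == "__main__":
    frames = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (15.0,)]
    labels = [0, 0, 0, 1, 1, 1]
    print(key_frames(frames, labels))
```

    Returning frame indices rather than feature vectors keeps the key-frames tied to positions in the original motion sequence.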

  4. Improved fuzzy identification method based on Hough transformation and fuzzy clustering

    Institute of Scientific and Technical Information of China (English)

    刘福才; 路平立; 潘江华; 裴润

    2004-01-01

    This paper presents an approach for identifying a fuzzy model of a SISO system. The initial values of the cluster centers are identified by the Hough transformation, which considers the linearity and continuity of the given input-output data. The premise parameters are identified using the fuzzy C-means clustering method. The consequent parameters are identified by recursive least squares. This method not only makes the approximation more accurate, but also makes the computation simpler and the procedure easier to implement. Finally, simulation shows that the method is useful for identifying a fuzzy model.

  5. A New Keyphrases Extraction Method Based on Suffix Tree Data Structure for Arabic Documents Clustering

    Directory of Open Access Journals (Sweden)

    Issam SAHMOUDI

    2013-12-01

    Document clustering is a branch of a larger area of scientific study known as data mining; it is an unsupervised classification used to find structure in a collection of unlabeled data. When the Full Text Representation is used, the useful information in the documents can be accompanied by a large amount of noise words, which negatively affects the result of the clustering process. There is therefore a great need to eliminate the noise words and keep just the useful information in order to enhance the quality of the clustering results. This problem occurs to different degrees in any language, such as English, European languages, Hindi, Chinese, and Arabic. To overcome this problem, in this paper we propose a new and efficient keyphrase extraction method based on the Suffix Tree data structure (KpST); the extracted keyphrases are then used in the clustering process instead of the Full Text Representation. The proposed method for keyphrase extraction is language independent and may therefore be applied to any language. In this investigation, we are interested in the Arabic language, which is one of the most complex languages. To evaluate our method, we conduct an experimental study on Arabic documents using the most popular hierarchical clustering approach, the Agglomerative Hierarchical algorithm, with seven linkage techniques and a variety of distance functions and similarity measures, to perform the Arabic document clustering task. The obtained results show that our method for extracting keyphrases increases the quality of the clustering results. We also propose to study the effect of stemming the test dataset and clustering it with the same document clustering techniques and similarity/distance measures.

  6. New Clustering Method in High-Dimensional Space Based on Hypergraph-Models

    Institute of Scientific and Technical Information of China (English)

    CHEN Jian-bin; WANG Shu-jing; SONG Han-tao

    2006-01-01

    To overcome the limitation of traditional clustering algorithms, which fail to produce meaningful clusters in high-dimensional, sparse, binary-valued data sets, a new method based on a hypergraph model is proposed. The hypergraph model maps the relationships present in the original high-dimensional data into a hypergraph. A hyperedge represents the similarity of the attribute-value distribution between two points. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. The quality of the clustering result can be evaluated by applying the intra-cluster singularity value. Analysis and experimental results demonstrate that this approach is applicable and effective in a wide range of schemes.

  7. A Load Balancing Algorithm Based on Maximum Entropy Methods in Homogeneous Clusters

    Directory of Open Access Journals (Sweden)

    Long Chen

    2014-10-01

    To address ill-balanced task allocation, long response times, low throughput and poor performance when a cluster system assigns tasks, we introduce the thermodynamic concept of entropy into load balancing algorithms. This paper proposes a new load balancing algorithm for homogeneous clusters based on the Maximum Entropy Method (MEM). By calculating the entropy of the system and using the maximum entropy principle to ensure that each scheduling and migration step follows the increasing tendency of the entropy, the system can reach a load-balanced state as soon as possible, shorten task execution time and achieve high performance. Simulation results show that, compared with traditional algorithms, this algorithm performs better in terms of the time taken to balance the load and the degree of balance achieved in a homogeneous cluster system. It also offers a new line of thinking for the load balancing problem in homogeneous cluster systems.
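
    The entropy-guided scheduling idea described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes the system entropy is the Shannon entropy of the normalized node loads and greedily places each task on the node that yields the highest entropy. The names `load_entropy` and `assign_task` and the sample loads are hypothetical.

```python
import math

def load_entropy(loads):
    """Shannon entropy (in nats) of the normalized load distribution."""
    total = sum(loads)
    if total == 0:
        return 0.0
    return -sum((x / total) * math.log(x / total) for x in loads if x > 0)

def assign_task(loads, cost):
    """Return the index of the node whose entropy is highest after adding `cost`."""
    best_node, best_h = 0, -1.0
    for i in range(len(loads)):
        trial = loads[:i] + [loads[i] + cost] + loads[i + 1:]
        h = load_entropy(trial)
        if h > best_h:
            best_node, best_h = i, h
    return best_node

# Hypothetical starting loads and task costs (assumed for illustration only).
loads = [5.0, 1.0, 2.0]
for cost in [2.0, 2.0, 1.0]:
    node = assign_task(loads, cost)
    loads[node] += cost
print(loads)
```

    Entropy is maximal when all nodes carry equal load, so each greedy step pushes the system toward balance; a real scheduler would also have to account for migration cost and task arrival over time.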

  8. A Cluster-based Method to Map Urban Area from DMSP/OLS Nightlights

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Yuyu; Smith, Steven J.; Elvidge, Christopher; Zhao, Kaiguang; Thomson, Allison M.; Imhoff, Marc L.

    2014-05-05

    Accurate information on urban areas at regional and global scales is important for both the science and policy-making communities. The Defense Meteorological Satellite Program/Operational Linescan System (DMSP/OLS) nighttime stable light data (NTL) provide a potential way to map urban area and its dynamics economically and in a timely manner. In this study, we developed a cluster-based method to estimate the optimal thresholds and map urban extents from the DMSP/OLS NTL data in five major steps: data preprocessing, urban cluster segmentation, logistic model development, threshold estimation, and urban extent delineation. Unlike previous fixed-threshold methods, which suffer from over- and under-estimation, our method estimates the optimal thresholds from cluster size and overall nightlight magnitude in the cluster, so they vary between clusters. Two large countries with different urbanization patterns, the United States and China, were selected to map urban extents using the proposed method. The results indicate that urbanized area occupies about 2% of total land area in the US, ranging from lower than 0.5% to higher than 10% at the state level, and less than 1% in China, ranging from lower than 0.1% to about 5% at the province level, with some municipalities as high as 10%. The derived thresholds and urban extents were evaluated using high-resolution land cover data at the cluster and regional levels. Our method was found to map urban area in both countries efficiently and accurately. Compared to previous threshold techniques, our method reduces over- and under-estimation when mapping urban extent over a large area. More importantly, our method shows potential to map global urban extents and temporal dynamics using the DMSP/OLS NTL data in a timely, cost-effective way.

  9. Color image segmentation using watershed and Nyström method based spectral clustering

    Science.gov (United States)

    Bai, Xiaodong; Cao, Zhiguo; Yu, Zhenghong; Zhu, Hu

    2011-11-01

    Color image segmentation has drawn considerable attention recently. To improve the efficiency of spectral clustering in color image segmentation, a novel two-stage color image segmentation method is proposed. In the first stage, we use a vector gradient approach to detect color image gradient information, and the watershed transformation to obtain a pre-segmentation result. In the second stage, Nyström extension based spectral clustering is used to obtain the final result. To verify the proposed algorithm, it is applied to color images from the Berkeley Segmentation Dataset. Experiments show that our method produces promising results and reduces the runtime significantly.

  10. Unconventional methods for clustering

    Science.gov (United States)

    Kotyrba, Martin

    2016-06-01

    Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is the main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. The topic of this paper is one of the modern methods of clustering, namely the SOM (Self Organising Map). The paper presents the theory needed to understand the principle of clustering and describes the algorithms used for clustering in our experiments.

  11. Hydration of pure and base-containing sulfuric acid clusters studied by computational chemistry methods

    Science.gov (United States)

    Henschel, Henning; Ortega, Ismael K.; Kupiainen, Oona; Olenius, Tinja; Kurtén, Theo; Vehkamäki, Hanna

    2013-05-01

    The formation of hydrates of small molecular sulfuric acid clusters and of clusters containing both sulfuric acid and a base (ammonia or dimethylamine) has been studied by means of computational chemistry. Using a combined ab initio/density functional approach, formation energies of clusters with up to four sulfuric acid molecules and up to two base molecules have been calculated. Consequences for the hydration level of the corresponding clusters have been modelled. While the majority of pure sulfuric acid clusters are comparatively strongly hydrated, base-containing clusters were found to be less hydrophilic. Dimethylamine is particularly effective in lowering the hydrophilicity of the cluster. Implications of the hydration profiles for atmospheric processes are discussed.

  12. Form gene clustering method about pan-ethnic-group products based on emotional semantic

    Science.gov (United States)

    Chen, Dengkai; Ding, Jingjing; Gao, Minzhuo; Ma, Danping; Liu, Donghui

    2016-09-01

    The use of form knowledge for pan-ethnic-group products primarily depends on a designer's subjective experience, without user participation. Most studies focus on detecting the perceptual demands of consumers for the target product category. A form gene clustering method for pan-ethnic-group products, based on emotional semantics, is constructed. Consumers' perceptual images of pan-ethnic-group products are obtained by means of product form gene extraction and coding and computer-aided product form clustering technology. A case of form gene clustering for typical pan-ethnic-group products is investigated, which indicates that the method is feasible. This paper opens up a new direction for the future development of product form design, improving the agility of the product design process in the era of Industry 4.0.

  13. Image Retrieval Based on Multiview Constrained Nonnegative Matrix Factorization and Gaussian Mixture Model Spectral Clustering Method

    Directory of Open Access Journals (Sweden)

    Qunyi Xie

    2016-01-01

    Content-based image retrieval has recently become an important research topic and has been widely used for managing images from repositories. In this article, we present an efficient technique, called MNGS, which integrates multiview constrained nonnegative matrix factorization (NMF) and Gaussian mixture model (GMM)-based spectral clustering for image retrieval. In the proposed methodology, the multiview NMF scheme provides competitive sparse representations of the underlying images through decomposition of a similarity-preserving matrix that is formed by fusing multiple features from different visual aspects. In particular, the proposed method merges manifold constraints into the standard NMF objective function to impose an orthogonality constraint on the basis matrix and satisfy the structure preservation requirement of the coefficient matrix. To apply clustering to the sparse representations, this paper develops a GMM-based spectral clustering method in which the Gaussian components are regrouped in spectral space, which significantly improves retrieval effectiveness. In this way, image retrieval over the whole database reduces to a nearest-neighbour search in the cluster containing the query image. This study also investigates the proof of convergence of the objective function and analyzes the computational complexity. Experimental results on three standard image datasets reveal the advantages that can be achieved with the proposed retrieval scheme.

  14. A novel PPGA-based clustering analysis method for business cycle indicator selection

    Institute of Scientific and Technical Information of China (English)

    Dabin ZHANG; Lean YU; Shouyang WANG; Yingwen SONG

    2009-01-01

    A new clustering analysis method based on the pseudo parallel genetic algorithm (PPGA) is proposed for business cycle indicator selection. In the proposed method, the category of each indicator is coded by real numbers, and some illegal chromosomes are repaired by the identification and restoration of empty classes. Two mutation operators, namely the discrete random mutation operator and the optimal direction mutation operator, are designed to balance the local convergence speed and the global convergence performance, and are then combined with a migration strategy and an insertion strategy. For verification and illustration, the proposed method is compared with the K-means clustering algorithm and the standard genetic algorithm in a numerical simulation experiment. The experimental results show the feasibility and effectiveness of the new PPGA-based clustering analysis algorithm. The proposed algorithm is also applied to select business cycle indicators for examining the status of the macro economy. Empirical results demonstrate that the proposed method can effectively and correctly select leading, coincident, and lagging indicators to reflect the business cycle, which is of practical use to macroeconomic administrators and business decision-makers.

  15. Galaxy Cluster Mass Reconstruction Project: I. Methods and first results on galaxy-based techniques

    CERN Document Server

    Old, L; Pearce, F R; Croton, D; Muldrew, S I; Muñoz-Cuartas, J C; Gifford, D; Gray, M E; von der Linden, A; Mamon, G A; Merrifield, M R; Müller, V; Pearson, R J; Ponman, T J; Saro, A; Sepp, T; Sifón, C; Tempel, E; Tundo, E; Wang, Y O; Wojtak, R

    2014-01-01

    This paper is the first in a series in which we perform an extensive comparison of various galaxy-based cluster mass estimation techniques that utilise the positions, velocities and colours of galaxies. Our primary aim is to test the performance of these cluster mass estimation techniques on a diverse set of models that will increase in complexity. We begin by providing participating methods with data from a simple model that delivers idealised clusters, enabling us to quantify the underlying scatter intrinsic to these mass estimation techniques. The mock catalogue is based on a Halo Occupation Distribution (HOD) model that assumes spherical Navarro, Frenk and White (NFW) haloes truncated at R_200, with no substructure nor colour segregation, and with isotropic, isothermal Maxwellian velocities. We find that, above 10^14 M_solar, recovered cluster masses are correlated with the true underlying cluster mass with an intrinsic scatter of typically a factor of two. Below 10^14 M_solar, the scatter rises as the nu...

  16. A cluster-based method for marine sensitive object extraction and representation

    Science.gov (United States)

    Xue, Cunjin; Dong, Qing; Qin, Lijuan

    2015-08-01

    Within the context of global change, marine sensitive factors, or Marine Essential Climate Variables, have been defined by many projects, and their sensitive spatial regions and time phases play significant roles in regional sea-air interactions and in understanding their dynamic processes. In this paper, we propose a cluster-based method for marine sensitive region extraction and representation. The method includes a kernel expansion algorithm for extracting marine sensitive regions, and a field-object triple form, integrating object-oriented and field-based models, for representing marine sensitive objects. First, the method recognizes ENSO-related spatial patterns using empirical orthogonal decomposition of long-term marine sensitive factors and correlation analysis with multiple ENSO indices. The cluster kernel, defined by statistics of the spatial patterns, is initialized and then recursively expanded and merged with its spatial neighborhoods, until all related lattices with similar behavior are merged into marine sensitive regions. The field-object triple form is then used to represent the marine sensitive objects, both as discrete objects with a precise extent and boundary and as continuous fields whose variations depend on spatial location. Finally, marine sensitive objects for sea surface temperature are extracted, represented and analyzed as a case study, which demonstrates the effectiveness and efficiency of the proposed method.

  17. Swarm: robust and fast clustering method for amplicon-based studies

    Directory of Open Access Journals (Sweden)

    Frédéric Mahé

    2014-09-01

    Popular de novo amplicon clustering methods suffer from two fundamental flaws: arbitrary global clustering thresholds, and input-order dependency induced by centroid selection. Swarm was developed to address these issues by first clustering nearly identical amplicons iteratively using a local threshold, and then by using clusters’ internal structure and amplicon abundances to refine its results. This fast, scalable, and input-order independent approach reduces the influence of clustering parameters and produces robust operational taxonomic units.
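
    The first phase described above, iterative growth with a local threshold, can be illustrated with a short sketch. It is not Swarm's implementation: it assumes equal-length sequences compared by mismatch count (no indels), omits the abundance-based refinement phase, and uses hypothetical names (`differences`, `local_threshold_clusters`) and made-up reads.

```python
def differences(a, b):
    """Mismatch count for equal-length sequences; unequal lengths never link here."""
    if len(a) != len(b):
        return float("inf")
    return sum(x != y for x, y in zip(a, b))

def local_threshold_clusters(abundance, d=1):
    """Grow clusters by repeatedly linking amplicons within d differences.

    `abundance` maps sequence -> count. Seeds are visited in a fixed order
    (decreasing abundance, then alphabetical), so input order does not matter.
    """
    unassigned = set(abundance)
    clusters = []
    for seed in sorted(abundance, key=lambda s: (-abundance[s], s)):
        if seed not in unassigned:
            continue
        unassigned.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Link every still-unassigned amplicon within d of the current member.
            linked = sorted(s for s in unassigned if differences(current, s) <= d)
            for s in linked:
                unassigned.remove(s)
            cluster.extend(linked)
            frontier.extend(linked)
        clusters.append(sorted(cluster))
    return clusters

# Made-up reads for illustration only.
reads = {"ACGT": 50, "ACGA": 5, "ACAA": 2, "TTTT": 9, "TTTA": 1}
result = local_threshold_clusters(reads)
print(result)
```

    Note that "ACAA" joins the first cluster even though it is two differences from the seed, because it is one difference from "ACGA"; this chaining is what replaces a global threshold around a centroid.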

  18. A novel intrusion detection method based on OCSVM and K-means recursive clustering

    Directory of Open Access Journals (Sweden)

    Leandros A. Maglaras

    2015-01-01

    In this paper we present an intrusion detection module capable of detecting malicious network traffic in a SCADA (Supervisory Control and Data Acquisition) system, based on the combination of a One-Class Support Vector Machine (OCSVM) with RBF kernel and recursive k-means clustering. Important parameters of the OCSVM, such as the Gaussian width σ and the parameter ν, affect the performance of the classifier. Tuning these parameters is of great importance in order to avoid false positives and overfitting. The combination of OCSVM with recursive k-means clustering allows the proposed intrusion detection module to distinguish real alarms from possible attacks regardless of the values of the parameters σ and ν, making it well suited to real-time intrusion detection mechanisms for SCADA systems. Extensive simulations have been conducted with datasets extracted from small and medium sized HTB SCADA testbeds, in order to compare the accuracy, false alarm rate and execution time against the baseline OCSVM method.

  19. A Method of Clustering Components into Modules Based on Products' Functional and Structural Analysis

    Institute of Scientific and Technical Information of China (English)

    MENG Xiang-hui; JIANG Zu-hua; ZHENG Ying-fei

    2006-01-01

    Modularity is the key to improving the cost-variety trade-off in product development. To achieve functional and structural independence of modules, a method of clustering components to identify modules based on functional and structural analysis is presented. The method has two stages. In the first stage, the product's functions are analyzed to determine the primary level of modules; an objective function for module identification is then formulated to achieve functional independence, and a genetic algorithm is used to solve the resulting combinatorial optimization problem and form the primary modules. In the second stage, the cohesion degree of modules and the coupling degree between modules are analyzed, and the modular scheme is refined on this basis according to the principle of structural independence. A case study on a gear reducer illustrates the validity of the presented method.

  20. A new methodology to define homogeneous regions through an entropy based clustering method

    Science.gov (United States)

    Ridolfi, E.; Rianna, M.; Trani, G.; Alfonso, L.; Di Baldassarre, G.; Napolitano, F.; Russo, F.

    2016-10-01

    One of the most crucial steps in flow frequency studies is the definition of Homogeneous Regions (HRs), i.e. areas with similar hydrological behavior. This is essential in ungauged catchments, as an HR allows information to be transferred from a neighboring river basin. This study proposes a new, entropy-based approach to define HRs, in which regions are defined as homogeneous if their hydrometric stations capture redundant information. The problem is handled through the definition of the Information Transferred Index (ITI) as the ratio between redundant information and the total information provided by pairs of stations. The methodology is compared with a traditional, distance-based clustering method through a Monte Carlo experiment and a jack-knife procedure. Results indicate that the ITI-based method performs well, adding value to current methodologies to define HRs.
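
    A ratio of redundant to total information for a pair of stations can be sketched as below. The abstract does not give the formula, so this sketch assumes that redundant information is the mutual information I(X;Y) and total information is the joint entropy H(X,Y) of two discretized records; the function names and sample series are hypothetical.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (bits) of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def information_transferred_index(x, y):
    """Assumed ITI: mutual information I(X;Y) divided by joint entropy H(X,Y)."""
    joint = entropy(list(zip(x, y)))
    if joint == 0:
        return 0.0
    mutual = entropy(x) + entropy(y) - joint
    return mutual / joint

# Hypothetical discretized flow classes at three stations.
station_a = [0, 0, 1, 1, 2, 2, 0, 1]
station_b = [0, 0, 1, 1, 2, 2, 0, 1]   # identical record: fully redundant
station_c = [0, 1, 0, 1, 0, 1, 0, 1]
print(information_transferred_index(station_a, station_b))
print(information_transferred_index(station_a, station_c))
```

    Under this assumed definition the index lies between 0 (independent records) and 1 (one record fully determines the other), so pairs with high values would be candidates for the same homogeneous region.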

  1. Hybrid Decomposition Method in Parallel Molecular Dynamics Simulation Based on SMP Cluster Architecture

    Institute of Scientific and Technical Information of China (English)

    WANG Bing; SHU Jiwu; ZHENG Weimin; WANG Jinzhao; CHEN Min

    2005-01-01

    A hybrid decomposition method for molecular dynamics simulations was presented, using simultaneously spatial decomposition and force decomposition to fit the architecture of a cluster of symmetric multi-processor (SMP) nodes. The method distributes particles between nodes based on the spatial decomposition strategy to reduce inter-node communication costs. The method also partitions particle pairs within each node using the force decomposition strategy to improve the load balance for each node. Simulation results for a nucleation process with 4 000 000 particles show that the hybrid method achieves better parallel performance than either spatial or force decomposition alone, especially when applied to a large scale particle system with non-uniform spatial density.
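
    The two-level split described above can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' scheme: particles are assigned to nodes by slicing the box along x, pairs inside a node are dealt round-robin to that node's workers, and cross-node pairs (which a real code handles through ghost particles and communication) are ignored. All names and values are hypothetical.

```python
from itertools import combinations

def spatial_decompose(positions, box_length, n_nodes):
    """Assign each particle index to a node by slicing the box along x."""
    slab = box_length / n_nodes
    nodes = [[] for _ in range(n_nodes)]
    for idx, (x, _y, _z) in enumerate(positions):
        # Clamp so that a particle exactly on the upper wall stays in the last slab.
        nodes[min(int(x // slab), n_nodes - 1)].append(idx)
    return nodes

def force_decompose(particles, n_workers):
    """Deal the node's particle pairs round-robin to its workers."""
    work = [[] for _ in range(n_workers)]
    for k, pair in enumerate(combinations(particles, 2)):
        work[k % n_workers].append(pair)
    return work

# Five made-up particles in a box of length 4, two nodes, two workers per node.
positions = [(0.5, 0, 0), (1.5, 0, 0), (2.5, 0, 0), (3.5, 0, 0), (0.1, 0, 0)]
nodes = spatial_decompose(positions, box_length=4.0, n_nodes=2)
pairs_node0 = force_decompose(nodes[0], n_workers=2)
print(nodes, pairs_node0)
```

    Splitting by pairs rather than by particles inside a node keeps worker loads even when the particles in that node are unevenly distributed, which is the situation the abstract highlights.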

  2. An extended affinity propagation clustering method based on different data density types.

    Science.gov (United States)

    Zhao, XiuLi; Xu, WeiXiang

    2015-01-01

    The affinity propagation (AP) algorithm, a novel clustering method, does not require users to specify the initial cluster centers in advance; it regards all data points equally as potential exemplars (cluster centers) and forms clusters purely from the degree of similarity among the data points. In many cases, however, a data set contains areas of different density, meaning that the data are not distributed homogeneously. In such situations the AP algorithm cannot group the data points into ideal clusters. In this paper, we propose an extended AP clustering algorithm to deal with this problem. Our method has two steps: first, the data set is partitioned into several data density types according to the nearest distances of each data point; then the AP clustering method is applied separately within each data density type. Two experiments are carried out to evaluate the performance of our algorithm: one uses an artificial data set and the other a real seismic data set. The experimental results show that our algorithm obtains groups more accurately than OPTICS and the original AP clustering algorithm.
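
    The first step, partitioning by nearest distances, might look like the sketch below. The abstract does not specify the partition rule, so this sketch assumes just two density types separated at the largest gap in the sorted nearest-neighbour distances; AP itself is not implemented here and would then be run on each group separately. Names and data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def nearest_distances(points):
    """Distance from each point to its nearest other point."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def split_by_density(points):
    """Split into 'dense' and 'sparse' groups at the largest gap in NN distances."""
    nn = nearest_distances(points)
    order = sorted(nn)
    gaps = [b - a for a, b in zip(order, order[1:])]
    cut = order[gaps.index(max(gaps))]   # largest distance still counted as dense
    dense = [p for p, d in zip(points, nn) if d <= cut]
    sparse = [p for p, d in zip(points, nn) if d > cut]
    return dense, sparse

# Made-up data: a tight square of points plus three widely spaced points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 15), (15, 10)]
dense, sparse = split_by_density(points)
print(dense, sparse)
```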

  3. An efficient and near linear scaling pair natural orbital based local coupled cluster method

    Science.gov (United States)

    Riplinger, Christoph; Neese, Frank

    2013-01-01

    In previous publications, it was shown that an efficient local coupled cluster method with single and double excitations can be based on the concept of pair natural orbitals (PNOs) [F. Neese, A. Hansen, and D. G. Liakos, J. Chem. Phys. 131, 064103 (2009), 10.1063/1.3173827]. The resulting local pair natural orbital coupled-cluster singles and doubles (LPNO-CCSD) method has since been proven to be highly reliable and efficient. For large molecules, the number of amplitudes to be determined is reduced by a factor of 10^5-10^6 relative to a canonical CCSD calculation on the same system with the same basis set. In the original method, the PNOs were expanded in the set of canonical virtual orbitals and single excitations were not truncated. This led to a number of fifth-order scaling steps that eventually rendered the method computationally expensive for large molecules (e.g., >100 atoms). In the present work, these limitations are overcome by a complete redesign of the LPNO-CCSD method. The new method is based on the combination of the concepts of PNOs and projected atomic orbitals (PAOs). Thus, each PNO is expanded in a set of PAOs that in turn belong to a given electron-pair-specific domain. In this way, it is possible to fully exploit locality while maintaining the extremely high compactness of the original LPNO-CCSD wavefunction. No terms are dropped from the CCSD equations and domains are chosen conservatively. The correlation energy loss due to the domains remains below [...] 8800 basis functions and >450 atoms. In all larger test calculations done so far, the LPNO-CCSD step took less time than the preceding Hartree-Fock calculation, provided no approximations have been introduced in the latter. Thus, based on the present development, reliable CCSD calculations on large molecules with unprecedented efficiency and accuracy are realized.

  4. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm

    Science.gov (United States)

    de Brito, Daniel M.; Maracaja-Coutinho, Vinicius; de Farias, Savio T.; Batista, Leonardo V.; do Rêgo, Thaís G.

    2016-01-01

    Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic island prediction have been proposed. However, none of them is capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise because the performance of different algorithms changes with the variety of nucleotide distributions in different species. In this paper, we present a novel method to predict GIs that is built upon the mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
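
    Mean shift itself can be illustrated in one dimension with a flat kernel, as sketched below. This is a generic textbook version, not MSGIP: the bandwidth is passed in explicitly rather than computed by the paper's heuristic, the input values are made up, and the function name is hypothetical. It shows why the number of clusters does not have to be given in advance.

```python
def mean_shift_1d(points, bandwidth, tol=1e-6, max_iter=100):
    """Shift each point to the mean of its neighbours (flat kernel) until it settles."""
    modes = []
    for p in points:
        x = float(p)
        for _ in range(max_iter):
            window = [q for q in points if abs(q - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    # Points whose modes lie within one bandwidth of each other share a label.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) <= bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels

# Made-up feature values forming two groups.
values = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]
centers, labels = mean_shift_1d(values, bandwidth=1.0)
print(centers, labels)
```

    The number of clusters emerges from how many distinct modes the points converge to; only the bandwidth has to be chosen, which is the parameter the abstract says is set heuristically.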

  5. A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.

    Directory of Open Access Journals (Sweden)

    Daniel M de Brito

    Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic island prediction have been proposed. However, none of them is capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise because the performance of different algorithms changes with the variety of nucleotide distributions in different species. In this paper, we present a novel method to predict GIs that is built upon the mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me.

  6. A hybrid method based on fuzzy clustering and local region-based level set for segmentation of inhomogeneous medical images.

    Science.gov (United States)

    Rastgarpour, Maryam; Shanbehzadeh, Jamshid; Soltanian-Zadeh, Hamid

    2014-08-01

    Medical images are affected more by intensity inhomogeneity than by noise and outliers. This has a great impact on the efficiency of region-based image segmentation methods, because they rely on homogeneity of intensities in the regions of interest. Meanwhile, initialization and configuration of the controlling parameters affect the performance of level set segmentation. To address these problems, this paper proposes a new hybrid method that integrates a local region-based level set method with a variation of fuzzy clustering. Specifically, it takes an information fusion approach based on a coarse-to-fine framework that seamlessly fuses local spatial information and gray level information with the information of the local region-based level set method. The controlling parameters of the level set are computed directly from the fuzzy clustering result. This approach has valuable benefits such as automation, no need for prior knowledge about the region of interest (ROI), robustness to intensity inhomogeneity, automatic adjustment of controlling parameters, insensitivity to initialization, and satisfactory accuracy. The contribution of this paper is to provide these advantages together, which has not previously been done for inhomogeneous medical images. The proposed method was tested on several medical images from different modalities for performance evaluation. Experimental results confirm its effectiveness in segmenting medical images in comparison with similar methods.

  7. A Research on Competitiveness of Guangxi City: Based on System Clustering Method and Principal Component Analysis Method

    Institute of Scientific and Technical Information of China (English)

    2010-01-01

    A total of 10 indices of regional economic development in Guangxi are selected. Using the relevant economic data, regional economic development in Guangxi is analyzed with the System Clustering Method and the Principal Component Analysis Method. The two methods give similar assessments of the level of economic development. The overall economic strength of Guangxi is weak, and Nanning has relatively high factor scores owing to its position as the political, economic and cultural center. The comprehensive scores of the other regions are all lower than 1, a large gap from Nanning. As an overall development strategy, Guangxi should accelerate the construction of the Ring Northern Bay Economic Zone, create a strong logistics system of strategic significance to national development, and use its unique location and modern transportation system to establish a logistics and business center connecting the hinterland and the ASEAN market. To address unbalanced regional economic development in Guangxi, we should speed up the development of the service industry in Nanning, construct a circular economy system for industrial cities, and accelerate the industrialization of tourism cities, in order to achieve balanced regional economic development in Guangxi, China.

  8. The tidal tails of globular cluster Palomar 5 based on the neural networks method

    Institute of Scientific and Technical Information of China (English)

    Hu Zou; Zhen-Yu WU; Jun Ma; Xu Zhou

    2009-01-01

    The sixth Data Release (DR6) of the Sloan Digital Sky Survey (SDSS) provides more photometric regions, new features and more accurate data around globular cluster Palomar 5. A new method, Back Propagation Neural Network (BPNN), is used to estimate the cluster membership probability in order to detect its tidal tails. Cluster and field stars, used for training the networks, are extracted over a 40×20 deg^2 field by color-magnitude diagrams (CMDs). The best BPNNs with two hidden layers and a Levenberg-Marquardt (LM) training algorithm are determined by the chosen cluster and field samples. The membership probabilities of stars in the whole field are obtained with the BPNNs, and contour maps of the probability distribution show that a tail extends 5.42° to the north of the cluster and another tail extends 3.77° to the south. The tails are similar to those detected by Odenkirchen et al., but no more debris from the cluster is found to the northeast in the sky. The radial density profiles are investigated both along the tails and near the cluster center. Quite a few substructures are discovered in the tails. The number density profile of the cluster is fitted with the King model and the tidal radius is determined as 14.28'. However, the King model cannot fit the observed profile at the outer regions (R > 8') because of the tidal tails generated by the tidal force. Luminosity functions of the cluster and the tidal tails are calculated, which confirm that the tails originate from Palomar 5.

  9. The Tidal Tails of Globular Cluster Palomar 5 Based on Neural Networks Method

    CERN Document Server

    Zou, H; Ma, J; Zhou, X

    2009-01-01

    The Sixth Data Release (DR6) of the Sloan Digital Sky Survey (SDSS) provides more photometric regions, new features and more accurate data around globular cluster Palomar 5. A new method, Back Propagation Neural Network (BPNN), is used to estimate the cluster membership probability in order to detect its tidal tails. Cluster and field stars, used for training the networks, are extracted over a 40×20 deg^2 field by color-magnitude diagrams (CMDs). The best BPNNs with two hidden layers and a Levenberg-Marquardt (LM) training algorithm are determined by the chosen cluster and field samples. The membership probabilities of stars in the whole field are obtained with the BPNNs, and contour maps of the probability distribution show that a tail extends 5.42° to the north of the cluster and a tail extends 3.77° to the south. The tails are similar to those detected by Odenkirchen et al., but no more debris from the cluster is found to the northeast in the sky. The radial density profiles are investigated both alon...

  10. A Continuous Clustering Method for Vector Fields

    NARCIS (Netherlands)

    Garcke, H.; Preußer, T.; Rumpf, M.; Telea, A.; Weikard, U.; Wijk, J. van

    2000-01-01

    A new method for the simplification of flow fields is presented. It is based on continuous clustering. A well-known physical clustering model, the Cahn-Hilliard model, which describes phase separation, is modified to reflect the properties of the data to be visualized. Clusters are defined implicitly

  11. A quaternion-based spectral clustering method for color image segmentation

    Science.gov (United States)

    Li, Xiang; Jin, Lianghai; Liu, Hong; He, Zeng

    2011-11-01

    Spectral clustering has been widely used in image segmentation. A key issue in spectral clustering is how to build the affinity matrix. When it is applied to color image segmentation, most existing methods either use the Euclidean metric to define the affinity matrix, or first convert color images into gray-level images and then use the gray-level images to construct the affinity matrix (component-wise method). However, it is known that Euclidean distances cannot represent color differences well and that the component-wise method does not consider the correlation between color channels. In this paper, we propose a new method to produce the affinity matrix, in which the color images are first represented in quaternion form and the similarities between color pixels are then measured by a quaternion rotation (QR) mechanism. The experimental results show the superiority of the new method.

  12. Microcalcification detection in full-field digital mammograms with PFCM clustering and weighted SVM-based method

    Science.gov (United States)

    Liu, Xiaoming; Mei, Ming; Liu, Jun; Hu, Wei

    2015-12-01

    Clustered microcalcifications (MCs) in mammograms are an important early sign of breast cancer in women. Their accurate detection is important in computer-aided detection (CADe). In this paper, we integrated the possibilistic fuzzy c-means (PFCM) clustering algorithm and a weighted support vector machine (WSVM) for the detection of MC clusters in full-field digital mammograms (FFDM). For each image, suspicious MC regions are extracted with region growing and active contour segmentation. Then geometry and texture features are extracted for each suspicious MC, a mutual information-based supervised criterion is used to select important features, and PFCM is applied to cluster the samples into two clusters. Weights of the samples are calculated based on possibilities and typicality values from the PFCM, and the ground truth labels. A weighted nonlinear SVM is trained. During the test process, when an unknown image is presented, suspicious regions are located with the segmentation step, selected features are extracted, and the suspicious MC regions are classified as containing MC or not by the trained weighted nonlinear SVM. Finally, the MC regions are analyzed with spatial information to locate MC clusters. The proposed method is evaluated using a database of 410 clinical mammograms and compared with a standard unweighted support vector machine (SVM) classifier. The detection performance is evaluated using receiver operating characteristic (ROC) curves and free-response receiver operating characteristic (FROC) curves. The proposed method obtained an area under the ROC curve of 0.8676, while the standard SVM obtained an area of 0.8268 for MC detection. For MC cluster detection, the proposed method obtained a high sensitivity of 92% with a false-positive rate of 2.3 clusters/image, which is better than the standard SVM with 4.7 false-positive clusters/image at the same sensitivity.

  13. Assessing the Eutrophication of Shengzhong Reservoir Based on Grey Clustering Method

    Institute of Scientific and Technical Information of China (English)

    Pan An; Hu Lihui; Li Tesong; Li Chengzhu

    2009-01-01

    The reservoir water environment is a grey system. The grey clustering method is applied to assessing the reservoir water environment in order to establish a relatively complete model suitable for reservoir eutrophication evaluation and to appropriately evaluate the quality of reservoir water, providing evidence for reservoir management. According to China's lake and reservoir eutrophication criteria and the characteristics of China's eutrophication, as well as certain evaluation indices, the degree of eutrophication is classified into six categories. Grey classified whitening weight functions are used to represent the boundaries of classification and to determine the clustering weight and clustering coefficient of each index in the grey classifications, and the classification of each clustering object. The comprehensive evaluation of reservoir eutrophication is established on this foundation, with Sichuan Shengzhong Reservoir as the survey object and an analysis of the data obtained at several typical monitoring points there in 2006. It is found that eutrophication at Tiebian Power Generation Station, Guoyuanchang and Dashiqiao Bridge is the heaviest, at Tielusi and Qinggangya the second heaviest, and at Lijiaba the least. The eutrophication of this reservoir is closely related to irrational exploitation in its surrounding areas, especially to the aggravation of non-point source pollution and the increase of net-culture fishing. Therefore, it is feasible to use grey clustering in environmental quality evaluation, and the key lies in the correct division of the grey whitening function.

  14. Privacy Preserving Multiview Point Based BAT Clustering Algorithm and Graph Kernel Method for Data Disambiguation on Horizontally Partitioned Data

    Directory of Open Access Journals (Sweden)

    J. Anitha

    2015-06-01

    Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools have also raised concerns about the privacy of individuals. Thus, the burden of data privacy protection falls on the shoulders of the data holder, and when the data disambiguation problem occurs in the data matrix, anonymized data become less secure. All of the existing privacy-preserving clustering methods perform clustering based on a single point of view, which is the origin, while the multiview point approach utilizes many different viewpoints, which are objects assumed not to be in the same cluster as the two objects being measured. To solve the above-mentioned problems, this study presents a multiview point based clustering method for anonymized data. Before that, the data disambiguation problem is solved by using the Ramon-Gartner Subtree Graph Kernel (RGSGK), where the weight values are assigned and the kernel value is determined for disambiguated data. Privacy is obtained by anonymization, in which the data are encrypted with a secure key obtained by Ring-Based Fully Homomorphic Encryption (RBFHE). In order to group the anonymized data, a BAT clustering method based on multiview point based similarity measurement is proposed in this study; the proposed method is called MVBAT. In this paper, a distance matrix is calculated first, and from it the similarity matrix and dissimilarity matrix are formed. The experimental results of the proposed MVBAT clustering algorithm are compared with conventional methods in terms of F-measure, running time, privacy loss and utility loss. RBFHE encryption results are also compared with existing methods in terms of communication cost for UCI machine learning datasets such as the adult dataset and the house dataset.

  15. A bottom-up method for module-based product platform development through mapping, clustering and matching analysis

    Institute of Scientific and Technical Information of China (English)

    ZHANG Meng; LI Guo-xi; CAO Jian-ping; GONG Jing-zhong; WU Bao-zhong

    2016-01-01

    Designing a product platform can be an effective and efficient solution for manufacturing firms. Product platforms enable firms to provide increased product variety for the marketplace with as little variety between products as possible. Developed consumer products and modules within a firm can be further investigated to find out the possibility of product platform creation. A bottom-up method is proposed for module-based product platform development through mapping, clustering and matching analysis. The framework and the parametric model of the method are presented, which consist of three steps: (1) mapping parameters from existing product families to functional modules, (2) clustering the modules within existing module families based on their parameters so as to generate module clusters, and selecting the satisfactory module clusters based on commonality, and (3) matching the parameters of the module clusters to the functional modules in order to capture platform elements. In addition, the parameter matching criterion and mismatching treatment are put forward to ensure the effectiveness of the platform process, while standardization and serialization of the platform elements are presented. A design case of a belt conveyor is studied to demonstrate the feasibility of the proposed method.

  16. A Method for Traffic Congestion Clustering Judgment Based on Grey Relational Analysis

    Directory of Open Access Journals (Sweden)

    Yingya Zhang

    2016-05-01

    Traffic congestion clustering judgment is a fundamental problem in the study of traffic jam warning. However, it is not satisfactory to judge traffic congestion degrees using only vehicle speed. In this paper, we collect traffic flow information with three properties (traffic flow velocity, traffic flow density and traffic volume) of urban trunk roads, which is used to judge the traffic congestion degree. We first define a grey relational clustering model by leveraging grey relational analysis and rough set theory to mine relationships of multidimensional-attribute information. Then, we propose a grey relational membership degree rank clustering algorithm (GMRC) to discriminate clustering priority and further analyze the urban traffic congestion degree. Our experimental results show that the average accuracy of the GMRC algorithm is 24.9% greater than that of the K-means algorithm and 30.8% greater than that of the Fuzzy C-Means (FCM) algorithm. Furthermore, we find that our method can be more conducive to dynamic traffic warnings.
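
    Illustrative sketch (not taken from the record above): the abstract builds on grey relational analysis, so the snippet below shows only the textbook grey relational grade, not the GMRC algorithm or its rough-set step. The function name, the distinguishing coefficient rho = 0.5, and the assumption that all attributes are already normalized to a common scale are choices made for this example.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Return the grey relational grade of each candidate sequence
    with respect to the reference sequence (textbook formulation).

    reference  -- list of normalized attribute values (hypothetical, e.g.
                  velocity, density, volume of an ideal traffic state)
    candidates -- list of sequences with the same length as reference
    rho        -- distinguishing coefficient, assumed 0.5 here
    """
    # Absolute differences between the reference and every candidate.
    deltas = [[abs(r - c) for r, c in zip(reference, cand)]
              for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate equals the reference exactly.
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        # Grey relational coefficient per attribute, then its mean.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades
```

    A road segment whose grade against a reference state is higher is considered more closely related to that state; how the grades are then ranked and clustered is specific to the paper and is not reproduced here.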

  17. Trajectory Clustering Method Based on Multi-clustering Results Merging

    Institute of Scientific and Technical Information of China (English)

    李静; 张磊; 韩陈寿

    2011-01-01

    In view of the unreliability of trajectory clustering results, a trajectory clustering method based on multi-clustering results merging (MRMTC) is proposed in this paper. For the multiple cluster representative trajectories generated by multiple clusterers, a trajectory merging algorithm is proposed to merge them. The merging algorithm uses the average scan line distance function as a consensus function to compare the similarities of representative trajectories, and then merges the similar representative trajectories. Experimental results show that the proposed MRMTC method can produce more stable and effective clustering results than a single clustering.

  18. A Semantic-based Clustering Method to Build Domain Ontology from Multiple Heterogeneous Knowledge Sources

    Institute of Scientific and Technical Information of China (English)

    LING Ling; HU Yu-jin; WANG Xue-lin; LI Cheng-gang

    2006-01-01

    In order to improve the efficiency of ontology construction from heterogeneous knowledge sources, a semantic-based approach is presented. The ontology is constructed incrementally by applying a clustering technique. Firstly, terms are extracted from the knowledge sources and gathered into a term set after pretreatment. Then the concept set is built via semantic-based clustering according to the semantics of terms provided by WordNet. Next, a concept tree is constructed in terms of mapping rules between semantic relationships and concept relationships. The semi-automatic approach can avoid inconsistency due to knowledge engineers having different understandings of the same concept, and the obtained ontology is easy to expand.

  19. Monte Carlo-based fluorescence molecular tomography reconstruction method accelerated by a cluster of graphic processing units.

    Science.gov (United States)

    Quan, Guotao; Gong, Hui; Deng, Yong; Fu, Jianwei; Luo, Qingming

    2011-02-01

    High-speed fluorescence molecular tomography (FMT) reconstruction for 3-D heterogeneous media is still one of the most challenging problems in diffusive optical fluorescence imaging. In this paper, we propose a fast FMT reconstruction method that is based on Monte Carlo (MC) simulation and accelerated by a cluster of graphics processing units (GPUs). Based on the Message Passing Interface standard, we modified the MC code for fast FMT reconstruction, and different Green's functions representing the flux distribution in media are calculated simultaneously by different GPUs in the cluster. A load-balancing method was also developed to increase the computational efficiency. By applying the Fréchet derivative, a Jacobian matrix is formed to reconstruct the distribution of the fluorochromes using the calculated Green's functions. Phantom experiments have shown that only 10 min are required to get reconstruction results with a cluster of 6 GPUs, rather than 6 h with a cluster of multiple dual Opteron CPU nodes. Because of the advantages of high accuracy and suitability for 3-D heterogeneous media with refractive-index-unmatched boundaries from the MC simulation, the GPU cluster-accelerated method provides a reliable approach to high-speed reconstruction for FMT imaging.

  20. A Survey of Grid Based Clustering Algorithms

    Directory of Open Access Journals (Sweden)

    MR ILANGO

    2010-08-01

    Cluster analysis, an automatic process to find similar objects in a database, is a fundamental operation in data mining. A cluster is a collection of data objects that are similar to one another within the same cluster and are dissimilar to the objects in other clusters. Clustering techniques have been discussed extensively in Similarity Search, Segmentation, Statistics, Machine Learning, Trend Analysis, Pattern Recognition and Classification [1]. Clustering methods can be classified into (i) partitioning methods, (ii) hierarchical methods, (iii) density-based methods, (iv) grid-based methods and (v) model-based methods. Grid-based methods quantize the object space into a finite number of cells (hyper-rectangles) and then perform the required operations on the quantized space. The main advantage of grid-based methods is their fast processing time, which depends on the number of cells in each dimension of the quantized space. In this research paper, we present some of the grid-based methods, such as CLIQUE (CLustering In QUEst) [2], STING (STatistical INformation Grid) [3], MAFIA (Merging of Adaptive Intervals Approach to Spatial Data Mining) [4], WaveCluster [5] and O-CLUSTER (Orthogonal partitioning CLUSTERing) [6], as a survey and also compare their effectiveness in clustering data objects. We also present some of the latest developments in grid-based methods, such as the Axis Shifted Grid Clustering Algorithm [7] and Adaptive Mesh Refinement [Wei-Keng Liao et al.] [8], to improve the processing time of objects.
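
    Illustrative sketch (not taken from the record above): the quantize-then-operate idea behind grid-based clustering can be shown with a deliberately simple 2-D example. This is not CLIQUE, STING or any other surveyed algorithm; the function name, the fixed square cell size and the density threshold min_points are assumptions made for this example.

```python
from collections import defaultdict, deque

def grid_cluster(points, cell_size, min_points):
    """Label 2-D points by joining adjacent dense grid cells.

    Returns one label per point; -1 marks points in sparse cells.
    """
    # Step 1: quantize the object space into square cells.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    # Step 2: keep only cells holding at least min_points points.
    dense = {c for c, members in cells.items() if len(members) >= min_points}

    # Step 3: flood-fill over 8-connected dense cells; each connected
    # group of dense cells becomes one cluster.
    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbor = (cx + dx, cy + dy)
                    if neighbor in dense and neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        cluster_id += 1
    return labels
```

    After the single pass that assigns points to cells, the work depends on the number of occupied cells rather than on the number of points, which is the speed advantage the abstract attributes to grid-based methods.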

  1. Improving cluster-based methods for investigating potential for insect pest species establishment: region-specific risk factors

    Directory of Open Access Journals (Sweden)

    Michael J. Watts

    2011-09-01

    Existing cluster-based methods for investigating insect species assemblages or profiles of a region to indicate the risk of new insect pest invasion have a major limitation in that they assign the same species risk factors to each region in a cluster. Clearly regions assigned to the same cluster have different degrees of similarity with respect to their species profile or assemblage. This study addresses this concern by applying weighting factors to the cluster elements used to calculate regional risk factors, thereby producing region-specific risk factors. Using a database of the global distribution of crop insect pest species, we found that we were able to produce highly differentiated region-specific risk factors for insect pests. We did this by weighting cluster elements by their Euclidean distance from the target region. Using this approach meant that risk weightings were derived that were more realistic, as they were specific to the pest profile or species assemblage of each region. This weighting method provides an improved tool for estimating the potential invasion risk posed by exotic species given that they have an opportunity to establish in a target region.
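
    Illustrative sketch (not taken from the record above): the abstract says cluster elements are weighted by their Euclidean distance from the target region but does not give the weighting function. The snippet below assumes an inverse-distance weight 1 / (1 + d) and presence/absence species vectors purely for illustration; the function name and data layout are hypothetical.

```python
import math

def region_specific_risk(target, members):
    """Distance-weighted species risk for one target region.

    target  -- species presence/absence vector of the target region
    members -- vectors of the other regions in the same cluster
    Returns one risk value per species, in [0, 1].
    """
    if not members:
        raise ValueError("cluster has no other members")
    # Assumed weighting: closer regions (smaller Euclidean distance
    # between species profiles) contribute more to the risk estimate.
    weights = [1.0 / (1.0 + math.dist(target, m)) for m in members]
    total = sum(weights)
    return [
        sum(w * m[s] for w, m in zip(weights, members)) / total
        for s in range(len(target))
    ]
```

    With equal weights every region in a cluster would receive the same risk vector; weighting by distance to the target makes the result differ from region to region, which is the limitation the abstract sets out to address.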

  2. Molecular-based rapid inventories of sympatric diversity: A comparison of DNA barcode clustering methods applied to geography-based vs clade-based sampling of amphibians

    Indian Academy of Sciences (India)

    Andrea Paz; Andrew J Crawford

    2012-11-01

    Molecular markers offer a universal source of data for quantifying biodiversity. DNA barcoding uses a standardized genetic marker and a curated reference database to identify known species and to reveal cryptic diversity within well-sampled clades. Rapid biological inventories, e.g. rapid assessment programs (RAPs), unlike most barcoding campaigns, are focused on particular geographic localities rather than on clades. Because of the potentially sparse phylogenetic sampling, the addition of DNA barcoding to RAPs may present a greater challenge for the identification of named species or for revealing cryptic diversity. In this article we evaluate the use of DNA barcoding for quantifying lineage diversity within a single sampling site as compared to clade-based sampling, and present examples from amphibians. We compared algorithms for identifying DNA barcode clusters (e.g. species, cryptic species or Evolutionary Significant Units) using previously published DNA barcode data obtained from geography-based sampling at a site in Central Panama, and from clade-based sampling in Madagascar. We found that clustering algorithms based on genetic distance performed similarly on sympatric as well as clade-based barcode data, while a promising coalescent-based method performed poorly on sympatric data. The various clustering algorithms were also compared in terms of speed and software implementation. Although each method has its shortcomings in certain contexts, we recommend the use of the ABGD method, which not only performs fairly well under either sampling method, but does so in a few seconds and with a user-friendly Web interface.

  3. Molecular-based rapid inventories of sympatric diversity: a comparison of DNA barcode clustering methods applied to geography-based vs clade-based sampling of amphibians.

    Science.gov (United States)

    Paz, Andrea; Crawford, Andrew J

    2012-11-01

    Molecular markers offer a universal source of data for quantifying biodiversity. DNA barcoding uses a standardized genetic marker and a curated reference database to identify known species and to reveal cryptic diversity within well-sampled clades. Rapid biological inventories, e.g. rapid assessment programs (RAPs), unlike most barcoding campaigns, are focused on particular geographic localities rather than on clades. Because of the potentially sparse phylogenetic sampling, the addition of DNA barcoding to RAPs may present a greater challenge for the identification of named species or for revealing cryptic diversity. In this article we evaluate the use of DNA barcoding for quantifying lineage diversity within a single sampling site as compared to clade-based sampling, and present examples from amphibians. We compared algorithms for identifying DNA barcode clusters (e.g. species, cryptic species or Evolutionary Significant Units) using previously published DNA barcode data obtained from geography-based sampling at a site in Central Panama, and from clade-based sampling in Madagascar. We found that clustering algorithms based on genetic distance performed similarly on sympatric as well as clade-based barcode data, while a promising coalescent-based method performed poorly on sympatric data. The various clustering algorithms were also compared in terms of speed and software implementation. Although each method has its shortcomings in certain contexts, we recommend the use of the ABGD method, which not only performs fairly well under either sampling method, but does so in a few seconds and with a user-friendly Web interface.

  4. Analyses of Crime Patterns in NIBRS Data Based on a Novel Graph Theory Clustering Method: Virginia as a Case Study

    Directory of Open Access Journals (Sweden)

    Peixin Zhao

    2014-01-01

    This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS) data, which include the determination of correlation of different crime types, the development of a likelihood index for crimes to occur in a jurisdiction, and the clustering of jurisdictions based on crime type. The method was tested by using the 2005 assault data from 121 jurisdictions in Virginia as a test case. The analyses of these data show that some different crime types are correlated and some different crime parameters are correlated with different crime types. The analyses also show that certain jurisdictions within Virginia share certain crime patterns. This information assists with constructing a pattern for a specific crime type and can be used to determine whether a jurisdiction may be more likely to see this type of crime occur in their area.

  5. Analyses of crime patterns in NIBRS data based on a novel graph theory clustering method: Virginia as a case study.

    Science.gov (United States)

    Zhao, Peixin; Darrah, Marjorie; Nolan, Jim; Zhang, Cun-Quan

    2014-01-01

    This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS) data, which include the determination of correlation of different crime types, the development of a likelihood index for crimes to occur in a jurisdiction, and the clustering of jurisdictions based on crime type. The method was tested by using the 2005 assault data from 121 jurisdictions in Virginia as a test case. The analyses of these data show that some different crime types are correlated and some different crime parameters are correlated with different crime types. The analyses also show that certain jurisdictions within Virginia share certain crime patterns. This information assists with constructing a pattern for a specific crime type and can be used to determine whether a jurisdiction may be more likely to see this type of crime occur in their area.

  6. Watchdog-LEACH: A new method based on LEACH protocol to Secure Clustered Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Rohbanian

    2013-07-01

    A wireless sensor network (WSN) comprises small sensor nodes with limited resources. Clustered networks have been proposed in many studies to reduce the power consumption in sensor networks. LEACH is one of the techniques of greatest interest, offering an efficient way to minimize the power consumption in sensor networks. However, due to their restricted resources and operation in hostile environments, WSNs are subject to numerous threats and are vulnerable to attacks. This research proposes a solution that can be applied to LEACH to increase the level of security. In Watchdog-LEACH, some nodes are designated as watchdogs and some changes are made to the LEACH protocol for intrusion detection. Watchdog-LEACH is able to protect against a wide range of attacks and provides security, energy efficiency and memory efficiency. The simulation results show that, in comparison to LEACH, the energy overhead is about 2%, so this method is practical and can be applied to WSNs.

  7. K2: A New Method for the Detection of Galaxy Clusters Based on Canada-France-Hawaii Telescope Legacy Survey Multicolor Images

    Science.gov (United States)

    Thanjavur, Karun; Willis, Jon; Crampton, David

    2009-11-01

    We have developed a new method, K2, optimized for the detection of galaxy clusters in multicolor images. Based on the Red Sequence approach, K2 detects clusters using simultaneous enhancements in both colors and position. The detection significance is robustly determined through extensive Monte Carlo simulations and through comparison with available cluster catalogs based on two different optical methods, and also on X-ray data. K2 also provides quantitative estimates of the candidate clusters' richness and photometric redshifts. Initially, K2 was applied to the two color (gri) 161 deg² images of the Canada-France-Hawaii Telescope Legacy Survey Wide (CFHTLS-W) data. Our simulations show that the false detection rate for these data, at our selected threshold, is only ~1%, and that the cluster catalogs are ~80% complete up to a redshift of z = 0.6 for Fornax-like and richer clusters and to z ~ 0.3 for poorer clusters. Based on the g-, r-, and i-band photometric catalogs of the Terapix T05 release, 35 clusters/deg² are detected, with 1-2 Fornax-like or richer clusters every 2 deg². Catalogs containing data for 6144 galaxy clusters have been prepared, of which 239 are rich clusters. These clusters, especially the latter, are being searched for gravitational lenses, one of our chief motivations for cluster detection in CFHTLS. The K2 method can be easily extended to use additional color information and thus improve overall cluster detection to higher redshifts. The complete set of K2 cluster catalogs, along with the supplementary catalogs for the member galaxies, are available on request from the authors.

  8. FAULT DIAGNOSIS BASED ON INTEGRATION OF CLUSTER ANALYSIS, ROUGH SET METHOD AND FUZZY NEURAL NETWORK

    Institute of Scientific and Technical Information of China (English)

    Feng Zhipeng; Song Xigeng; Chu Fulei

    2004-01-01

    In order to increase the efficiency and decrease the cost of machinery diagnosis, a hybrid system of computational intelligence methods is presented. Firstly, the continuous attributes in diagnosis decision system are discretized with the self-organizing map (SOM) neural network. Then, dynamic reducts are computed based on rough set method, and the key conditions for diagnosis are found according to the maximum cluster ratio. Lastly, according to the optimal reduct, the adaptive neuro-fuzzy inference system (ANFIS) is designed for fault identification. The diagnosis of a diesel verifies the feasibility of engineering applications.

  9. Using Trajectory Clusters to Define the Most Relevant Features for Transient Stability Prediction Based on Machine Learning Method

    Directory of Open Access Journals (Sweden)

    Luyu Ji

    2016-11-01

    To achieve rapid real-time transient stability prediction, a power system transient stability prediction method based on the extraction of the post-fault trajectory cluster features of generators is proposed. This approach is conducted using data-mining techniques and support vector machine (SVM) models. First, the post-fault rotor angles and generator terminal voltage magnitudes are considered as the input vectors. Second, we construct a high-confidence dataset by extracting the 27 trajectory cluster features obtained from the chosen databases. Then, by applying a filter–wrapper algorithm for feature selection, we obtain the final feature set composed of the eight most relevant features for transient stability prediction, called the global trajectory clusters feature subset (GTCFS), which are validated by receiver operating characteristic (ROC) analysis. Comprehensive simulations are conducted on a New England 39-bus system under various operating conditions, load levels and topologies, and the transient stability predicting capability of the SVM model based on the GTCFS is extensively tested. The experimental results show that the selected GTCFS features improve the prediction accuracy with high computational efficiency. The proposed method has distinct advantages for transient stability prediction when faced with incomplete Wide Area Measurement System (WAMS) information, unknown operating conditions and unknown topologies and significantly improves the robustness of the transient stability prediction system.

  10. COOPERATIVE CLUSTERING BASED ON GRID AND DENSITY

    Institute of Scientific and Technical Information of China (English)

    HU Ruifei; YIN Guofu; TAN Ying; CAI Peng

    2006-01-01

    Based on an analysis of the features of the grid-based clustering method CLIQUE (clustering in quest) and the density-based clustering method DBSCAN (density-based spatial clustering of applications with noise), a new clustering algorithm named cooperative clustering based on grid and density (CLGRID) is presented. The new algorithm adopts an equivalent rule of regional inquiry and density unit identification. The central region of a class is calculated by the grid-based method and the margin region by a density-based method. By clustering in two phases and using only a small number of seed objects in representative units to expand the cluster, the frequency of region queries can be decreased, and consequently the time cost is reduced. The new algorithm retains the positive features of both grid-based and density-based methods and avoids the difficulty of parameter searching. It can discover clusters of arbitrary shape with high efficiency and is not sensitive to noise. The application of CLGRID to test data sets demonstrates its validity and its higher efficiency in contrast with traditional DBSCAN with an R* tree.

  11. Novel pseudo-divergence of Gaussian mixture models based speaker clustering method

    Institute of Scientific and Technical Information of China (English)

    Wang Bo; Xu Yiqiong; Li Bicheng

    2006-01-01

    A serial structure is applied to speaker recognition to reduce the algorithm delay and computational complexity. The speech is first classified into a speaker class, and the most likely speaker is then searched for within that class. The difference between Gaussian Mixture Models (GMMs) is widely applied in speaker classification. The paper proposes a novel measure, pseudo-divergence, the ratio of inter-model dispersion to intra-model dispersion, to represent the difference between GMMs and to perform speaker clustering. The weights, means and variances of the GMM components are involved in the dispersion. Experiments indicate that the measure represents the difference between GMMs well and improves the performance of speaker clustering.

  12. Cluster Based Text Classification Model

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases...

  13. Consumers' Kansei Needs Clustering Method for Product Emotional Design Based on Numerical Design Structure Matrix and Genetic Algorithms

    Science.gov (United States)

    Chen, Deng-kai; Gu, Rong; Gu, Yu-feng; Yu, Sui-huai

    2016-01-01

    Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design. PMID:27630709

  14. Consumers' Kansei Needs Clustering Method for Product Emotional Design Based on Numerical Design Structure Matrix and Genetic Algorithms.

    Science.gov (United States)

    Yang, Yan-Pu; Chen, Deng-Kai; Gu, Rong; Gu, Yu-Feng; Yu, Sui-Huai

    2016-01-01

    Consumers' Kansei needs reflect their perception about a product and always consist of a large number of adjectives. Reducing the dimension complexity of these needs to extract primary words not only enables the target product to be explicitly positioned, but also provides a convenient design basis for designers engaging in design work. Accordingly, this study employs a numerical design structure matrix (NDSM) by parameterizing a conventional DSM and integrating genetic algorithms to find optimum Kansei clusters. A four-point scale method is applied to assign link weights of every two Kansei adjectives as values of cells when constructing an NDSM. Genetic algorithms are used to cluster the Kansei NDSM and find optimum clusters. Furthermore, the process of the proposed method is presented. The details of the proposed approach are illustrated using an example of electronic scooter for Kansei needs clustering. The case study reveals that the proposed method is promising for clustering Kansei needs adjectives in product emotional design.

  15. Analysis of dynamic cerebral contrast-enhanced perfusion MRI time-series based on unsupervised clustering methods

    Science.gov (United States)

    Lange, Oliver; Meyer-Baese, Anke; Wismuller, Axel; Hurdal, Monica

    2005-03-01

    We employ unsupervised clustering techniques for the analysis of dynamic contrast-enhanced perfusion MRI time-series in patients with and without stroke. "Neural gas" network, fuzzy clustering based on deterministic annealing, self-organizing maps, and fuzzy c-means clustering enable self-organized data-driven segmentation w.r.t. fine-grained differences of signal amplitude and dynamics, thus identifying asymmetries and local abnormalities of brain perfusion. We conclude that clustering is a useful extension to conventional perfusion parameter maps.

  16. Cluster Tree Based Hybrid Document Similarity Measure

    Directory of Open Access Journals (Sweden)

    M. Varshana Devi

    2015-10-01

    A cluster tree based hybrid similarity measure is established. In a cluster tree, the hybrid similarity measure can be calculated for random data even when the items do not co-occur and generate different views. Different views of the tree can be combined, choosing the one that is significant in cost. A method is proposed to combine the multiple views. Multiple views, represented by different distance measures, are combined into a single cluster. Compared with traditional statistical methods, the cluster tree based hybrid similarity gives better feasibility for intelligent search. It helps to improve dimensionality reduction and semantic analysis.

  17. Document Clustering using Sequential Information Bottleneck Method

    CERN Document Server

    Gayathri, P J; Punithavalli, M

    2010-01-01

    This paper illustrates the Principal Direction Divisive Partitioning (PDDP) algorithm, describes its drawbacks, and introduces a combinatorial framework for it. It then describes a simplified version of the EM algorithm called the spherical Gaussian EM (sGEM) algorithm, and the Information Bottleneck (IB) method, a technique for finding accuracy, complexity and time space. The PDDP algorithm recursively splits the data samples into two sub-clusters using the hyperplane normal to the principal direction derived from the covariance matrix, which is the central logic of the algorithm. However, the PDDP algorithm can yield poor results, especially when clusters are not well separated from one another. To improve the quality of the clustering results, cluster membership is reallocated using the IB algorithm with different settings. The IB method is accurate but takes more time. Furthermore, based on the theoretical backgr...
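
    The single split step described above (cut the data with the hyperplane normal to the principal direction) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes 2-D points, finds the principal direction by power iteration on the covariance matrix, and the names `principal_direction` and `pddp_split` are hypothetical.

```python
import math

def principal_direction(points, iters=200):
    """Return the centered points and the leading eigenvector of their covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(p[0] - mx, p[1] - my) for p in points]
    # Scatter-matrix entries; the missing 1/n factor does not change the direction.
    sxx = sum(x * x for x, _ in centered)
    sxy = sum(x * y for x, y in centered)
    syy = sum(y * y for _, y in centered)
    v = (1.0, 0.5)  # arbitrary non-zero start vector (assumption)
    for _ in range(iters):
        w = (sxx * v[0] + sxy * v[1], sxy * v[0] + syy * v[1])
        norm = math.hypot(w[0], w[1])
        v = (w[0] / norm, w[1] / norm)
    return centered, v

def pddp_split(points):
    """Split points by the sign of their projection onto the principal direction."""
    centered, v = principal_direction(points)
    left, right = [], []
    for p, (cx, cy) in zip(points, centered):
        projection = cx * v[0] + cy * v[1]
        (left if projection <= 0 else right).append(p)
    return left, right

data = [(0.0, 0.1), (0.2, -0.1), (0.1, 0.0), (5.0, 0.1), (5.2, -0.1), (5.1, 0.0)]
left, right = pddp_split(data)
```

    A full PDDP run would apply `pddp_split` recursively to the resulting sub-clusters; the choice of which sub-cluster to split next is left out of this sketch.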

  18. Lick Indices and Spectral Energy Distribution Analysis based on an M31 Star Cluster Sample: Comparisons of Methods and Models

    CERN Document Server

    Fan, Zhou; Chen, Bingqiu; Jiang, Linhua; Bian, Fuyan; Li, Zhongmu

    2016-01-01

    Application of fitting techniques to obtain physical parameters---such as ages, metallicities, and $\alpha$-element to iron ratios---of stellar populations is an important approach to understand the nature of both galaxies and globular clusters (GCs). In fact, fitting methods based on different underlying models may yield different results, and with varying precision. In this paper, we have selected 22 confirmed M31 GCs for which we do not have access to previously known spectroscopic metallicities. Most are located at approximately one degree (in projection) from the galactic center. We performed spectroscopic observations with the 6.5 m MMT telescope, equipped with its Red Channel Spectrograph. Lick/IDS absorption-line indices, radial velocities, ages, and metallicities were derived based on the $\rm EZ\_Ages$ stellar population parameter calculator. We also applied full spectral fitting with the ULySS code to constrain the parameters of our sample star clusters. In addition, we performed $\chi^2_{\rm min}$...

  19. Cycle-Based Cluster Variational Method for Direct and Inverse Inference

    Science.gov (United States)

    Furtlehner, Cyril; Decelle, Aurélien

    2016-08-01

    Large scale inference problems of practical interest can often be addressed with help of Markov random fields. This requires to solve in principle two related problems: the first one is to find offline the parameters of the MRF from empirical data (inverse problem); the second one (direct problem) is to set up the inference algorithm to make it as precise, robust and efficient as possible. In this work we address both the direct and inverse problem with mean-field methods of statistical physics, going beyond the Bethe approximation and associated belief propagation algorithm. We elaborate on the idea that loop corrections to belief propagation can be dealt with in a systematic way on pairwise Markov random fields, by using the elements of a cycle basis to define regions in a generalized belief propagation setting. For the direct problem, the region graph is specified in such a way as to avoid feed-back loops as much as possible by selecting a minimal cycle basis. Following this line we are led to propose a two-level algorithm, where a belief propagation algorithm is run alternatively at the level of each cycle and at the inter-region level. Next we observe that the inverse problem can be addressed region by region independently, with one small inverse problem per region to be solved. It turns out that each elementary inverse problem on the loop geometry can be solved efficiently. In particular in the random Ising context we propose two complementary methods based respectively on fixed point equations and on a one-parameter log likelihood function minimization. Numerical experiments confirm the effectiveness of this approach both for the direct and inverse MRF inference. Heterogeneous problems of size up to 10^5 are addressed in a reasonable computational time, notably with better convergence properties than ordinary belief propagation.

  20. A Novel Wireless Power Transfer-Based Weighed Clustering Cooperative Spectrum Sensing Method for Cognitive Sensor Networks.

    Science.gov (United States)

    Liu, Xin

    2015-10-30

    In a cognitive sensor network (CSN), the wastage of sensing time and energy is a challenge to cooperative spectrum sensing, when the number of cooperative cognitive nodes (CNs) becomes very large. In this paper, a novel wireless power transfer (WPT)-based weighed clustering cooperative spectrum sensing model is proposed, which divides all the CNs into several clusters, and then selects the most favorable CNs as the cluster heads and allows the common CNs to transfer the received radio frequency (RF) energy of the primary node (PN) to the cluster heads, in order to supply the electrical energy needed for sensing and cooperation. A joint resource optimization is formulated to maximize the spectrum access probability of the CSN, through jointly allocating sensing time and clustering number. According to the resource optimization results, a clustering algorithm is proposed. The simulation results have shown that compared to the traditional model, the cluster heads of the proposed model can achieve more transmission power and there exists optimal sensing time and clustering number to maximize the spectrum access probability.

  1. Spectral clustering based on local linear approximations

    CERN Document Server

    Arias-Castro, Ery; Lerman, Gilad

    2010-01-01

    In the context of clustering, we assume a generative model where each cluster is the result of sampling points in the neighborhood of an embedded smooth surface, possibly contaminated with outliers. We consider a prototype for a higher-order spectral clustering method based on the residual from a local linear approximation. In an asymptotic setting where the number of points becomes large, we obtain theoretical guarantees for this algorithm and show that, both in terms of separation and robustness to outliers, it outperforms the standard spectral clustering algorithm based on pairwise distances of Ng, Jordan and Weiss (NIPS, 2001). Under some conditions on the dimension of, and the incidence angle at, an intersection, the algorithm is able to recover the intersecting clusters. The optimal choice for some of the tuning parameters depends on the dimension and thickness of the clusters. We provide estimators that come close enough for our purposes. We discuss the cases of clusters of mixed dimensions and of clus...

  2. Cluster identification based on correlations.

    Science.gov (United States)

    Schulman, L S

    2012-04-01

    The problem addressed is the identification of cooperating agents based on correlations created as a result of the joint action of these and other agents. A systematic method for using correlations beyond second moments is developed. The technique is applied to a didactic example, the identification of alphabet letters based on correlations among the pixels used in an image of the letter. As in this example, agents can belong to more than one cluster. Moreover, the identification scheme does not require that the patterns be known ahead of time.

  3. A Novel Multi-Focus Image Fusion Method Based on Stochastic Coordinate Coding and Local Density Peaks Clustering

    Directory of Open Access Journals (Sweden)

    Zhiqin Zhu

    2016-11-01

    The multi-focus image fusion method is used in image processing to generate all-focus images that have a large depth of field (DOF) based on original multi-focus images. Different approaches have been used in the spatial and transform domains to fuse multi-focus images. As one of the most popular image processing methods, dictionary-learning-based sparse representation achieves great performance in multi-focus image fusion. Most of the existing dictionary-learning-based multi-focus image fusion methods directly use the whole source images for dictionary learning. However, using the whole source images incurs a high error rate and high computation cost in the dictionary learning process. This paper proposes a novel stochastic coordinate coding-based image fusion framework integrated with local density peaks clustering. The proposed multi-focus image fusion method consists of three steps. First, source images are split into small image patches, and the split image patches are classified into a few groups by local density peaks clustering. Next, the grouped image patches are used for sub-dictionary learning by stochastic coordinate coding. The trained sub-dictionaries are combined into a dictionary for sparse representation. Finally, the simultaneous orthogonal matching pursuit (SOMP) algorithm is used to carry out sparse representation. After the three steps, the obtained sparse coefficients are fused following the max L1-norm rule. The fused coefficients are inversely transformed to an image by using the learned dictionary. The results and analyses of comparison experiments demonstrate that fused images of the proposed method have higher quality than those of existing state-of-the-art methods.
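
    The max L1-norm fusion rule mentioned above can be illustrated with a short sketch. It assumes two source images whose patches have already been sparse-coded over a shared dictionary, and it breaks ties in favour of the first source; both choices, and the function names, are assumptions made for illustration rather than details taken from the paper.

```python
def l1_norm(vector):
    """Sum of absolute values of the coefficients."""
    return sum(abs(x) for x in vector)

def fuse_max_l1(coeffs_a, coeffs_b):
    """For each patch, keep the coefficient vector with the larger L1 norm."""
    return [a if l1_norm(a) >= l1_norm(b) else b
            for a, b in zip(coeffs_a, coeffs_b)]

# One sparse coefficient vector per patch, for each of two source images.
coeffs_a = [[0.0, 2.0, 0.0], [0.1, 0.0, 0.0]]
coeffs_b = [[0.5, 0.0, 0.0], [0.0, 0.0, -1.5]]
fused = fuse_max_l1(coeffs_a, coeffs_b)
```

    The intuition is that a patch in sharper focus tends to need larger coefficients, so the vector with the larger L1 norm is taken to come from the better-focused source.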

  4. Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods

    Directory of Open Access Journals (Sweden)

    Zweig Katharina A

    2009-10-01

    Background: Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data on which no full similarity matrix is defined and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets but until now no level selection algorithm has been published for this method. Results: In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data is not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of cohesive clusters. We present results of our method on two data sets, a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values according to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection. Conclusion: Using our new cluster selection method together with the method by Palla et al. provides a new interesting clustering mechanism that allows to compute overlapping clusters, which is especially valuable for biological and

  5. Niching method using clustering crowding

    Institute of Scientific and Technical Information of China (English)

    GUO Guan-qi; GUI Wei-hua; WU Min; YU Shou-yi

    2005-01-01

    This study analyzes drift phenomena of deterministic crowding and probabilistic crowding by using an equivalence class model and expectation proportion equations. It is proved that the replacement errors of deterministic crowding cause the population to converge to a single individual, thus resulting in premature stagnation or losing optional optima. Probabilistic crowding, in contrast, can maintain multiple subpopulations in equilibrium provided the population size is adequately large. An improved niching method using clustering crowding is proposed. By analyzing the topology of the fitness landscape using a hill-valley function and extending the search space for similarity analysis, clustering crowding determines the locality of the search space more accurately, thus greatly decreasing the replacement errors of crowding. The integration of deterministic and probabilistic replacement increases the capacity for both parallel local hill climbing and maintaining multiple subpopulations. The experimental results of optimizing various multimodal functions show that the performance measures of clustering crowding, such as the number of effective peaks maintained, average peak ratio and global optimum ratio, are uniformly superior to those of evolutionary algorithms using fitness sharing, simple deterministic crowding and probabilistic crowding.
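
    For readers unfamiliar with the baseline analyzed above, one generation of plain deterministic crowding can be sketched as follows. This is the simple baseline only, not the clustering crowding method proposed in the paper. The one-dimensional test function, the blend crossover, the mutation width and all function names are assumptions chosen for illustration.

```python
import math
import random

def fitness(x):
    # Five equal peaks on [0, 1]; a common multimodal test function.
    return math.sin(5 * math.pi * x) ** 6

def clip(x):
    return min(1.0, max(0.0, x))

def make_children(p1, p2, rng, sigma=0.02):
    """Blend crossover followed by Gaussian mutation (illustrative operators)."""
    a = rng.random()
    c1 = a * p1 + (1 - a) * p2
    c2 = a * p2 + (1 - a) * p1
    return clip(c1 + rng.gauss(0, sigma)), clip(c2 + rng.gauss(0, sigma))

def deterministic_crowding_step(pop, rng):
    pop = pop[:]
    rng.shuffle(pop)  # random pairing of parents
    nxt = []
    for i in range(0, len(pop) - 1, 2):
        p1, p2 = pop[i], pop[i + 1]
        c1, c2 = make_children(p1, p2, rng)
        # Match each child with the more similar parent.
        if abs(p1 - c1) + abs(p2 - c2) > abs(p1 - c2) + abs(p2 - c1):
            c1, c2 = c2, c1
        # A child replaces its matched parent only if it is fitter.
        nxt.append(c1 if fitness(c1) > fitness(p1) else p1)
        nxt.append(c2 if fitness(c2) > fitness(p2) else p2)
    if len(pop) % 2:
        nxt.append(pop[-1])  # unpaired individual survives unchanged
    return nxt

rng = random.Random(0)
history = [[rng.random() for _ in range(20)]]
for _ in range(50):
    history.append(deterministic_crowding_step(history[-1], rng))
```

    A replacement error in the sense of the abstract occurs when a child that belongs to a different peak displaces its matched parent; the similarity matching above reduces such errors but does not rule them out.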

  6. Cosine-Based Clustering Algorithm Approach

    Directory of Open Access Journals (Sweden)

    Mohammed A. H. Lubbad

    2012-02-01

    Because many applications need to manage spatial data, clustering large spatial databases is an important problem: it tries to find the densely populated regions in the feature space for use in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shape. It must be insensitive to outliers (noise) and to the order of the input data. In this paper, Cosine Cluster, based on the cosine transformation, is proposed; it satisfies all the above requirements. Using the multi-resolution property of cosine transforms, arbitrary-shape clusters can be effectively identified at different degrees of accuracy. Cosine Cluster is also proved to be highly efficient in terms of time complexity. Experimental results on very large data sets are presented, which show the efficiency and effectiveness of the proposed approach compared to other recent clustering methods.

  7. Document Clustering Based on Semi-Supervised Term Clustering

    Directory of Open Access Journals (Sweden)

    Hamid Mahmoodi

    2012-05-01

    This study proposes a multi-step feature (term) selection process and, in a semi-supervised fashion, provides initial centers for term clusters. The fuzzy c-means (FCM) clustering algorithm is then used to cluster the terms. Finally, each document is assigned to its closest associated term clusters. While most text clustering algorithms directly use documents for clustering, we propose to first group the terms using the FCM algorithm and then cluster documents based on the term clusters. We evaluate the effectiveness of our technique on several standard text collections and compare our results with some classical text clustering algorithms.

  8. Sequential Clustering based Facial Feature Extraction Method for Automatic Creation of Facial Models from Orthogonal Views

    CERN Document Server

    Ghahari, Alireza

    2009-01-01

    Multiview 3D face modeling has attracted increasing attention recently and has become one of the potential avenues in future video systems. We aim to make more reliable and robust automatic feature extraction and natural 3D feature construction from 2D features detected on a pair of frontal and profile view face images. We propose several heuristic algorithms to minimize possible errors introduced by the prevalent nonperfect orthogonal condition and noncoherent luminance. In our approach, we first extract the 2D features that are visible to both cameras in both views. Then, we estimate the coordinates of the features in the hidden profile view based on the visible features extracted in the two orthogonal views. Finally, based on the coordinates of the extracted features, we deform a 3D generic model to perform the desired 3D clone modeling. The present study demonstrates the scope of the resulting facial models for practical applications like face recognition and facial animation.

  9. A Clustered Multiclass Likelihood-Ratio Ensemble Method for Family-Based Association Analysis Accounting for Phenotypic Heterogeneity.

    Science.gov (United States)

    Wen, Yalu; Lu, Qing

    2016-09-01

    Although compelling evidence suggests that the genetic etiology of complex diseases could be heterogeneous in subphenotype groups, little attention has been paid to phenotypic heterogeneity in genetic association analysis of complex diseases. Simply ignoring phenotypic heterogeneity in association analysis could result in attenuated estimates of genetic effects and low power of association tests if subphenotypes with similar clinical manifestations have heterogeneous underlying genetic etiologies. To facilitate the family-based association analysis allowing for phenotypic heterogeneity, we propose a clustered multiclass likelihood-ratio ensemble (CMLRE) method. The proposed method provides an alternative way to model the complex relationship between disease outcomes and genetic variants. It allows for heterogeneous genetic causes of disease subphenotypes and can be applied to various pedigree structures. Through simulations, we found CMLRE outperformed the commonly adopted strategies in a variety of underlying disease scenarios. We further applied CMLRE to a family-based dataset from the International Consortium to Identify Genes and Interactions Controlling Oral Clefts (ICOC) to investigate the genetic variants and interactions predisposing to subphenotypes of oral clefts. The analysis suggested that two subphenotypes, nonsyndromic cleft lip without palate (CL) and cleft lip with palate (CLP), shared similar genetic etiologies, while cleft palate only (CP) had its own genetic mechanism. The analysis further revealed that rs10863790 (IRF6), rs7017252 (8q24), and rs7078160 (VAX1) were jointly associated with CL/CLP, while rs7969932 (TBK1), rs227731 (17q22), and rs2141765 (TBK1) jointly contributed to CP.

  10. Clustering in Water Based Magnetic Nanofluids: Investigations by Light Scattering Methods

    Science.gov (United States)

    Socoliuc, Vlad; Taculescu, Alina; Podaru, Camelia; Dobra, Andreea; Daia, Camelia; Marinica, Oana; Turcu, Rodica; Vekas, Ladislau

    2010-12-01

    Nanosized magnetite particles, with mean physical diameter of about 7 nm, obtained by chemical coprecipitation procedure were dispersed in water carrier by applying sterical stabilization of particles in order to prevent their aggregation and to ensure colloidal stability of the systems. Different chain length (C12, C14, C18) carboxylic acids (lauric (LA), myristic (MA) and oleic (OA)) were used for double layer coating of magnetite nanoparticles. Structural and magnetic properties were investigated by electron microscopy (TEM), dynamical and static light scattering (DLS, SLS) and magnetometry (VSM) to evaluate the role of chain length and of the saturated/unsaturated nature of surfactant layers. Also investigated were two water based magnetic nanocomposites obtained by encapsulating the magnetic nanoparticles in polymers with different functional properties.

  11. The SMART CLUSTER METHOD - adaptive earthquake cluster analysis and declustering

    Science.gov (United States)

    Schaefer, Andreas; Daniell, James; Wenzel, Friedemann

    2016-04-01

    Earthquake declustering is an essential part of almost any statistical analysis of the spatial and temporal properties of seismic activity, with usual applications comprising probabilistic seismic hazard assessments (PSHAs) and earthquake prediction methods. The nature of earthquake clusters and the subsequent declustering of earthquake catalogues play a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation. Various methods have been developed by other researchers to address this issue. These range in complexity from rather simple statistical window methods to complex epidemic models. This study introduces the smart cluster method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal identification. Here, an adaptive search algorithm for data point clusters is adopted. It uses the earthquake density in the spatio-temporal neighbourhood of each event to adjust the search properties. The identified clusters are subsequently analysed to determine directional anisotropy, focussing on a strong correlation along the rupture plane, and the method adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010/2011 Darfield-Christchurch events, an adaptive classification procedure is applied to disassemble subsequent ruptures which may have been grouped into an individual cluster, using near-field searches, support vector machines and temporal splitting. The steering parameters of the search behaviour are linked to local earthquake properties like magnitude of completeness, earthquake density and Gutenberg-Richter parameters. The method is capable of identifying and classifying earthquake clusters in space and time. It is tested and validated using earthquake data from California and New Zealand. As a result of the cluster identification process, each event in

  12. Spectral clustering based on matrix perturbation theory

    Institute of Scientific and Technical Information of China (English)

    TIAN Zheng; LI XiaoBin; JU YanWei

    2007-01-01

    This paper exposes some intrinsic characteristics of the spectral clustering method by using the tools from the matrix perturbation theory. We construct a weight matrix of a graph and study its eigenvalues and eigenvectors. It shows that the number of clusters is equal to the number of eigenvalues that are larger than 1, and the number of points in each of the clusters can be approximated by the associated eigenvalue. It also shows that the eigenvector of the weight matrix can be used directly to perform clustering; that is, the directional angle between the two-row vectors of the matrix derived from the eigenvectors is a suitable distance measure for clustering. As a result, an unsupervised spectral clustering algorithm based on weight matrix (USCAWM) is developed. The experimental results on a number of artificial and real-world data sets show the correctness of the theoretical analysis.

  13. Clustering based gene expression feature selection method: A computational approach to enrich the classifier efficiency of differentially expressed genes

    KAUST Repository

    Abusamra, Heba

    2016-07-20

    The inherently high-dimensional, low-sample-size nature of gene expression data makes the classification task more challenging. Feature (gene) selection therefore becomes an apparent need. Selecting meaningful and relevant genes for a classifier not only decreases computational time and cost but also improves classification performance. Many feature selection approaches exist, but most of them suffer from problems such as lack of robustness and validation issues. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods: We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated by four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to that cluster. A gene cluster is considered important if it satisfies conditions that depend on thresholds and a percentage; otherwise it is eliminated. Results: Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a); after applying our feature selection methodology we ended up with 1117 specific genes discriminating two classes of leukemia (Fig. 15b). Applying the same method again with more stringent conditions (a higher positive and a lower negative threshold) reduced the number to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions: The feature selection method gave good results with minimum classification error. Our heat-map result shows a distinct pattern of refined genes discriminating between the two classes of leukemia.

  14. Fuzzy Clustering Methods and their Application to Fuzzy Modeling

    DEFF Research Database (Denmark)

    Kroszynski, Uri; Zhou, Jianjun

    1999-01-01

    Fuzzy modeling techniques based upon the analysis of measured input/output data sets result in a set of rules that allow to predict system outputs from given inputs. Fuzzy clustering methods for system modeling and identification result in relatively small rule-bases, allowing fast, yet accurate prediction of outputs. This article presents an overview of some of the most popular clustering methods, namely Fuzzy Cluster-Means (FCM) and its generalizations to Fuzzy C-Lines and Elliptotypes. The algorithms for computing cluster centers and principal directions from a training data-set are described. A method to obtain an optimized number of clusters is outlined. Based upon the cluster's characteristics, a behavioural model is formulated in terms of a rule-base and an inference engine. The article reviews several variants for the model formulation. Some limitations of the methods are listed...
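
    The standard fuzzy c-means iteration that the FCM method named above refers to can be sketched in a few lines. This is a minimal one-dimensional version under stated assumptions: fuzzifier m = 2, evenly spaced initial centers, and a fixed number of iterations instead of a convergence test. The function name `fcm_1d` is hypothetical, and the Fuzzy C-Lines and Elliptotypes generalizations are not covered.

```python
def fcm_1d(xs, k, m=2.0, iters=100):
    """Alternate the membership and center updates of fuzzy c-means on 1-D data."""
    lo, hi = min(xs), max(xs)
    # Evenly spaced starting centers (an assumption; random starts are also common).
    centers = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        memberships = []
        for x in xs:
            # Guard against a zero distance when a point sits on a center.
            d = [max(abs(x - c), 1e-12) for c in centers]
            row = [1.0 / sum((d[i] / d[j]) ** exponent for j in range(k))
                   for i in range(k)]
            memberships.append(row)
        # Each center is the mean of the data weighted by membership**m.
        centers = [
            sum(memberships[n][i] ** m * xs[n] for n in range(len(xs)))
            / sum(memberships[n][i] ** m for n in range(len(xs)))
            for i in range(k)
        ]
    return centers, memberships

data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, memberships = fcm_1d(data, k=2)
```

    Each row of `memberships` sums to one, which is what distinguishes the fuzzy partition from a hard assignment: a point between two centers receives a share of each.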

  15. Voting-based consensus clustering for combining multiple clusterings of chemical structures

    Directory of Open Access Journals (Sweden)

    Saeed Faisal

    2012-12-01

    Background: Although many consensus clustering methods have been successfully used for combining multiple classifiers in many areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few consensus clustering methods have been applied for combining multiple clusterings of chemical structures. It is known that any individual clustering method will not always give the best results for all types of applications. So, in this paper, three voting and graph-based consensus clusterings were used for combining multiple clusterings of chemical structures to enhance the ability of separating biologically active molecules from inactive ones in each cluster. Results: The cumulative voting-based aggregation algorithm (CVAA), cluster-based similarity partitioning algorithm (CSPA) and hyper-graph partitioning algorithm (HGPA) were examined. The F-measure and Quality Partition Index method (QPI) were used to evaluate the clusterings and the results were compared to the Ward’s clustering method. The MDL Drug Data Report (MDDR) dataset was used for experiments and was represented by two 2D fingerprints, ALOGP and ECFP_4. The voting-based consensus clustering method outperformed the Ward’s method using F-measure and QPI method for both ALOGP and ECFP_4 fingerprints, while the graph-based consensus clustering methods outperformed the Ward’s method only for ALOGP using QPI. The Jaccard and Euclidean distance measures were the methods of choice to generate the ensembles, which give the highest values for both criteria. Conclusions: The results of the experiments show that consensus clustering methods can improve the effectiveness of chemical structures clusterings. The cumulative voting-based aggregation algorithm (CVAA) was the method of choice among consensus clustering methods.
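
    The general idea of combining several clusterings can be illustrated with a co-association matrix, the pairwise similarity that similarity-partitioning approaches such as CSPA start from. The sketch below is not the CVAA, CSPA or HGPA implementation evaluated in the paper: as a simple stand-in for a graph partitioner, it links items that co-cluster in more than half of the runs and reads off connected components. The threshold and the function names are assumptions.

```python
def co_association(labelings):
    """Fraction of clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    runs = len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / runs
             for j in range(n)] for i in range(n)]

def consensus_clusters(labelings, threshold=0.5):
    """Connected components of the graph linking pairs above the threshold."""
    sim = co_association(labelings)
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three clusterings of six items; label names need not agree between runs.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
consensus = consensus_clusters(runs)
```

    Working on co-membership rather than on the raw labels sidesteps the label correspondence problem, since cluster 0 in one run need not mean the same thing as cluster 0 in another.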

  16. The Development of Cluster and Histogram Methods

    Science.gov (United States)

    Swendsen, Robert H.

    2003-11-01

    This talk will review the history of both cluster and histogram methods for Monte Carlo simulations. Cluster methods are based on the famous exact mapping by Fortuin and Kasteleyn from general Potts models onto a percolation representation. I will discuss the Swendsen-Wang algorithm, as well as its improvement and extension to more general spin models by Wolff. The Replica Monte Carlo method further extended cluster simulations to deal with frustrated systems. The history of histograms is quite extensive, and can only be summarized briefly in this talk. It goes back at least to work by Salsburg et al. in 1959. Since then, it has been forgotten and rediscovered several times. The modern use of the method has exploited its ability to efficiently determine the location and height of peaks in various quantities, which is of prime importance in the analysis of critical phenomena. The extensions of this approach to the multiple histogram method and multicanonical ensembles have allowed information to be obtained over a broad range of parameters. Histogram simulations and analyses have become standard techniques in Monte Carlo simulations.

  17. Single pass kernel k-means clustering method

    Indian Academy of Sciences (India)

    T Hitendra Sarma; P Viswanath; B Eswara Reddy

    2013-06-01

    In unsupervised classification, the kernel $k$-means clustering method has been shown to perform better than the conventional $k$-means clustering method in identifying non-isotropic clusters in a data set. The space and time requirements of this method are $O(n^2)$, where $n$ is the data set size. Because of this quadratic time complexity, the kernel $k$-means method is not applicable to large data sets. The paper proposes a simpler and faster version of the kernel $k$-means clustering method, called the single pass kernel $k$-means clustering method. The proposed method works as follows. First, a random sample $\mathcal{S}$ is selected from the data set $\mathcal{D}$. A partition $\Pi_{\mathcal{S}}$ is obtained by applying the conventional kernel $k$-means method to the random sample $\mathcal{S}$. The novelty of the paper is that, for each cluster in $\Pi_{\mathcal{S}}$, the exact cluster center in the input space is obtained using the gradient descent approach. Finally, each unsampled pattern is assigned to its closest exact cluster center to get a partition of the entire data set. The proposed method needs to scan the data set only once and is much faster than the conventional kernel $k$-means method. The time complexity of this method is $O(s^2+t+nk)$, where $s$ is the size of the random sample $\mathcal{S}$, $k$ is the number of clusters required, and $t$ is the time taken by the gradient descent method (to find exact cluster centers). The space complexity of the method is $O(s^2)$. The proposed method can be easily implemented and is suitable for large data sets, like those in data mining applications. Experimental results show that, with a small loss of quality, the proposed method takes significantly less time than the conventional kernel $k$-means clustering method. The proposed method is also compared with other recent similar methods.
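    To give a feel for the gap between the quadratic cost and the single pass cost, here is a worked example with purely illustrative sizes (a data set of $n = 10^{6}$ patterns, a sample of $s = 10^{3}$ and $k = 10$ clusters). These numbers are assumptions, not values from the paper, and the gradient descent term $t$ is left out because the abstract does not quantify it.

```latex
\[
n = 10^{6},\quad s = 10^{3},\quad k = 10:
\qquad
n^{2} = 10^{12},
\qquad
s^{2} + nk = 10^{6} + 10^{7} = 1.1 \times 10^{7}.
\]
```

    Under these assumed sizes the dominant terms differ by roughly five orders of magnitude, which matches the abstract's claim that the method suits large data sets.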

  18. Vinayaka : A Semi-Supervised Projected Clustering Method Using Differential Evolution

    OpenAIRE

    Satish Gajawada; Durga Toshniwal

    2012-01-01

    Differential Evolution (DE) is an algorithm for evolutionary optimization. Clustering problems have been solved by using DE based clustering methods but these methods may fail to find clusters hidden in subspaces of high dimensional datasets. Subspace and projected clustering methods have been proposed in literature to find subspace clusters that are present in subspaces of dataset. In this paper we propose VINAYAKA, a semi-supervised projected clustering method based on DE. In this method DE opt...

  19. A News Recommendation Method Based on Two-Fold Clustering

    Institute of Scientific and Technical Information of China (English)

    古万荣; 董守斌; 何锦潮; 曾之肇

    2014-01-01

    Because news is updated quickly, clustering-based preprocessing is usually needed when news is recommended to users. However, some traditional clustering methods are too complex while others depend on the initial values of an iteration, so none of them can be applied to news recommendation accurately and efficiently. Considering these issues, we propose a news recommendation method based on two-fold clustering. In this method, density clustering is conducted on randomly sampled data. Based on the number of clusters and the initial cluster centers obtained from this density clustering, a fast second clustering of all the news to be recommended is performed. News recommendation is then realized by combining factors such as timeliness and popularity. The proposed method can cluster relevant news without excessive computation cost, and it calculates the parameters of each factor by means of parameter estimation. Experimental results show that the proposed method is superior to other news recommendation methods in terms of diversity and accuracy.

  20. Track initial algorithm based on layered clustering method

    Institute of Scientific and Technical Information of China (English)

    卢春燕; 金骁; 邹焕新

    2013-01-01

    A new track initiation algorithm based on layered clustering is proposed to solve the multi-target detection problem with passive reconnaissance data, in particular when the passive reconnaissance does not scan periodically, the time intervals between captured target plots are random, and prior information such as the number of targets and their motion characteristics is insufficient. The algorithm effectively uses attribute characteristics to solve the track initiation problem. First, the observation set is coarsely clustered according to the different regimes of the pulse frequency (PF), pulse recurrence frequency (PRF) and pulse width (PW) electromagnetic parameters. Second, within each subclass, a fine classification is obtained by clustering the electromagnetic parameters with the K-Means algorithm. Third, the per-dimension velocity of all probable point pairs is computed, and false observations are eliminated using space-time constraints. Finally, the selected observations are re-sorted by capture time, and an extended search approach is used to find the final initiated tracks. Experiments on both simulated and real-world data show its effectiveness and practicality.

  1. XML document clustering method based on quantum genetic algorithm

    Institute of Scientific and Technical Information of China (English)

    蒋勇; 谭怀亮; 李光文

    2011-01-01

    This paper mainly studies XML clustering by combining kernel methods for pattern analysis with the quantum genetic algorithm, and proposes a new XML document clustering method based on a hybrid of the quantum genetic algorithm and the kernel clustering algorithm. The method first reduces the XML documents and builds the kernel matrix of the vector space kernel from frequent tag sequences. The initial clustering and cluster centers are then solved with Gaussian kernel functions, and the initial population of the quantum genetic algorithm is constructed from the initial cluster centers. A globally optimal clustering is obtained by combining the quantum genetic algorithm with the kernel clustering algorithm. The experimental results show that the proposed algorithm is superior to single methods such as the improved kernel clustering algorithm and K-means in convergence, stability and global optimality.

  2. A Latent Variable Clustering Method for Wireless Sensor Networks

    DEFF Research Database (Denmark)

    Vasilev, Vladislav; Iliev, Georgi; Poulkov, Vladimir

    2016-01-01

    In this paper we derive a clustering method based on the Hidden Conditional Random Field (HCRF) model in order to maximize the performance of a wireless sensor. Our novel approach to clustering in this paper is in the application of an index invariant graph that we defined in a previous work...... obtain by running simulations of a time dynamic sensor network. The proposed method outperforms existing clustering methods, such as the Girvan-Newman algorithm, Karger's algorithm and the Spectral Clustering method, in terms of packet acceptance probability and delay....

  3. Information Theory and Voting Based Consensus Clustering for Combining Multiple Clusterings of Chemical Structures.

    Science.gov (United States)

    Saeed, Faisal; Salim, Naomie; Abdo, Ammar

    2013-07-01

    Many consensus clustering methods have been applied in different areas such as pattern recognition, machine learning, information theory and bioinformatics. However, few methods have been used for chemical compounds clustering. In this paper, an information theory and voting based algorithm (Adaptive Cumulative Voting-based Aggregation Algorithm A-CVAA) was examined for combining multiple clusterings of chemical structures. The effectiveness of clusterings was evaluated based on the ability of the clustering method to separate active from inactive molecules in each cluster, and the results were compared with Ward's method. The chemical dataset MDL Drug Data Report (MDDR) and the Maximum Unbiased Validation (MUV) dataset were used. Experiments suggest that the adaptive cumulative voting-based consensus method can improve the effectiveness of combining multiple clusterings of chemical structures.

  4. Evidence-based case selection: An innovative knowledge management method to cluster public technical and vocational education and training colleges in South Africa

    Directory of Open Access Journals (Sweden)

    Margaretha M. Visser

    2017-01-01

    Background: Case studies are core constructs used in information management research. A persistent challenge for business, information management and social science researchers is how to select a representative sample of cases among a population with diverse characteristics when convenient or purposive sampling is not considered rigorous enough. The context of the study is post-school education, and it involves an investigation of quantitative methods of clustering the population of public technical and vocational education and training (TVET) colleges in South Africa into groups with a similar level of maturity in terms of their information systems. Objectives: The aim of the study was to propose an evidence-based quantitative method for the selection of cases for case study research and to demonstrate its use and usefulness by clustering public TVET colleges. Method: The clustering method was based on the use of a representative characteristic of the context as a proxy. In this context of management information systems (MISs), website maturity was used as a proxy and website maturity model theory was used in the development of an evaluation questionnaire. The questionnaire was used for capturing data on website characteristics, which were used to determine website maturity. The websites of the 50 public TVET colleges were evaluated by nine evaluators. Multiple statistical techniques were applied to establish inter-rater reliability and to produce clusters of colleges. Results: The analyses revealed three clusters of public TVET colleges based on their website maturity levels. The first cluster includes three colleges with no websites or websites at a low maturity level. The second cluster consists of 30 colleges with websites at an average maturity level. The third cluster contains 17 colleges with websites at a high maturity level. Conclusion: The main contribution to the knowledge domain is an innovative quantitative method employing a

  5. CLUSTERING VALIDITY BASED ON THE IMPROVED S_DBW INDEX

    Institute of Scientific and Technical Information of China (English)

    Tong Jianhua; Tan Hongzhou

    2009-01-01

    For many clustering algorithms, it is very important to determine an appropriate number of clusters, which is called cluster validity problem. In this paper, a new clustering validity assessment index is proposed based on a novel method to select the margin point between two clusters for inter-cluster similarity more accurately, and provides an improved scatter function for intra-cluster similarity. Simulation results show the effectiveness of the proposed index on the data sets under consideration regardless of the choice of a clustering algorithm.

  6. Cluster Ensemble-based Image Segmentation

    OpenAIRE

    Xiaoru Wang; Junping Du; Shuzhe Wu; Xu Li; Fu Li

    2013-01-01

    Image segmentation is the foundation of computer vision applications. In this paper, we propose a new cluster ensemble-based image segmentation algorithm, which overcomes several problems of traditional methods. We make two main contributions in this paper. First, we introduce the cluster ensemble concept to fuse the segmentation results from different types of visual features effectively, which can deliver a better final result and achieve a much more stable performance for broad categories ...

  7. Data Clustering Analysis Based on Wavelet Feature Extraction

    Institute of Scientific and Technical Information of China (English)

    QIANYuntao; TANGYuanyan

    2003-01-01

    A novel wavelet-based data clustering method is presented in this paper, which includes wavelet feature extraction and a cluster growing algorithm. The wavelet transform can provide rich and diversified information for representing the global and local inherent structures of a dataset; therefore, it is a very powerful tool for clustering feature extraction. As clustering is an unsupervised classification, the target of clustering analysis depends on the specific clustering criteria. Several criteria that should be considered for a general-purpose clustering algorithm are proposed, and the cluster growing algorithm is constructed to connect the clustering criteria with the wavelet features. Compared with other popular clustering methods, our clustering approach provides multi-resolution clustering results, needs few prior parameters, correctly deals with irregularly shaped clusters, and is insensitive to noise and outliers. As this wavelet-based clustering method is aimed at solving the two-dimensional data clustering problem, for high-dimensional datasets the self-organizing map and U-matrix method are applied to transform them into two-dimensional Euclidean space, so that high-dimensional data clustering analysis can also be carried out. Results on some simulated data and standard test data are reported to illustrate the power of our method.

  8. Content Based Image Retrieval through Clustering

    Directory of Open Access Journals (Sweden)

    Sandhya

    2012-06-01

    Content-based image retrieval (CBIR) is a technique used for extracting similar images from an image database. A CBIR system is required to access images effectively and efficiently using information contained in image databases. Here, K-Means is to be used for image retrieval. The K-means method can be applied only in those cases when the mean of a cluster is defined. The K-means method is not suitable for discovering clusters with non-convex shapes or clusters of very different size. In this paper, CBIR, clustering and K-Means are defined. With the help of these, data consisting of images can be grouped and retrieved.

  9. Clustering based segmentation of text in complex color images

    Institute of Scientific and Technical Information of China (English)

    毛文革; 王洪滨; 张田文

    2004-01-01

    We propose a novel scheme based on clustering analysis in color space to solve text segmentation in complex color images. Text segmentation includes automatic clustering of the color space and foreground image generation. Two methods are also proposed for automatic clustering: the first determines the optimal number of clusters, and the second is a fuzzy competitive clustering method based on competitive learning techniques. Essential foreground images obtained from any of the color clusters are combined into foreground images. Further performance analysis reveals the advantages of the proposed methods.

  10. Cluster-based adaptive metric classification

    NARCIS (Netherlands)

    Giotis, Ioannis; Petkov, Nicolai

    2012-01-01

    Introducing adaptive metric has been shown to improve the results of distance-based classification algorithms. Existing methods are often computationally intensive, either in the training or in the classification phase. We present a novel algorithm that we call Cluster-Based Adaptive Metric (CLAM) c

  11. A semantic Web service discovery method based on service clustering

    Institute of Scientific and Technical Information of China (English)

    薛洁; 吴兵; 杜玉越

    2012-01-01

    A semantic Web service discovery method is proposed based on service clustering. The semantic similarity of function and input/output parameters is calculated, and the services in a service library are clustered by a clustering algorithm. The input/output parameters of a cluster are marked with a unified label, an input/output concept set is obtained, and a unit model of service cluster nets is constructed. Finally, a unit matrix of service cluster nets is proposed, and the Web services that best match the user's requirements can be discovered quickly based on the matrix. The formal model of service cluster net units and its construction algorithm are presented to carry out service discovery. The validity and reliability of the proposed method are illustrated by an experiment, showing that precision and recall are clearly improved.

  12. A New Feature Selection Method for Text Clustering

    Institute of Scientific and Technical Information of China (English)

    XU Junling; XU Baowen; ZHANG Weifeng; CUI Zifeng; ZHANG Wei

    2007-01-01

    Feature selection methods have been successfully applied to text categorization but seldom applied to text clustering due to the unavailability of class label information. In this paper, a new feature selection method for text clustering based on expectation maximization and cluster validity is proposed. It applies a supervised feature selection method to the intermediate clustering results generated during iterative clustering to do feature selection for text clustering; meanwhile, the Davies-Bouldin index is used to evaluate the intermediate feature subsets indirectly. Feature subsets are then selected according to the curve of the Davies-Bouldin index. Experiments are carried out on several popular datasets and the results show the advantages of the proposed method.
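    For readers unfamiliar with the Davies-Bouldin index mentioned in this abstract, the sketch below computes its standard form (average, over clusters, of the worst ratio of summed within-cluster scatter to centroid separation; lower is better). It is an illustration under assumptions, not the paper's code: the function name `davies_bouldin` and the toy points are hypothetical, and at least two clusters with distinct centroids are assumed.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a labelled point set (lower is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Centroid of each cluster (coordinate-wise mean).
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter: mean distance of a cluster's members to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    keys = list(clusters)
    total = 0.0
    for i in keys:
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = davies_bouldin(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = davies_bouldin(points, [0, 1, 0, 1])   # stretched, overlapping clusters
```

    Plotting such a value against candidate feature subsets gives the kind of curve the abstract refers to when selecting subsets.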

  13. Quartile Clustering: A quartile based technique for Generating Meaningful Clusters

    CERN Document Server

    Goswami, Saptarsi

    2012-01-01

    Clustering is one of the main tasks in exploratory data analysis and descriptive statistics where the main objective is partitioning observations in groups. Clustering has a broad range of application in varied domains like climate, business, information retrieval, biology, psychology, to name a few. A variety of methods and algorithms have been developed for clustering tasks in the last few decades. We observe that most of these algorithms define a cluster in terms of value of the attributes, density, distance etc. However these definitions fail to attach a clear meaning/semantics to the generated clusters. We argue that clusters having understandable and distinct semantics defined in terms of quartiles/halves are more appealing to business analysts than the clusters defined by data boundaries or prototypes. On the same premise, we propose our new algorithm named as quartile clustering technique. Through a series of experiments we establish efficacy of this algorithm. We demonstrate that the quartile clusteri...

  14. Data Reduction Method for Categorical Data Clustering

    OpenAIRE

    Sánchez Garreta, José Salvador; Rendón, Eréndira; García, Rene A.; Abundez, Itzel; Gutiérrez, Citlalih; Gasca, Eduardo

    2008-01-01

    Categorical data clustering constitutes an important part of data mining; its relevance has recently drawn attention from several researchers. As a step in data mining, however, clustering encounters the problem of large amount of data to be processed. This article offers a solution for categorical clustering algorithms when working with high volumes of data by means of a method that summarizes the database. This is done using a structure called CM-tree. In order to test our metho...

  15. A graph clustering method for community detection in complex networks

    Science.gov (United States)

    Zhou, HongFang; Li, Jin; Li, JunHuai; Zhang, FaCun; Cui, YingAn

    2017-03-01

    Information mining from complex networks by identifying communities is an important problem in a number of research fields, including the social sciences, biology, physics and medicine. First, two concepts are introduced, Attracting Degree and Recommending Degree. Second, a graph clustering method, referred to as AR-Cluster, is presented for detecting community structures in complex networks. Third, a novel collaborative similarity measure is adopted to calculate node similarities. In the AR-Cluster method, vertices are grouped together based on calculated similarity under a K-Medoids framework. Extensive experimental results on two real datasets show the effectiveness of AR-Cluster.

  16. Fast clustering algorithm for large ECG data sets based on CS theory in combination with PCA and K-NN methods.

    Science.gov (United States)

    Balouchestani, Mohammadreza; Krishnan, Sridhar

    2014-01-01

    Long-term recording of electrocardiogram (ECG) signals plays an important role in health care systems for the diagnosis and treatment of heart diseases. Clustering and classification of the collected data are essential for detecting concealed information of P-QRS-T waves in long-term ECG recordings. Currently used algorithms have their share of drawbacks: 1) clustering and classification cannot be done in real time; 2) they suffer from huge energy consumption and sampling load. These drawbacks motivated us to develop a novel optimized clustering algorithm that can easily scan large ECG datasets to enable low-power long-term ECG recording. In this paper, we present an advanced K-means clustering algorithm based on Compressed Sensing (CS) theory as a random sampling procedure. Then, two dimensionality reduction methods, Principal Component Analysis (PCA) and Linear Correlation Coefficient (LCC), followed by sorting the data using the K-Nearest Neighbours (K-NN) and Probabilistic Neural Network (PNN) classifiers, are applied to the proposed algorithm. We show that our algorithm based on PCA features in combination with the K-NN classifier performs better than the other methods. The proposed algorithm outperforms existing algorithms by increasing classification accuracy by 11%. In addition, the proposed algorithm achieves a classification accuracy for the K-NN and PNN classifiers, and a Receiver Operating Characteristic (ROC) area, of 99.98%, 99.83%, and 99.75% respectively.

  17. Quantum Monte Carlo methods and lithium cluster properties. [Atomic clusters

    Energy Technology Data Exchange (ETDEWEB)

    Owen, R.K.

    1990-12-01

    Properties of small lithium clusters with sizes ranging from n = 1 to 5 atoms were investigated using quantum Monte Carlo (QMC) methods. Cluster geometries were found from complete active space self consistent field (CASSCF) calculations. A detailed development of the QMC method leading to the variational QMC (V-QMC) and diffusion QMC (D-QMC) methods is shown. The many-body aspect of electron correlation is introduced into the QMC importance sampling electron-electron correlation functions by using density dependent parameters, and are shown to increase the amount of correlation energy obtained in V-QMC calculations. A detailed analysis of D-QMC time-step bias is made and is found to be at least linear with respect to the time-step. The D-QMC calculations determined the lithium cluster ionization potentials to be 0.1982(14) (0.1981), 0.1895(9) (0.1874(4)), 0.1530(34) (0.1599(73)), 0.1664(37) (0.1724(110)), 0.1613(43) (0.1675(110)) Hartrees for lithium clusters n = 1 through 5, respectively; in good agreement with experimental results shown in the brackets. Also, the binding energies per atom was computed to be 0.0177(8) (0.0203(12)), 0.0188(10) (0.0220(21)), 0.0247(8) (0.0310(12)), 0.0253(8) (0.0351(8)) Hartrees for lithium clusters n = 2 through 5, respectively. The lithium cluster one-electron density is shown to have charge concentrations corresponding to nonnuclear attractors. The overall shape of the electronic charge density also bears a remarkable similarity with the anisotropic harmonic oscillator model shape for the given number of valence electrons.

  18. A Latent Variable Clustering Method for Wireless Sensor Networks

    DEFF Research Database (Denmark)

    Vasilev, Vladislav; Mihovska, Albena Dimitrova; Poulkov, Vladimir

    2016-01-01

    In this paper we derive a clustering method based on the Hidden Conditional Random Field (HCRF) model in order to maximize the performance of a wireless sensor. Our novel approach to clustering in this paper is in the application of an index invariant graph that we defined in a previous work and...

  19. Kernel-based Maximum Entropy Clustering

    Institute of Scientific and Technical Information of China (English)

    JIANG Wei; QU Jiao; LI Benxi

    2007-01-01

    With the development of the Support Vector Machine (SVM), the "kernel method" has been studied in a general way. In this paper, we present a novel Kernel-based Maximum Entropy Clustering algorithm (KMEC). By using Mercer kernel functions, the proposed algorithm first maps the data from their original space to a high-dimensional space where the data are expected to be more separable, and then performs MEC clustering in the feature space. The experimental results show that the proposed method performs better on non-hyperspherical and complex data structures.
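    Kernel clustering methods of this kind never need the high-dimensional mapping explicitly, because squared distances in the feature space can be written purely in terms of kernel evaluations: K(x,x) - 2K(x,y) + K(y,y). The sketch below shows this identity with a Gaussian kernel; the function names and the bandwidth `sigma` are assumptions for illustration, and the code is not taken from the KMEC paper.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) Mercer kernel between two points."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))


def feature_space_sq_dist(x, y, kernel=gaussian_kernel):
    """Squared distance between the images of x and y in feature space,
    computed from kernel values only (the mapping itself is never built)."""
    return kernel(x, x) - 2 * kernel(x, y) + kernel(y, y)


d2 = feature_space_sq_dist((0.0, 0.0), (1.0, 2.0))
```

    For the Gaussian kernel K(x,x) = 1, so the expression reduces to 2 - 2K(x,y); any clustering criterion expressed through distances can then be evaluated in the feature space.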

  20. CNEM: Cluster Based Network Evolution Model

    Directory of Open Access Journals (Sweden)

    Sarwat Nizamani

    2015-01-01

    This paper presents a network evolution model based on the clustering approach. The proposed approach depicts network evolution, demonstrating network formation from individual nodes to a fully evolved network. An agglomerative hierarchical clustering method is applied for the evolution of the network. In the paper, we present three case studies which show the evolution of networks from scratch. These case studies include: the terrorist network of the 9/11 incidents, the terrorist network of the WMD (Weapons of Mass Destruction) plot against France, and a network of tweets discussing a topic. The 9/11 network is also used for evaluation with other social network analysis methods, which show that the clusters created using the proposed model of network evolution are of good quality; thus the proposed method can be used by law enforcement agencies to further investigate criminal networks.

  1. A Novel Cluster Head Selection Algorithm Based on Fuzzy Clustering and Particle Swarm Optimization.

    Science.gov (United States)

    Ni, Qingjian; Pan, Qianqian; Du, Huimin; Cao, Cen; Zhai, Yuqing

    2017-01-01

    An important objective of a wireless sensor network is to prolong the network life cycle, and topology control is of great significance for extending the network life cycle. Based on previous work, for cluster head selection in hierarchical topology control, we propose a solution based on fuzzy clustering preprocessing and particle swarm optimization. More specifically, first, a fuzzy clustering algorithm is used for the initial clustering of sensor nodes according to geographical locations, where a sensor node belongs to a cluster with a determined probability, and the number of initial clusters is analyzed and discussed. Furthermore, the fitness function is designed considering both the energy consumption and distance factors of the wireless sensor network. Finally, the cluster head nodes in the hierarchical topology are determined based on the improved particle swarm optimization. Experimental results show that, compared with traditional methods, the proposed method reduces the mortality rate of nodes and extends the network life cycle.

  2. Sequential Combination Methods for Data Clustering Analysis

    Institute of Scientific and Technical Information of China (English)

    钱 涛; Ching Y.Suen; 唐远炎

    2002-01-01

    This paper proposes the use of more than one clustering method to improve clustering performance. Clustering is an optimization procedure based on a specific clustering criterion. Clustering combination can be regarded as a technique that constructs and processes multiple clustering criteria. Since the global and local clustering criteria are complementary rather than competitive, combining these two types of clustering criteria may enhance the clustering performance. In our past work, a multi-objective programming based simultaneous clustering combination algorithm has been proposed, which incorporates multiple criteria into an objective function by a weighting method, and solves this problem with constrained nonlinear optimization programming. But this algorithm has high computational complexity. Here a sequential combination approach is investigated, which first uses the global criterion based clustering to produce an initial result, then uses the local criterion based information to improve the initial result with a probabilistic relaxation algorithm or linear additive model. Compared with the simultaneous combination method, sequential combination has low computational complexity. Results on some simulated data and standard test data are reported. It appears that clustering performance improvement can be achieved at low cost through sequential combination.

  3. PERFORMANCE OF SELECTED AGGLOMERATIVE HIERARCHICAL CLUSTERING METHODS

    Directory of Open Access Journals (Sweden)

    Nusa Erman

    2015-01-01

    The broad variety of agglomerative hierarchical clustering methods raises the problem of how to choose the most appropriate method for the given data. It is well known that some methods outperform others if the analysed data have a specific structure. In the presented study we have observed the behaviour of the centroid method, the median method (Gower median method), and the average method (unweighted pair-group method with arithmetic mean, UPGMA; average linkage between groups). We have compared them with the most commonly used methods of hierarchical clustering: the minimum (single linkage clustering), the maximum (complete linkage clustering), the Ward, and the McQuitty (group method average; weighted pair-group method using arithmetic averages, WPGMA) methods. We have applied the comparison of these methods to spherical, ellipsoid, umbrella-like, “core-and-sphere”, ring-like and intertwined three-dimensional data structures. To generate the data and execute the analysis, we have used R statistical software. Results show that all seven methods are successful in finding compact, ball-shaped or ellipsoid structures when they are sufficiently separated. Conversely, all methods except the minimum perform poorly on non-homogeneous, irregular and elongated ones. Especially challenging is a circular double helix structure; it is correctly revealed only by the minimum method. We can also confirm formerly published results of other simulation studies, which usually favour the average method (besides the Ward method) in cases when data is assumed to be fairly compact and well separated.
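    The difference between the minimum (single linkage) and maximum (complete linkage) methods lies only in how the distance between two clusters is defined. The sketch below is a naive illustration of that idea on two elongated parallel rows of points; it is written in Python rather than the R used in the study, the function name `agglomerative` is hypothetical, and the brute-force pair search is chosen for readability, not efficiency.

```python
import math


def agglomerative(points, n_clusters, linkage="single"):
    """Naive agglomerative clustering; returns clusters as sorted index lists.

    linkage="single" uses the minimum pairwise distance between clusters,
    linkage="complete" uses the maximum pairwise distance.
    """
    combine = {"single": min, "complete": max}[linkage]
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return combine(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under the chosen linkage.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Two elongated rows: neighbours within a row are 1 apart, rows are 1.5 apart.
rows = [(float(x), 0.0) for x in range(5)] + [(float(x), 1.5) for x in range(5)]
single_result = agglomerative(rows, 2, linkage="single")
```

    With single linkage every merge joins neighbours within a row before any cross-row merge becomes the cheapest option, so the two rows are recovered, which illustrates why the minimum method copes with elongated structures.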

  4. Using an Improved Clustering Method to Detect Anomaly Activities

    Institute of Scientific and Technical Information of China (English)

    LI Han; ZHANG Nan; BAO Lihui

    2006-01-01

    In this paper, an improved k-means based clustering method (IKCM) is proposed. By refining the initial cluster centers and adjusting the number of clusters through splitting and merging procedures, it can avoid the algorithm ending in a locally optimal solution and reduce the dependency on the number of clusters. The IKCM has been implemented and tested. We perform experiments on the KDD-99 data set. Comparison experiments with H-means+ have also been conducted. The results obtained in this study are very encouraging.

  5. Spanning Tree Based Attribute Clustering

    DEFF Research Database (Denmark)

    Zeng, Yifeng; Jorge, Cordero Hernandez

    2009-01-01

    inconsistent edges from a maximum spanning tree by starting appropriate initial modes, therefore generating stable clusters. It discovers sound clusters through simple graph operations and achieves significant computational savings. We compare the Star Discovery algorithm against earlier attribute clustering...

  6. Clustering Method in Data Mining

    Institute of Scientific and Technical Information of China (English)

    王实; 高文

    2000-01-01

    In this paper we introduce clustering methods in data mining. Clustering has been studied in great depth, and in the field of data mining it faces a new situation. We summarize the major clustering methods and introduce four kinds of clustering method that have been used broadly in data mining. Finally, we conclude that the distance-based partitional clustering method in data mining is a typical two-phase iteration process: 1) assign points to clusters; 2) update the cluster centers.
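
    The two-phase iteration named above can be sketched as follows. This is a minimal k-means-style illustration, not code from the paper; the function name `kmeans`, the sample points and the choice of initial centers are assumptions.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Distance-based partitional clustering as a two-phase iteration."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Phase 1: assign every point to its nearest center.
        new_labels = [min(range(len(centers)),
                          key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Phase 2: move each center to the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = tuple(sum(xs) / len(members)
                                   for xs in zip(*members))
    return labels, centers

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)
```

    The loop stops when phase 1 no longer changes any assignment, which is the usual convergence criterion for this kind of iteration.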

  7. Ontology Partitioning: Clustering Based Approach

    Directory of Open Access Journals (Sweden)

    Soraya Setti Ahmed

    2015-05-01

    The goal of the semantic web is to share and integrate data across different domains and organizations. Ontologies make the knowledge representation of semantic data possible. As usage of the semantic web increases, so does the construction of semantic web ontologies. Moreover, due to the monolithic nature of an ontology, semantic web operations such as query answering, data sharing, data matching, data reuse and data integration become more complicated as the size of the ontology increases. Partitioning the ontology is the key solution to this scalability issue. In this work, we propose a revision and an enhancement of the K-means clustering algorithm based on a new semantic similarity measure for partitioning a given ontology into high-quality modules. The results show that our approach produces more meaningful clusters than the traditional K-means algorithm.

  8. The Effective Clustering Partition Algorithm Based on the Genetic Evolution

    Institute of Scientific and Technical Information of China (English)

    LIAO Qin; LI Xi-wen

    2006-01-01

    To address the difficulty of determining the number of clusters and the abnormal points with a clustering validity function, an effective clustering partition model based on the genetic algorithm is built in this paper. A candidate solution is formed by combining the clustering partition with the encoded samples, and the fitness function is defined by the distances among and within clusters. The number of clusters and the samples in each cluster are determined, and the abnormal points are identified, by applying the triple random crossover operator and mutation. Based on known sample data, the results of the novel method and of the clustering validity function are compared. Numerical experiments are given, and the results show that the novel method is more effective.

  9. Variable cluster analysis method for building neural network model

    Institute of Scientific and Technical Information of China (English)

    王海东; 刘元东

    2004-01-01

    To address the problem that, in building a neural network model of a complicated system, the input variables should be reduced as much as possible while still fully explaining the output variables, a variable selection method based on cluster analysis was investigated. A similarity coefficient that describes the mutual relation of variables was defined. The methods of the highest contribution rate, part replacing whole, and variable replacement are put forward and derived from information theory. Neural network software based on cluster analysis, which provides many methods for defining the variable similarity coefficient, clustering system variables and evaluating variable clusters, was developed and applied to build a neural network forecast model of cement clinker quality. The results show that the network scale, training time and prediction accuracy are all very good. The practical application demonstrates that the method of selecting variables for neural networks is feasible and effective.

  10. Similarity Based Clustering with Indexing for Semi-Structured Document

    Directory of Open Access Journals (Sweden)

    S. Palanisamy

    2012-01-01

    Problem statement: To improve the performance of data retrieval in a large homogeneous XML document. Approach: XML elements are clustered by content and then indexed. The element used for clustering is identified from the document and/or the XML schema and serves as the clustering parameter. A suitable index is created after clustering. Results: Clustering combined with an indexing strategy supports efficient retrieval of XML elements from the document. Conclusion: The proposed method improves the efficiency of XML data manipulation and performs better than clustering or indexing alone.
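
    A rough sketch of the cluster-then-index idea, using only the Python standard library. Exact-match grouping on one child element stands in for the content-based clustering of the paper, and the sample document, the tag names and the function `cluster_and_index` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical homogeneous XML document: every record has the same shape.
DOC = """<library>
  <book><title>A</title><subject>clustering</subject></book>
  <book><title>B</title><subject>indexing</subject></book>
  <book><title>C</title><subject>clustering</subject></book>
</library>"""

def cluster_and_index(xml_text, record_tag, key_tag):
    """Group records by the text of key_tag and return the groups as an index."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for record in root.iter(record_tag):
        # The chosen element acts as the clustering parameter.
        key = record.findtext(key_tag, default="")
        index[key].append(record)
    return dict(index)

index = cluster_and_index(DOC, "book", "subject")
# Retrieval touches one cluster instead of scanning the whole document.
titles = [book.findtext("title") for book in index["clustering"]]
print(titles)   # ['A', 'C']
```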

  11. Cosmological Constraints with Clustering-Based Redshifts

    CERN Document Server

    Kovetz, Ely D; Rahman, Mubdi

    2016-01-01

    We demonstrate that observations lacking reliable redshift information, such as photometric and radio continuum surveys, can produce robust measurements of cosmological parameters when empowered by clustering-based redshift estimation. This method infers the redshift distribution based on the spatial clustering of sources, using cross-correlation with a reference dataset with known redshifts. Applying this method to the existing SDSS photometric galaxies, and projecting to future radio continuum surveys, we show that sources can be efficiently divided into several redshift bins, increasing their ability to constrain cosmological parameters. We forecast constraints on the dark-energy equation-of-state and on local non-gaussianity parameters. We explore several pertinent issues, including the tradeoff between including more sources versus minimizing the overlap between bins, the shot-noise limitations on binning, and the predicted performance of the method at high redshifts. Remarkably, we find that, once this ...

  12. User filtering based campus WLAN user clustering method

    Institute of Scientific and Technical Information of China (English)

    仇一泓; 尧婷娟; 秦丰林; 葛连升

    2014-01-01

    With the spread of smart terminals such as smartphones and tablets, using the MAC address to identify users in campus wireless local area network (WLAN) user clustering research can no longer accurately represent user behavior. A user clustering method based on user filtering is proposed. This method filters users' behavior data by their degree of activeness, and then conducts clustering analysis of campus WLAN user behavior. The experimental results verify the effectiveness of the proposed method.

  13. A two-stage cluster sampling method using gridded population data, a GIS, and Google Earth™ imagery in a population-based mortality survey in Iraq

    Directory of Open Access Journals (Sweden)

    Galway LP

    2012-04-01

    Background: Mortality estimates can measure and monitor the impacts of conflict on a population, guide humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration and surveillance systems are rarely functional in conflict settings, posing a challenge of estimating mortality using retrospective population-based surveys. Results: We present a two-stage cluster sampling method for application in population-based mortality surveys. The sampling method utilizes gridded population data and a geographic information system (GIS) to select clusters in the first sampling stage and Google Earth™ imagery and sampling grids to select households in the second sampling stage. The sampling method is implemented in a household mortality study in Iraq in 2011. Factors affecting feasibility and methodological quality are described. Conclusion: Sampling is a challenge in retrospective population-based mortality studies and alternatives that improve on the conventional approaches are needed. The sampling strategy presented here was designed to generate a representative sample of the Iraqi population while reducing the potential for bias and considering the context-specific challenges of the study setting. This sampling strategy, or variations on it, is adaptable and should be considered and tested in other conflict settings.
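
    The two sampling stages can be sketched in a few lines. The abstract does not state the selection probabilities, so drawing grid cells with probability proportional to population and drawing households by simple random sampling are assumptions, as are the function name `two_stage_sample` and the toy data.

```python
import random

def two_stage_sample(cell_population, households_by_cell,
                     n_clusters, n_households, seed=0):
    """Stage 1: pick grid cells; stage 2: pick households inside each cell."""
    rng = random.Random(seed)
    cells = list(cell_population)
    weights = [cell_population[c] for c in cells]
    # Stage 1 (assumed): cells drawn with probability proportional to their
    # gridded population, with replacement.
    chosen = rng.choices(cells, weights=weights, k=n_clusters)
    sample = []
    for cell in chosen:
        households = households_by_cell[cell]
        k = min(n_households, len(households))
        # Stage 2 (assumed): simple random sample of households in the cell.
        sample.append((cell, rng.sample(households, k)))
    return sample

# Hypothetical gridded population counts and household identifiers.
cell_population = {"cell-1": 500, "cell-2": 1500, "cell-3": 1000}
households_by_cell = {
    "cell-1": ["h1", "h2", "h3"],
    "cell-2": ["h4", "h5", "h6", "h7"],
    "cell-3": ["h8", "h9", "h10"],
}
sample = two_stage_sample(cell_population, households_by_cell,
                          n_clusters=3, n_households=2, seed=1)
for cell, households in sample:
    print(cell, households)
```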

  14. DNA splice site sequences clustering method for conservativeness analysis

    Institute of Scientific and Technical Information of China (English)

    Quanwei Zhang; Qinke Peng; Tao Xu

    2009-01-01

    DNA sequences near splice sites are remarkably conserved, and many researchers have contributed to the prediction of splice sites. In order to mine the underlying biological knowledge, we analyze the conservation of sequences adjacent to DNA splice sites by clustering. First, we propose a clustering method for DNA splice site sequences that is based on DBSCAN and uses four kinds of dissimilarity measures. Then, we analyze the conservation features of the clustering results and of the experimental data set.
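
    A compact DBSCAN over short sequences shows how a pluggable dissimilarity fits into the algorithm. The abstract does not name its four dissimilarity measures, so the Hamming distance used here is an assumption, and the sequences, `eps` and `min_pts` are made-up values.

```python
NOISE = -1

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def dbscan(items, eps, min_pts, dist):
    """Plain DBSCAN; returns one cluster label per item, NOISE for outliers."""
    labels = [None] * len(items)

    def neighbors(i):
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbj = neighbors(j)
            if len(nbj) >= min_pts:    # j is a core point: keep expanding
                queue.extend(nbj)
    return labels

# Hypothetical 8-base windows: two conserved groups and one outlier.
seqs = ["AAGGTAAG", "AAGGTAAC", "AAGGTGAG", "CAGGTAAG",
        "TTTCAGGA", "TTTCAGGT", "TTCCAGGA", "CTTCAGGA",
        "GCGCGCGC"]
labels = dbscan(seqs, eps=2, min_pts=3, dist=hamming)
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

    Swapping `hamming` for another dissimilarity function changes the neighbourhoods but not the clustering procedure itself.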

  15. Progressive Exponential Clustering-Based Steganography

    Directory of Open Access Journals (Sweden)

    Li Yue

    2010-01-01

    Cluster indexing-based steganography is an important branch of data-hiding techniques. Such schemes normally achieve a good balance between high embedding capacity and low embedding distortion. However, most cluster indexing-based steganographic schemes utilise less efficient clustering algorithms for embedding data, which causes redundancy and leaves room for increasing the embedding capacity further. In this paper, a new clustering algorithm, called progressive exponential clustering (PEC), is applied to increase the embedding capacity by avoiding redundancy. Meanwhile, a cluster expansion algorithm is also developed in order to further increase the capacity without sacrificing imperceptibility.

  16. Modified possibilistic clustering model based on kernel methods

    Institute of Scientific and Technical Information of China (English)

    武小红; 周建红

    2008-01-01

    A novel model of fuzzy clustering using kernel methods is proposed. This model is called the kernel modified possibilistic c-means (KMPCM) model. The proposed model is an extension of the modified possibilistic c-means (MPCM) algorithm using kernel methods. Unlike the MPCM and fuzzy c-means (FCM) models, which are based on Euclidean distance, the proposed model is based on a kernel-induced distance. Furthermore, with kernel methods the input data can be mapped implicitly into a high-dimensional feature space where the nonlinear pattern appears linear. It is unnecessary to do calculations in the high-dimensional feature space because the kernel function can do it. Numerical experiments show that KMPCM outperforms FCM and MPCM.
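
    The kernel-induced distance can be computed without ever forming the feature-space vectors, since the squared distance between the images of x and v equals K(x, x) - 2 K(x, v) + K(v, v). The sketch below assumes a Gaussian kernel, which the abstract does not specify; the function names and the bandwidth `sigma` are illustrative.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; an assumed choice of Mercer kernel."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def kernel_distance_sq(x, v, kernel=gaussian_kernel):
    """Squared feature-space distance, evaluated through the kernel only."""
    return kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v)

x, v = (0.0, 0.0), (3.0, 4.0)
print(kernel_distance_sq(x, x))   # 0.0
print(kernel_distance_sq(x, v))
```

    For a Gaussian kernel K(x, x) = 1, so the expression reduces to 2(1 - K(x, v)) and is bounded above by 2, which is one reason distant outliers have limited influence under this distance.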

  17. Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory

    Energy Technology Data Exchange (ETDEWEB)

    Riplinger, Christoph; Pinski, Peter; Becker, Ute; Neese, Frank, E-mail: frank.neese@cec.mpg.de, E-mail: evaleev@vt.edu [Max Planck Institute for Chemical Energy Conversion, Stiftstr. 34-36, D-45470 Mülheim an der Ruhr (Germany)]; Valeev, Edward F., E-mail: frank.neese@cec.mpg.de, E-mail: evaleev@vt.edu [Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061 (United States)]

    2016-01-14

    Domain based local pair natural orbital coupled cluster theory with single-, double-, and perturbative triple excitations (DLPNO-CCSD(T)) is a highly efficient local correlation method. It is known to be accurate and robust and can be used in a black box fashion in order to obtain coupled cluster quality total energies for large molecules with several hundred atoms. While previous implementations showed near linear scaling up to a few hundred atoms, several nonlinear scaling steps limited the applicability of the method for very large systems. In this work, these limitations are overcome and a linear scaling DLPNO-CCSD(T) method for closed shell systems is reported. The new implementation is based on the concept of sparse maps that was introduced in Part I of this series [P. Pinski, C. Riplinger, E. F. Valeev, and F. Neese, J. Chem. Phys. 143, 034108 (2015)]. Using the sparse map infrastructure, all essential computational steps (integral transformation and storage, initial guess, pair natural orbital construction, amplitude iterations, triples correction) are achieved in a linear scaling fashion. In addition, a number of additional algorithmic improvements are reported that lead to significant speedups of the method. The new, linear-scaling DLPNO-CCSD(T) implementation typically is 7 times faster than the previous implementation and consumes 4 times less disk space for large three-dimensional systems. For linear systems, the performance gains and memory savings are substantially larger. Calculations with more than 20 000 basis functions and 1000 atoms are reported in this work. In all cases, the time required for the coupled cluster step is comparable to or lower than for the preceding Hartree-Fock calculation, even if this is carried out with the efficient resolution-of-the-identity and chain-of-spheres approximations. The new implementation even reduces the error in absolute correlation energies by about a factor of two, compared to the already accurate

  18. Sparse maps—A systematic infrastructure for reduced-scaling electronic structure methods. II. Linear scaling domain based pair natural orbital coupled cluster theory

    Science.gov (United States)

    Riplinger, Christoph; Pinski, Peter; Becker, Ute; Valeev, Edward F.; Neese, Frank

    2016-01-01

    Domain based local pair natural orbital coupled cluster theory with single-, double-, and perturbative triple excitations (DLPNO-CCSD(T)) is a highly efficient local correlation method. It is known to be accurate and robust and can be used in a black box fashion in order to obtain coupled cluster quality total energies for large molecules with several hundred atoms. While previous implementations showed near linear scaling up to a few hundred atoms, several nonlinear scaling steps limited the applicability of the method for very large systems. In this work, these limitations are overcome and a linear scaling DLPNO-CCSD(T) method for closed shell systems is reported. The new implementation is based on the concept of sparse maps that was introduced in Part I of this series [P. Pinski, C. Riplinger, E. F. Valeev, and F. Neese, J. Chem. Phys. 143, 034108 (2015)]. Using the sparse map infrastructure, all essential computational steps (integral transformation and storage, initial guess, pair natural orbital construction, amplitude iterations, triples correction) are achieved in a linear scaling fashion. In addition, a number of additional algorithmic improvements are reported that lead to significant speedups of the method. The new, linear-scaling DLPNO-CCSD(T) implementation typically is 7 times faster than the previous implementation and consumes 4 times less disk space for large three-dimensional systems. For linear systems, the performance gains and memory savings are substantially larger. Calculations with more than 20 000 basis functions and 1000 atoms are reported in this work. In all cases, the time required for the coupled cluster step is comparable to or lower than for the preceding Hartree-Fock calculation, even if this is carried out with the efficient resolution-of-the-identity and chain-of-spheres approximations. The new implementation even reduces the error in absolute correlation energies by about a factor of two, compared to the already accurate previous

  19. Firing Efficiency of Cluster Bomb Based on Method of Analogy

    Institute of Scientific and Technical Information of China (English)

    殷培江; 李君; 张立生

    2012-01-01

    Analogy is a common method of logical reasoning that is used in many research fields but rarely in military evaluation. Evaluating the firing efficiency of cluster bombs by analogy is simple, practical, economical, and convenient for large-scale calculation. Based on the similarity between cluster bombs and conventional ammunition in the firing efficiency evaluation process, and building on the classical damage assessment model, a firing efficiency evaluation model for cluster bombs is established by analogy. Several typical targets are taken as examples and evaluated with the analogy model. The conclusions of the analogy method and of the target simulation method are then displayed on a radar chart, and comparison of the data verifies that the analogy method is feasible for evaluating the firing efficiency of cluster bombs.

  20. RESEARCH ON CLASS-PROFILE-BASED HIERARCHICAL CLUSTERING METHOD

    Institute of Scientific and Technical Information of China (English)

    孟海东; 唐旋

    2011-01-01

    Traditional clustering algorithms often fail to consider both the connectivity and the similarity characteristics among classes. This paper first presents the basic definitions of class boundary point and class profile, together with a method for finding them; second, considering both the connectivity and the similarity characteristics among classes, it defines several standards and methods for measuring inter-class similarity; third, it proposes a class-profile-based hierarchical clustering algorithm that can effectively process clusters of arbitrary shape and distinguish isolated points and noise data. The feasibility and effectiveness of the algorithm are validated through clustering analysis of image data sets and the Iris standard data set.

  1. CCM: A Text Classification Method by Clustering

    DEFF Research Database (Denmark)

    Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock

    2011-01-01

    In this paper, a new Cluster based Classification Model (CCM) for suspicious email detection and other text classification tasks, is presented. Comparative experiments of the proposed model against traditional classification models and the boosting algorithm are also discussed. Experimental results...... show that the CCM outperforms traditional classification models as well as the boosting algorithm for the task of suspicious email detection on terrorism domain email dataset and topic categorization on the Reuters-21578 and 20 Newsgroups datasets. The overall finding is that applying a cluster based...

  2. Structural, spectroscopic aspects, and electronic properties of (TiO2)n clusters: a study based on the use of natural algorithms in association with quantum chemical methods.

    Science.gov (United States)

    Ganguly Neogi, Soumya; Chaudhury, Pinaki

    2014-01-05

    In this article, we propose a stochastic search-based method, namely genetic algorithm (GA) and simulated annealing (SA) in conjunction with density functional theory (DFT), to evaluate global and local minimum structures of (TiO2)n clusters with n = 1-12. Once the structures are established, we evaluate the infrared spectroscopic modes, cluster formation energy, vertical excitation energy, vertical ionization potential, vertical electron affinity, highest occupied molecular orbital (HOMO)-lowest unoccupied molecular orbital (LUMO) gaps, and so forth. We show that an initial determination of structure using stochastic techniques (GA/SA), popularly known as natural algorithms because their working principle mimics certain natural processes, followed by density functional calculations leads to high-quality structures for these systems. We show that the clusters tend to form three-dimensional networks. We compare our results with the available experimental and theoretical results. The results obtained from the SA/GA-DFT technique agree well with the theoretical and experimental data available in the literature.

  3. Fingerprint analysis of Hibiscus mutabilis L. leaves based on ultra performance liquid chromatography with photodiode array detector combined with similarity analysis and hierarchical clustering analysis methods

    Directory of Open Access Journals (Sweden)

    Xianrui Liang

    2013-01-01

    Background: A method for chemical fingerprint analysis of Hibiscus mutabilis L. leaves was developed based on ultra performance liquid chromatography with photodiode array detector (UPLC-PAD) combined with similarity analysis (SA) and hierarchical clustering analysis (HCA). Materials and Methods: 10 batches of Hibiscus mutabilis L. leaf samples were collected from different regions of China. UPLC-PAD was employed to collect chemical fingerprints of Hibiscus mutabilis L. leaves. Results: The relative standard deviations (RSDs) of the relative retention times (RRT) and relative peak areas (RPA) of 10 characteristic peaks (one of them was identified as rutin) in precision, repeatability and stability tests were less than 3%, and the method of fingerprint analysis was validated to be suitable for Hibiscus mutabilis L. leaves. Conclusions: The chromatographic fingerprints showed abundant qualitative diversity of chemical constituents in the 10 batches of Hibiscus mutabilis L. leaf samples from different locations, as determined by similarity analysis based on the correlation coefficients between each pair of fingerprints. Moreover, the HCA method clustered the samples into four classes, and the HCA dendrogram showed the close or distant relations among the 10 samples, which was consistent with the SA result to some extent.

  4. New resampling method for evaluating stability of clusters

    Directory of Open Access Journals (Sweden)

    Neuhaeuser Markus

    2008-01-01

    Background: Hierarchical clustering is a widely applied tool in the analysis of microarray gene expression data. The assessment of cluster stability is a major challenge in clustering procedures. Statistical methods are required to distinguish between real and random clusters. Several methods for assessing cluster stability have been published, including resampling methods such as the bootstrap. We propose a new resampling method based on continuous weights to assess the stability of clusters in hierarchical clustering. While in bootstrapping approximately one third of the original items is lost, continuous weights avoid zero elements and instead allow non-integer diagonal elements, which leads to retention of the full dimensionality of space, i.e. each variable of the original data set is represented in the resampling sample. Results: Comparison of continuous weights and bootstrapping using real datasets and simulation studies reveals the advantage of continuous weights, especially when the dataset has only few observations, few differentially expressed genes, and a low fold change of the differentially expressed genes. Conclusion: We recommend the use of continuous weights in small as well as in large datasets, because according to our results they produce results at least as good as those of conventional bootstrapping and in some cases surpass it.
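
    The contrast between the two resampling schemes can be made concrete with a small sketch. The abstract does not specify how the continuous weights are drawn, so the normalized exponential weights below are an assumption, and the function names are illustrative.

```python
import math
import random

def bootstrap_weights(n, rng):
    """Integer multiplicities from drawing n items with replacement."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts

def continuous_weights(n, rng):
    """Strictly positive real weights rescaled to sum to n (assumed scheme)."""
    raw = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [n * r / total for r in raw]

rng = random.Random(0)
n = 1000
boot = bootstrap_weights(n, rng)
cont = continuous_weights(n, rng)

lost = sum(1 for w in boot if w == 0) / n
print(lost)                # share of items absent from this bootstrap sample
print((1 - 1 / n) ** n)    # expected share, which tends to exp(-1) for large n
print(min(cont) > 0)       # every item keeps a non-zero weight
```

    Both weight vectors sum to n, but only the bootstrap vector contains zeros, which is why a bootstrap sample drops about a third of the items while continuous weights keep every item represented.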

  5. Incremental Web Usage Mining Based on Active Ant Colony Clustering

    Institute of Scientific and Technical Information of China (English)

    SHEN Jie; LIN Ying; CHEN Zhimin

    2006-01-01

    To alleviate the scalability problem caused by increasing Web usage and changing user interests, this paper presents a novel Web usage mining algorithm: an incremental Web usage mining algorithm based on active ant colony clustering. First, an active movement strategy for direction selection and speed, different from the positive strategy employed by other ant colony clustering algorithms, is proposed to construct an active ant colony clustering algorithm, which avoids idle and "flying over the plane" movement and effectively improves the quality and speed of clustering on large datasets. Then a mechanism for decomposing clusters based on the above methods is introduced to form new clusters when users' interests change. Empirical studies on a real Web dataset show that the active ant colony clustering algorithm performs better than previous algorithms, and that the incremental approach based on the proposed mechanism can efficiently implement incremental Web usage mining.

  6. Scalable Density-Based Subspace Clustering

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Günnemann, Stephan;

    2011-01-01

    For knowledge discovery in high dimensional databases, subspace clustering detects clusters in arbitrary subspace projections. Scalability is a crucial issue, as the number of possible projections is exponential in the number of dimensions. We propose a scalable density-based subspace clustering ...

  7. Micro-cluster-based online network abnormal detection method

    Institute of Scientific and Technical Information of China (English)

    肖三; 杨雅辉; 沈晴霓

    2013-01-01

    Online anomaly detection for high-traffic backbone networks is currently a research hotspot in the network security field. An online network anomaly detection method is proposed to handle big data streams effectively. The method converts the data stream into micro-clusters with a density-based clustering algorithm; the micro-clusters then absorb incoming data directly to improve processing efficiency. An outlier detection procedure is executed at regular intervals to find intrusions. The method requires no offline training, can discover arbitrary behavior patterns, supports big data streams, can balance detection precision against resource requirements, and is efficient. In the experiment, the prototype system finishes the analysis of the MIT Lincoln Laboratory LLS_DDOS_1.0 data set in 20 s, with a true positive rate (TPR) of 82% and a false positive rate (FPR) of 6%, which is comparable to K-means.
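
    The absorb-or-create behaviour of micro-clusters can be sketched as below. This is a simplified illustration rather than the paper's algorithm: the fixed absorption radius, the size threshold used to flag outliers, and the names `MicroCluster`, `summarize` and `outliers` are all assumptions.

```python
import math

class MicroCluster:
    """Summary of absorbed points: a count and a per-dimension linear sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    @property
    def center(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        # Only the summary is updated; the raw point is not stored.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

def summarize(stream, radius):
    """Absorb each point into the nearest micro-cluster or start a new one."""
    clusters = []
    for p in stream:
        nearest = min(clusters, key=lambda c: math.dist(c.center, p),
                      default=None)
        if nearest is not None and math.dist(nearest.center, p) <= radius:
            nearest.absorb(p)
        else:
            clusters.append(MicroCluster(p))
    return clusters

def outliers(clusters, min_points):
    """Periodic check (assumed rule): sparse micro-clusters are suspicious."""
    return [c.center for c in clusters if c.n < min_points]

stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0),
          (5.1, 5.0), (0.1, 0.1), (9.0, 0.0)]
clusters = summarize(stream, radius=1.0)
print([c.n for c in clusters])            # [4, 2, 1]
print(outliers(clusters, min_points=2))   # [[9.0, 0.0]]
```

    Because each arriving record only updates a count and a sum, the cost per record depends on the number of micro-clusters rather than on the number of records seen so far.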

  8. Mercer Kernel Based Fuzzy Clustering Self-Adaptive Algorithm

    Institute of Scientific and Technical Information of China (English)

    李侃; 刘玉树

    2004-01-01

    A novel Mercer kernel based fuzzy clustering self-adaptive algorithm is presented. The Mercer kernel method is introduced into fuzzy c-means clustering; it implicitly maps the input data into a high-dimensional feature space through a nonlinear transformation. In fuzzy c-means and its variants, the number of clusters must be determined in advance. In the proposed self-adaptive algorithm, the number of clusters is not given in advance but is obtained automatically by a validity measure function. Finally, experiments show that the kernel based fuzzy c-means self-adaptive algorithm achieves better performance.

  9. A novel clustering and supervising users' profiles method

    Institute of Scientific and Technical Information of China (English)

    Zhu Mingfu; Zhang Hongbin; Song Fangyun

    2005-01-01

    To better understand different users' access intentions, a novel clustering and supervising method based on access paths is presented. This method partitions the users' interest space to express the distribution of users' interests, and directly guides the construction of web page indexes for better performance.

  10. PHISHING WEB IMAGE SEGMENTATION BASED ON IMPROVING SPECTRAL CLUSTERING

    Institute of Scientific and Technical Information of China (English)

    Li Yuancheng; Zhao Liujun; Jiao Runhai

    2011-01-01

    This paper proposes a novel phishing web image segmentation algorithm based on improved spectral clustering. First, we construct a set of points composed of the spatial locations and gray levels of the pixels of a given image. Second, the data are clustered in the spectral space of the similarity matrix of the point set. The K-means step of the conventional spectral clustering method is sensitive to the initial cluster centroids and converges to locally optimal solutions; to avoid these drawbacks, we introduce the clone operator and Cauchy mutation to enlarge the scale of the clustering centers, and a quantum-inspired evolutionary algorithm to find the globally optimal cluster centroids. Compared with phishing web image segmentation based on K-means, experimental results show that the segmentation performance of our method is much improved. Moreover, our method can converge to the globally optimal solution and is more accurate in phishing web segmentation.

  11. Clustering of Web Learners Based on Rough Set

    Institute of Scientific and Technical Information of China (English)

    LIU Shuai-dong; CHEN Shi-hong

    2004-01-01

    The demand for individualized teaching from E-learning websites is rapidly increasing due to the huge differences that exist among Web learners. A method for clustering Web learners based on rough sets is proposed. The basic idea of the method is to reduce the learning attributes prior to clustering, so that the clustering of Web learners is carried out in a relatively low-dimensional space. Using this method, E-learning websites can arrange corresponding teaching content for different clusters of learners so that the learners' individual requirements are better satisfied.

  12. A Description Method of Text Feature Based on Word Clustering

    Institute of Scientific and Technical Information of China (English)

    陈炯; 张永奎

    2011-01-01

    Feature spaces in text mining suffer from high dimensionality. This paper presents a new method of describing text features based on word clustering. The aim is to mine semantic associations between words using machine learning, to construct a domain-specific concept dictionary dynamically, and to describe text features with the constructed concepts. Without relying on a theme dictionary, the method first analyzes the co-occurrence of words in a training corpus, then uses word clustering to generate word clusters, each represented by seed words and standing for a topic concept, and finally takes the seed words as text features. The experimental results indicate that this method not only reduces the dimensionality of the feature space but also overcomes the limitations of the concept information in HowNet, and improves the precision of text categorization.

  13. New clustering methods for population comparison on paternal lineages.

    Science.gov (United States)

    Juhász, Z; Fehér, T; Bárány, G; Zalán, A; Németh, E; Pádár, Z; Pamjav, H

    2015-04-01

    The goal of this study is to show two new clustering and visualising techniques developed to find the most typical clusters of 18-dimensional Y chromosomal haplogroup frequency distributions of 90 Western Eurasian populations. The first technique called "self-organizing cloud (SOC)" is a vector-based self-learning method derived from the Self Organising Map and non-metric Multidimensional Scaling algorithms. The second technique is a new probabilistic method called the "maximal relation probability" (MRP) algorithm, based on a probability function having its local maximal values just in the condensation centres of the input data. This function is calculated immediately from the distance matrix of the data and can be interpreted as the probability that a given element of the database has a real genetic relation with at least one of the remaining elements. We tested these two new methods by comparing their results to both each other and the k-medoids algorithm. By means of these new algorithms, we determined 10 clusters of populations based on the similarity of haplogroup composition. The results obtained represented a genetically, geographically and historically well-interpretable picture of 10 genetic clusters of populations mirroring the early spread of populations from the Fertile Crescent to the Caucasus, Central Asia, Arabia and Southeast Europe. The results show that a parallel clustering of populations using SOC and MRP methods can be an efficient tool for studying the demographic history of populations sharing common genetic footprints.

  14. Generating a multilingual taxonomy based on multilingual terminology clustering

    Institute of Scientific and Technical Information of China (English)

    Chengzhi; ZHANG

    2011-01-01

    Taxonomy denotes the hierarchical structure of a knowledge organization system. It has important applications in knowledge navigation, semantic annotation and semantic search. It is useful to study multilingual taxonomies generated automatically in a dynamic information environment in which massive amounts of information are processed and found. The multilingual taxonomy is the core component of a multilingual thesaurus or ontology. This paper presents two methods for generating a bilingual taxonomy: cross-language terminology clustering and mixed-language based terminology clustering. In experiments on terminology clustering in four specific subject domains, we found that when a parallel corpus is used to cluster multilingual terminologies, mixed-language based terminology clustering outperforms cross-language terminology clustering.

  15. Mapping Cigarettes Similarities using Cluster Analysis Methods

    Directory of Open Access Journals (Sweden)

    Lorentz Jäntschi

    2007-09-01

    The aim of the research was to investigate the relationships and/or occurrences in and between chemical composition information (tar, nicotine, carbon monoxide), market information (brand, manufacturer, price) and public health information (class, health warning), as well as the clustering of a sample of cigarette data. Thirty cigarette brands were analyzed. Six categorical variables (cigarette brand, manufacturer, health warnings, class) and four continuous variables (tar, nicotine and carbon monoxide concentrations, and package price) were collected to investigate chemical composition, market information and public health information. Multiple linear regression and two clustering techniques were applied. The carbon monoxide concentration proved to be linked with the tar and nicotine concentrations. The clustering methods identified groups of cigarette brands that showed similar characteristics. The tar and carbon monoxide concentrations were the main criteria used in clustering. An analysis of a larger sample could reveal more relevant and useful information regarding the similarities between cigarette brands.

  16. Comparing the performance of biomedical clustering methods

    DEFF Research Database (Denmark)

    Wiwie, Christian; Baumbach, Jan; Röttger, Richard

    2015-01-01

    Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene......-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art....

  17. FICW: Frequent Itemset Based Text Clustering with Window Constraint

    Institute of Scientific and Technical Information of China (English)

    ZHOU Chong; LU Yansheng; ZOU Lei; HU Rong

    2006-01-01

    Most existing text clustering algorithms overlook the fact that a document is a word sequence carrying semantic information. Important semantic information lies in the positions of the words in the sequence. In this paper, a novel method named Frequent Itemset-based Clustering with Window (FICW) is proposed, which uses this semantic information for text clustering under a window constraint. Experimental results on three (hypertext) text sets show that FICW outperforms the compared method in both clustering accuracy and efficiency.

  18. A Practical Optimisation Method to Improve QOS and GOS-Based Key Performance Indicators in GSM Network Cell Cluster Environment

    Directory of Open Access Journals (Sweden)

    Joseph Isabona

    2014-11-01

    Delivering both good quality of service (QoS) and grade of service (GoS) in a competitive mobile communication environment is a major factor in reducing the subscriber churn rate. It is therefore important for wireless mobile network operators to ensure stability and efficiency by delivering consistent, reliable and high-quality end-user (subscriber) satisfaction. This can only be achieved by regular network performance monitoring and optimisation, as it directly affects the quality of the offered services and hence user satisfaction. In this paper, we present the results of network performance evaluation and optimisation of a GSM network on a cell-cluster basis in the Asaba region, South East Nigeria. We employ a combination of essential key performance indicators, such as dropped call rate, call setup success rate and outage call rate, to examine the overall QoS and GoS performance of the GSM network. Our results after network optimisation showed significant performance improvement in terms of call drop rate, call setup success rate and call block rate. Specifically, the end-user satisfaction rate increased from 94.45%, 87.74% and 92.85% to 99.05%, 95.38% and 99.03% respectively across the three GSM cell clusters. The GoS was reduced from 3.33%, 6.60% and 2.38% to 0.00%, 3.70% and 0.00% respectively. Furthermore, ESA, which corresponds to end-point service availability, improved from 94.44%, 93.40% and 97.62% to 100%, 96.30% and 100% respectively. In addition, the average throughput improved from 73.74 kbit/s, 85.06 kbit/s and 87.54 kbit/s to 77.07 kbit/s, 92.38 kbit/s and 102 kbit/s respectively across the three GSM cell clusters.

  19. An infrared polarization image fusion method based on NSCT and fuzzy C-means clustering segmentation algorithms

    Science.gov (United States)

    Yu, Xuelian; Chen, Qian; Gu, Guohua; Qian, Weixian; Xu, Mengxi

    2014-11-01

    The integration between polarization and intensity images possessing complementary and discriminative information has emerged as a new and important research area. On the basis of the consideration that the resulting image has different clarity and layering requirement for the target and background, we propose a novel fusion method based on non-subsampled Contourlet transform (NSCT) and fuzzy C-means (FCM) segmentation for IR polarization and light intensity images. First, the polarization characteristic image is derived from fusion of the degree of polarization (DOP) and the angle of polarization (AOP) images using local standard variation and abrupt change degree (ACD) combined criteria. Then, the polarization characteristic image is segmented with FCM algorithm. Meanwhile, the two source images are respectively decomposed by NSCT. The regional energy-weighted and similarity measure are adopted to combine the low-frequency sub-band coefficients of the object. The high-frequency sub-band coefficients of the object boundaries are integrated through the maximum selection rule. In addition, the high-frequency sub-band coefficients of internal objects are integrated by utilizing local variation, matching measure and region feature weighting. The weighted average and maximum rules are employed independently in fusing the low-frequency and high-frequency components of the background. Finally, an inverse NSCT operation is accomplished and the final fused image is obtained. The experimental results illustrate that the proposed IR polarization image fusion algorithm can yield an improved performance in terms of the contrast between artificial target and cluttered background and a more detailed representation of the depicted scene.

  20. Weighted Clustering Based Preemptive Scheduling For Real Time System

    Directory of Open Access Journals (Sweden)

    H.S Behera

    2012-05-01

    In this paper, a new improved clustering-based scheduling algorithm for a single-processor environment is proposed. In the proposed method, processes are organized into non-overlapping clusters. For each process, the variance from the median is calculated and compared with the variance from the means of the other clusters. Each process is assigned to the cluster associated with the closest median. The new median of each cluster is calculated and the procedure is repeated until the medians are fixed. A weight is assigned to each cluster using the externally assigned priorities and the burst time. The cluster with the highest weight is executed first, and jobs are scheduled using the Round Robin algorithm with a calculated dynamic time slice. The experimental study of the proposed scheduling algorithm shows that high-priority jobs can be executed first to meet their deadlines while starvation of processes is prevented, which is crucial in a real-time system.
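
    The assign-and-update loop described in this abstract can be sketched as a one-dimensional k-medians iteration. This is only an illustration under stated assumptions: the function name `k_medians_1d`, the use of burst time as the clustered value, and the absolute difference as the closeness measure are not taken from the paper. The cluster weights and the dynamic time slice are left out because the abstract does not give their formulas.

```python
from statistics import median


def k_medians_1d(values, initial_medians, max_iter=100):
    """Group numbers around medians until the medians stop changing.

    Hypothetical helper: `values` stands in for process burst times.
    """
    medians = list(initial_medians)
    clusters = [[] for _ in medians]
    for _ in range(max_iter):
        # Assignment step: each value joins the cluster with the closest median.
        clusters = [[] for _ in medians]
        for v in values:
            k = min(range(len(medians)), key=lambda j: abs(v - medians[j]))
            clusters[k].append(v)
        # Update step: recompute each median; an empty cluster keeps its old one.
        new_medians = [
            median(c) if c else medians[j] for j, c in enumerate(clusters)
        ]
        if new_medians == medians:
            break  # medians are fixed, so the clustering is stable
        medians = new_medians
    return medians, clusters
```

    A call such as `k_medians_1d([2, 3, 4, 20, 22, 24], [2, 24])` separates the short jobs from the long ones; a scheduler could then order the clusters by whatever weight it defines.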

  1. ROUGH SET BASED CLUSTERING OF GENE EXPRESSION DATA: A SURVEY

    Directory of Open Access Journals (Sweden)

    J.JEBA EMILYN

    2010-12-01

    Microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. However, the high dimensionality of gene expression data makes it difficult to analyze. Many clustering algorithms are available. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering gene expression data. We then introduce rough clustering and explore its advantage over strict and fuzzy clustering. We also explain why rough clustering is preferred over other conventional methods by presenting a survey of a few clustering algorithms based on rough set theory for gene expression data. We conclude that this area is a promising research field for the research community.

  2. Recent advances in coupled-cluster methods

    CERN Document Server

    Bartlett, Rodney J

    1997-01-01

    Today, coupled-cluster (CC) theory has emerged as the most accurate, widely applicable approach for the correlation problem in molecules. Furthermore, the correct scaling of the energy and wavefunction with size (i.e. extensivity) recommends it for studies of polymers and crystals as well as molecules. CC methods have also paid dividends for nuclei, and for certain strongly correlated systems of interest in field theory. In order for CC methods to have achieved this distinction, it has been necessary to formulate new, theoretical approaches for the treatment of a variety of essential quantities

  3. Fuzzy Clustering - Principles, Methods and Examples

    DEFF Research Database (Denmark)

    Kroszynski, Uri; Zhou, Jianjun

    1998-01-01

    One of the most remarkable advances in the field of identification and control of systems, in particular mechanical systems, whose behaviour cannot be described by means of the usual mathematical models, has been achieved by the application of methods of fuzzy theory. In the framework of a study...... about identification of "black-box" properties by analysis of system input/output data sets, we have prepared an introductory note on the principles and the most popular data classification methods used in fuzzy modeling. This introductory note also includes some examples that illustrate the use...... of the methods. The examples were solved by hand and served as a test bench for exploration of the MATLAB capabilities included in the Fuzzy Control Toolbox. The fuzzy clustering methods described include Fuzzy c-means (FCM), Fuzzy c-lines (FCL) and Fuzzy c-elliptotypes (FCE)....

  4. Structure based alignment and clustering of proteins (STRALCP)

    Science.gov (United States)

    Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

    2013-06-18

    Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.

  5. Eros-based Fuzzy Cluster Method for Longitudinal Data

    Institute of Scientific and Technical Information of China (English)

    李会民; 闫健卓; 方丽英; 王普

    2013-01-01

    Considering the characteristics of longitudinal data sets, such as multiple variables, missing values, unequal series lengths and irregular time intervals, a similarity measure for longitudinal data based on the Eros distance is studied, and the fuzzy C-means clustering algorithm is modified to yield a fuzzy clustering method based on the Eros distance measure. First, the unbalanced longitudinal data set is preprocessed, which includes filling in missing values and standardizing variables, and redundant attributes are reduced using rough set theory. Second, the FErosCM clustering method is used for automatic classification, with information entropy used to assess the performance of the clustering algorithm. Experiments show that the method is effective and feasible for longitudinal data classification in terms of both clustering efficiency and accuracy.

  6. Dictionary-Based, Clustered Sparse Representation for Hyperspectral Image Classification

    Directory of Open Access Journals (Sweden)

    Zhen-tao Qin

    2015-01-01

    This paper presents a new, dictionary-based method for hyperspectral image classification, which incorporates both spectral and contextual characteristics of a sample clustered to obtain a dictionary of each pixel. The resulting pixels display a common sparsity pattern in identical clustered groups. We calculated the image's sparse coefficients using the dictionary approach, which generated the sparse representation features of the remote sensing images. The sparse coefficients are then used to classify the hyperspectral images via a linear SVM. Experiments show that the proposed method of dictionary-based, clustered sparse coefficients creates better representations of hyperspectral images, with greater overall accuracy and a higher Kappa coefficient.

  7. Clustering-Based PU Active Text Classification Method

    Institute of Scientific and Technical Information of China (English)

    刘露; 彭涛; 左万利; 戴耀康

    2013-01-01

    Text classification is a key technology in information retrieval. Collecting more reliable negative examples and building accurate and efficient classifiers are two important problems in PU (positive and unlabeled) text classification. However, most existing methods collect only a small number of reliable negative examples, which keeps the classifiers from reaching high accuracy. This paper proposes a clustering-based semi-supervised active classification method that addresses both steps. In contrast to traditional methods for extracting negative examples, the approach uses clustering together with the property that positive and negative documents should share as few feature terms as possible: it removes as many probable positive examples from the unlabeled set as possible and thereby obtains more reliable negative examples. The classifier is built by combining SVM active learning with an improved Rocchio method, and features are weighted with TFIPNDF, an improved TF-IDF (term frequency-inverse document frequency) scheme, which significantly improves classification accuracy. The method was tested on three data sets (RCV1, Reuters-21578 and 20 Newsgroups). The experimental results show that clustering-based selection obtains more reliable negative examples while keeping the error rate low, and that introducing active learning also significantly improves classification accuracy.

  8. Summarization and Matching of Density-Based Clusters in Streaming Environments

    CERN Document Server

    Yang, Di; Ward, Matthew O

    2011-01-01

    Density-based cluster mining is known to serve a broad range of applications ranging from stock trade analysis to moving object monitoring. Although methods for efficient extraction of density-based clusters have been studied in the literature, the problem of summarizing and matching such clusters with arbitrary shapes and complex cluster structures remains unsolved. Therefore, the goal of our work is to extend the state of the art of density-based cluster mining in streams from cluster extraction only to also supporting analysis and management of the extracted clusters. Our work solves three major technical challenges. First, we propose a novel multi-resolution cluster summarization method, called Skeletal Grid Summarization (SGS), which captures the key features of density-based clusters, covering both their external shape and internal cluster structures. Second, in order to summarize the extracted clusters in real-time, we present an integrated computation strategy C-SGS, which piggybacks the generation of...

  9. Document Clustering based on Topic Maps

    CERN Document Server

    Rafi, Muhammad; Farooq, Amir; 10.5120/1640-2204

    2011-01-01

    The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next challenge lies in performing clustering based on the semantic contents of the document. The problem of document clustering has two main components: (1) to represent the document in a form that inherently captures the semantics of the text, which may also help to reduce the dimensionality of the document, and (2) to define a similarity measure based on the semantic representation such that it assigns higher numerical values to document pairs which have a higher semantic relationship. The feature space of the documents can be very challenging for document clustering. A document may contain multiple topics, a large set of class-independent general words, and a handful of class-specific core words. With these features in mind, traditional agglomerative clustering algori...

  10. Select and Cluster: A Method for Finding Functional Networks of Clustered Voxels in fMRI

    Science.gov (United States)

    DonGiovanni, Danilo

    2016-01-01

    Extracting functional connectivity patterns among cortical regions in fMRI datasets is a challenge stimulating the development of effective data-driven or model based techniques. Here, we present a novel data-driven method for the extraction of significantly connected functional ROIs directly from the preprocessed fMRI data without relying on a priori knowledge of the expected activations. This method finds spatially compact groups of voxels which show a homogeneous pattern of significant connectivity with other regions in the brain. The method, called Select and Cluster (S&C), consists of two steps: first, a dimensionality reduction step based on a blind multiresolution pairwise correlation by which the subset of all cortical voxels with significant mutual correlation is selected and the second step in which the selected voxels are grouped into spatially compact and functionally homogeneous ROIs by means of a Support Vector Clustering (SVC) algorithm. The S&C method is described in detail. Its performance assessed on simulated and experimental fMRI data is compared to other methods commonly used in functional connectivity analyses, such as Independent Component Analysis (ICA) or clustering. S&C method simplifies the extraction of functional networks in fMRI by identifying automatically spatially compact groups of voxels (ROIs) involved in whole brain scale activation networks. PMID:27656202

  11. An Efficient Fuzzy Clustering-Based Approach for Intrusion Detection

    CERN Document Server

    Nguyen, Huu Hoa; Darmont, Jérôme

    2011-01-01

    The need to increase accuracy in detecting sophisticated cyber attacks poses a great challenge not only to the research community but also to corporations. So far, many approaches have been proposed to cope with this threat. Among them, data mining has brought on remarkable contributions to the intrusion detection problem. However, the generalization ability of data mining-based methods remains limited, and hence detecting sophisticated attacks remains a tough task. In this thread, we present a novel method based on both clustering and classification for developing an efficient intrusion detection system (IDS). The key idea is to take useful information exploited from fuzzy clustering into account for the process of building an IDS. To this aim, we first present cornerstones to construct additional cluster features for a training set. Then, we come up with an algorithm to generate an IDS based on such cluster features and the original input features. Finally, we experimentally prove that our method outperform...

  12. MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS

    Directory of Open Access Journals (Sweden)

    Jana Halčinová

    2014-06-01

    The aim of this article is to show how cluster analysis methods can be used to classify stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. Sorting stocks of finished products by clusters is described with a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.

  13. Bases for cluster algebras from surfaces

    CERN Document Server

    Musiker, Gregg; Williams, Lauren

    2011-01-01

    We construct two bases for each cluster algebra coming from a triangulated surface without punctures. We work in the context of a coefficient system coming from a full-rank exchange matrix, for example, principal coefficients.

  14. Fuzzy Clustering Using C-Means Method

    Directory of Open Access Journals (Sweden)

    Georgi Krastev

    2015-05-01

    This paper describes cluster analysis by fuzzy clustering with the fuzzy c-means algorithm: the fuzzy clustering problem is discussed and a general formal statement of the fuzzy clustering analysis problem is presented. The formulation of the problem is specified and the algorithm for solving it is described.
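
    The abstract gives no formulas, so the following is a minimal sketch of the standard fuzzy c-means iteration rather than the paper's own formulation. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the random initialisation and the stopping tolerance are assumptions made for illustration. It alternates the usual two updates: centers as means weighted by the memberships raised to the power m, and memberships proportional to distance raised to the power -2/(m-1).

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length coordinate lists."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u[i][k] ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )
        # Membership update: u[i][k] is proportional to dist ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [
                sum((points[i][d] - centers[k][d]) ** 2 for d in range(dim)) ** 0.5
                for k in range(c)
            ]
            if min(dist) == 0.0:
                # The point coincides with one or more centers: split evenly.
                hits = [k for k in range(c) if dist[k] == 0.0]
                new = [1.0 / len(hits) if k in hits else 0.0 for k in range(c)]
            else:
                inv = [dk ** (-2.0 / (m - 1.0)) for dk in dist]
                total = sum(inv)
                new = [v / total for v in inv]
            shift = max(shift, max(abs(new[k] - u[i][k]) for k in range(c)))
            u[i] = new
        if shift < tol:
            break  # memberships have stopped changing
    return centers, u
```

    Each row of the returned membership matrix sums to 1, so a hard label can be read off as the index of the largest entry when one is needed.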

  15. Clustering Algorithm for Unsupervised Monaural Musical Sound Separation Based on Non-negative Matrix Factorization

    Science.gov (United States)

    Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo

    Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.

  16. The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis

    Directory of Open Access Journals (Sweden)

    Chen Yidong

    2004-01-01

    An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experimental data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering such as reduced projection in similarities, noise, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchical clustering method, the k-means clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).
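
    The idea of clustering around each local maximum of a magnitude property can be illustrated with a small hill-climbing sketch. This is one possible reading, not the procedure of the paper: the function name `local_maximum_clustering`, the fixed neighbourhood `radius`, the Euclidean distance and the tie-breaking rule are all assumptions. Each point is linked to the highest-magnitude point in its neighbourhood, and following the links leads to a local maximum that serves as the cluster label.

```python
def local_maximum_clustering(points, magnitude, radius):
    """Label each point with the index of the local maximum it climbs to."""
    n = len(points)

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # parent[i]: the highest-magnitude point within `radius` of point i
    # (possibly i itself). Ties are broken in favour of the lower index.
    parent = []
    for i in range(n):
        best = i
        for j in range(n):
            if dist(points[i], points[j]) <= radius:
                if (magnitude[j], -j) > (magnitude[best], -best):
                    best = j
        parent.append(best)

    # Follow the links uphill. Every step strictly increases the
    # (magnitude, -index) key, so the walk always ends at a local maximum.
    labels = []
    for i in range(n):
        root = i
        while parent[root] != root:
            root = parent[root]
        labels.append(root)
    return labels
```

    With points at 0, 1, 2, 10, 11, 12 on a line, magnitudes 1, 3, 2, 2, 5, 1 and a radius of 1.5, the first three points climb to index 1 and the last three to index 4.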

  17. Scene Text Extraction Method Based on Clustering and MRF Model

    Institute of Scientific and Technical Information of China (English)

    章天则; 赵宇明

    2011-01-01

    This paper proposes a method for extracting text regions from natural scene images. The method has two parts: extraction of candidate text regions, and classification of each candidate region as text or non-text. The candidate text regions are extracted with a modified fuzzy C-means clustering algorithm combined with a Laplacian mask and the maximum gradient difference, using texture features and HSL color space information. The candidate regions are then checked with a Markov random field (MRF) model built on the edge density and shape information of the connected components. Tests on the ICDAR 2003 database show that the proposed method achieves reasonably high accuracy for text extraction.

  18. Optimal Hops-Based Adaptive Clustering Algorithm

    Science.gov (United States)

    Xuan, Xin; Chen, Jian; Zhen, Shanshan; Kuo, Yonghong

    This paper proposes an optimal hops-based adaptive clustering algorithm (OHACA). The algorithm sets an energy selection threshold before the cluster forms so that the nodes with less energy are more likely to go to sleep immediately. In the setup phase, OHACA introduces an adaptive mechanism to adjust the cluster head and balance the load. The optimal distance theory is applied to discover the practical optimal routing path that minimizes the total transmission energy. Simulation results show that OHACA prolongs the life of the network, improves the utilization rate and transmits more data because of energy balance.

  19. Malware Classification based on Call Graph Clustering

    CERN Document Server

    Kinable, Joris

    2010-01-01

    Each day, anti-virus companies receive tens of thousands of samples of potentially harmful executables. Many of the malicious samples are variations of previously encountered malware, created by their authors to evade pattern-based detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representing malware samples as call graphs, it is possible to abstract certain variations away, and enable the detection of structural similarities between samples. The ability to cluster similar samples together will make more generic detection techniques possible, thereby targeting the commonalities of the samples within a cluster. To compare call graphs mutually, we compute pairwise graph similarity scores via graph matchings which approximately minimize the graph edit distance. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including k-medoids and DB...

  20. Unbiased methods for removing systematics from galaxy clustering measurements

    CERN Document Server

    Elsner, Franz; Peiris, Hiranya V

    2015-01-01

    Measuring the angular clustering of galaxies as a function of redshift is a powerful method for extracting information from the three-dimensional galaxy distribution. The precision of such measurements will dramatically increase with ongoing and future wide-field galaxy surveys. However, these are also increasingly sensitive to observational and astrophysical contaminants. Here, we study the statistical properties of three methods proposed for controlling such systematics - template subtraction, basic mode projection, and extended mode projection - all of which make use of externally supplied template maps, designed to characterise and capture the spatial variations of potential systematic effects. Based on a detailed mathematical analysis, and in agreement with simulations, we find that the template subtraction method in its original formulation returns biased estimates of the galaxy angular clustering. We derive closed-form expressions that should be used to correct results for this shortcoming. Turning to th...

  1. Method for spectral co-clustering documents and words based on morphology

    Institute of Scientific and Technical Information of China (English)

    刘娜; 肖智博; 鲁明羽

    2012-01-01

    One of the major problems in spectral co-clustering analysis is determining the number of clusters in a dataset, which is a basic input for most spectral co-clustering algorithms. In this paper, we propose a new method for automatically estimating the number of clusters and use it to modify the spectral co-clustering of documents and words. The method builds on an existing algorithm for visual assessment of tendency (VAT) of a data set and uses several common image and signal processing techniques. Determining the number of clusters involves three main steps. First, the matrix generated by spectral co-clustering of documents and words is converted with the VAT algorithm into a reordered dissimilarity gray image, which highlights the potential cluster structure in the dataset. Then, sequential image processing operations, consisting of gray morphology, image binarization and distance transform, are used to segment the regions of interest in the gray image and to convert the filtered image into a distance-transformed image. Finally, the transformed image is projected onto the diagonal axis, which yields a one-dimensional signal; after smoothing, the number of clusters is read from its major peaks and valleys. Experiments show that the method improves the clustering quality of spectral co-clustering of documents and words. When the number of

  2. Integrated management of thesis using clustering method

    Science.gov (United States)

    Astuti, Indah Fitri; Cahyadi, Dedy

    2017-02-01

    A thesis is one of the major requirements for students pursuing a bachelor's degree. Finishing the thesis involves a long process including consultation, writing the manuscript, carrying out the chosen method, seminar scheduling, searching for references, and appraisal by the board of mentors and examiners. Unfortunately, most students find it hard to match the free time of all the lecturers so that they can sit together in a seminar room to examine the thesis. Seminar scheduling should therefore be the top priority. A manual mechanism no longer fulfills the need. People on campus, including students, staff and lecturers, demand a system in which all stakeholders can interact with each other and manage the thesis process without timetable conflicts. A branch of computer science named Management Information System (MIS) could be a breakthrough in dealing with thesis management. This research applies a method called clustering to distinguish certain categories using mathematical formulas. A system is then developed along with the method to provide main facilities such as seminar scheduling, consultation and review, thesis approval, assessment, and a reliable thesis database. The database plays an important role for present and future purposes.

  3. Discrete range clustering using Monte Carlo methods

    Science.gov (United States)

    Chatterji, G. B.; Sridhar, B.

    1993-01-01

    For automatic obstacle avoidance guidance during rotorcraft low altitude flight, a reliable model of the nearby environment is needed. Such a model may be constructed by applying surface fitting techniques to the dense range map obtained by active sensing using radars. However, for covertness, passive sensing techniques using electro-optic sensors are desirable. As opposed to the dense range map obtained via active sensing, passive sensing algorithms produce reliable range at sparse locations, and therefore, surface fitting techniques to fill the gaps in the range measurement are not directly applicable. Both for automatic guidance and as a display for aiding the pilot, these discrete ranges need to be grouped into sets which correspond to objects in the nearby environment. The focus of this paper is on using Monte Carlo methods for clustering range points into meaningful groups. One of the aims of the paper is to explore whether simulated annealing methods offer significant advantage over the basic Monte Carlo method for this class of problems. We compare three different approaches and present application results of these algorithms to a laboratory image sequence and a helicopter flight sequence.

  4. A Clustering Ensemble approach based on the similarities in 2-mode social networks

    Institute of Scientific and Technical Information of China (English)

    SU Bao-ping; ZHANG Meng-jie

    2014-01-01

    For a particular clustering problem, selecting the best clustering method is challenging. Research suggests that integrating multiple clusterings can greatly improve clustering accuracy. A new clustering ensemble approach based on the similarities in 2-mode networks is proposed in this paper. First, the data objects and the initial clusters are transformed into a 2-mode network. The similarities in the 2-mode network are then used to iteratively calculate the similarity between different clusters and refine the adjacency matrix. Finally, the K-means algorithm is used to obtain the final clustering. The method makes effective use of the similarity between different clusters, and an example shows its feasibility.

  5. Analysis of Massive Emigration from Poland: The Model-Based Clustering Approach

    Science.gov (United States)

    Witek, Ewa

    The model-based approach assumes that data are generated by a finite mixture of probability distributions such as multivariate normal distributions. In finite mixture models, each component distribution corresponds to a cluster. The problem of determining the number of clusters and choosing an appropriate clustering method becomes the problem of statistical model choice. Hence, the model-based approach provides a key advantage over heuristic clustering algorithms, because it selects both the correct model and the number of clusters.
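
    The mixture model described above can be written compactly. The block below is a sketch in standard notation, not taken from the paper: the symbols are assumptions introduced here, and the Bayesian information criterion (BIC) is shown only as one commonly used model-choice criterion, since the abstract does not say which criterion the author uses.

```latex
% Finite mixture density with K components; \phi is the multivariate normal density
f(x) = \sum_{k=1}^{K} \pi_k \, \phi(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1

% BIC of a fitted K-component model with maximized likelihood \hat{L}_K,
% m_K free parameters and n observations (smaller is better in this form)
\mathrm{BIC}_K = -2 \log \hat{L}_K + m_K \log n
```

    Under this reading, choosing the number of clusters amounts to fitting the mixture for several values of K and comparing the criterion across the fitted models.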

  6. Cluster-based control of nonlinear dynamics

    CERN Document Server

    Kaiser, Eurika; Spohn, Andreas; Cattafesta, Louis N; Morzynski, Marek

    2016-01-01

    The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. Here, a cluster-based control framework is proposed to determine optimal control laws with respect to a cost function for unsteady flows. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics, which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a Markov model. The Markov model can be used as a predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is de...
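
    The Markov-model step above can be illustrated with a small sketch. It assumes the flow snapshots have already been assigned cluster labels (the label sequence below is made up), estimates the transition matrix by counting consecutive label pairs, and approximates the long-term distribution by power iteration. The function names are hypothetical, and the paper's control-law optimization is not shown.

```python
def transition_matrix(labels, n_clusters):
    """Estimate row-stochastic transition probabilities from a label sequence."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption of this sketch: a cluster with no observed exit gets a uniform row.
        matrix.append([c / total for c in row] if total
                      else [1.0 / n_clusters] * n_clusters)
    return matrix


def stationary_distribution(matrix, iterations=1000):
    """Approximate the long-term cluster probabilities by power iteration."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(iterations):
        p = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return p


# Hypothetical cluster labels of consecutive snapshots.
labels = [0, 0, 1, 1, 1, 2, 0, 0, 1, 2, 2, 0]
P = transition_matrix(labels, 3)
pi = stationary_distribution(P)
```

    For this toy sequence the chain spends equal time in clusters 0 and 1 and somewhat less in cluster 2; a different control law would change the label sequence and hence the estimated distribution.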

  7. Comparative study between the proposed shape independent clustering method and the conventional methods (K-means and the other

    Directory of Open Access Journals (Sweden)

    Kohei Arai

    2013-07-01

    Cluster analysis aims at identifying groups of similar objects and therefore helps to discover the distribution of patterns and interesting correlations in data sets. In this paper, we propose a method that provides a consistent partitioning of a dataset and can identify cluster patterns of any shape, convex or non-convex, in numerical clustering. The method is based on a layered structure representation obtained by measuring the distance and angle of the numerical data to the centroid, and on an iterative clustering construction that merges clusters using the nearest-neighbor distance between them. Encouraging results show the effectiveness of the proposed technique.

  8. Likelihood-based inference for clustered line transect data

    DEFF Research Database (Denmark)

    Waagepetersen, Rasmus; Schweder, Tore

    2006-01-01

    The uncertainty in estimation of spatial animal density from line transect surveys depends on the degree of spatial clustering in the animal population. To quantify the clustering we model line transect data as independent thinnings of spatial shot-noise Cox processes. Likelihood-based inference...... is implemented using Markov chain Monte Carlo (MCMC) methods to obtain efficient estimates of spatial clustering parameters. Uncertainty is addressed using parametric bootstrap or by consideration of posterior distributions in a Bayesian setting. Maximum likelihood estimation and Bayesian inference are compared...

  9. Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis

    Science.gov (United States)

    Fu, Pei-hua; Yin, Hong-bo

    In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First, we present the evaluation index system, which covers basic information, management level, technical strength, transport capacity, informatization level, market competition, and customer service. We determine the index weights according to the grades and evaluate the integrated capability of logistics enterprises using fuzzy cluster analysis. We then describe the system evaluation module and the cluster analysis module in detail, along with how the two modules were implemented. Finally, we give the results of the system.

  10. Cluster Ensemble-based Image Segmentation

    Directory of Open Access Journals (Sweden)

    Xiaoru Wang

    2013-07-01

    Image segmentation is the foundation of computer vision applications. In this paper, we propose a new cluster ensemble-based image segmentation algorithm, which overcomes several problems of traditional methods. We make two main contributions in this paper. First, we introduce the cluster ensemble concept to fuse the segmentation results from different types of visual features effectively, which can deliver a better final result and achieve a much more stable performance for broad categories of images. Second, we exploit the PageRank idea from Internet applications and apply it to the image segmentation task. This can improve the final segmentation results by combining the spatial information of the image and the semantic similarity of regions. Our experiments on four public image databases validate the superiority of our algorithm over conventional algorithms based on a single type of feature or on multiple types of features, since our algorithm can fuse multiple types of features effectively for better segmentation results. Moreover, our method also proves very competitive in comparison with other state-of-the-art segmentation algorithms.

  11. Model-based clustering in networks with Stochastic Community Finding

    CERN Document Server

    McDaid, Aaron F; Friel, Nial; Hurley, Neil J

    2012-01-01

    In the model-based clustering of networks, blockmodelling may be used to identify roles in the network. We identify a special case of the Stochastic Block Model (SBM) where we constrain the cluster-cluster interactions such that the density inside the clusters of nodes is expected to be greater than the density between clusters. This corresponds to the intuition behind community-finding methods, where nodes tend to be clustered together if they link to each other. We call this model Stochastic Community Finding (SCF) and present an efficient MCMC algorithm which can cluster the nodes, given the network. The algorithm is evaluated on synthetic data and is applied to a social network of interactions at a karate club and at a monastery, demonstrating how the SCF finds the 'ground truth' clustering where sometimes the SBM does not. The SCF is only one possible form of constraint or specialization that may be applied to the SBM. In a more supervised context, it may be appropriate to use other specializations to guide...

  12. Ontology-based topic clustering for online discussion data

    Science.gov (United States)

    Wang, Yongheng; Cao, Kening; Zhang, Xiaoming

    2013-03-01

    With the rapid development of online communities, mining and extracting quality knowledge from online discussions has become very important for the industrial and marketing sectors, as well as for e-commerce applications and government. Most existing techniques model a discussion as a social network of users represented by a user-based graph, without considering the content of the discussion. In this paper we propose a new multilayered model to analyze online discussions. The user-based and message-based representations are combined in this model. A novel clustering method based on frequent concept sets is used to cluster the original online discussion network into topic space. Domain ontology is used to improve the clustering accuracy. Parallel methods are also used to make the algorithms scalable to very large data sets. Our experimental study shows that the model and algorithms are effective when analyzing large-scale online discussion data.

  13. Clustering Based Approximation in Facial Image Retrieval

    Directory of Open Access Journals (Sweden)

    R.Pitchaiah

    2016-11-01

    Web search engines return a great many images ranked by keywords extracted from the surrounding text. Existing object recognition techniques train classification models from human-labeled training images or attempt to infer the correlation/probabilities between images and annotated keywords. Although efficient at mining similar-looking facial image results from weakly labeled ones, the learning stage of the above cluster-based approximations suffers from latency factors that hinder real-time implementation, which we highlight in our demonstrations. We therefore propose a color-based segmentation-driven automatic face detection approach coupled with an adapted Clustering Based Approximation (CBA) scheme to reduce the latency while retaining the same efficiency during querying. The technical stages of the proposed approach are highlighted in the accompanying flow chart. Each stage of the process delivers the query results at a greatly reduced processing time, making our method much more feasible for real-time implementation.

  14. Time-dependent coupled-cluster method for atomic nuclei

    CERN Document Server

    Pigg, D A; Nam, H; Papenbrock, T

    2012-01-01

    We study time-dependent coupled-cluster theory in the framework of nuclear physics. Based on Kvaal's bi-variational formulation of this method [S. Kvaal, arXiv:1201.5548], we explicitly demonstrate that observables that commute with the Hamiltonian are conserved under time evolution. We explore the role of the energy and of the similarity-transformed Hamiltonian under real and imaginary time evolution and relate the latter to similarity renormalization group transformations. Proof-of-principle computations of He-4 and O-16 in small model spaces, and computations of the Lipkin model illustrate the capabilities of the method.

  15. Signal Identification of Radar Radiation Source Based on AP Density Clustering Method

    Institute of Scientific and Technical Information of China (English)

    王美玲; 张复春; 杨承志

    2012-01-01

    Identifying signals from unknown radar radiation sources has long been a difficult problem in radar countermeasure intelligence analysis. Density-based clustering algorithms achieve a low identification rate when processing non-uniform samples. To address this shortcoming, this paper combines density-based clustering with the affinity propagation (AP) clustering algorithm and puts forward an identification method based on AP density clustering. The method first uses AP clustering to perform a primary clustering of the data samples, then sets the relevant parameters and uses the density-based spatial clustering of applications with noise (DBSCAN) algorithm to perform a secondary clustering. Compared with the original samples, the distribution of the primary clustering results is representative, so parameter values suited to the DBSCAN algorithm can be found easily. Tests verify that the method achieves a higher identification rate.

  16. Research of Web Documents Clustering Based on Dynamic Concept

    Institute of Scientific and Technical Information of China (English)

    WANG Yun-hua; CHEN Shi-hong

    2004-01-01

    Conceptual clustering is mainly used to address deficient and incomplete domain knowledge. Based on conceptual clustering technology and targeting the organizational structure and characteristics of Web theme information, this paper proposes and implements a dynamic conceptual clustering algorithm and a merging algorithm for Web documents, and analyzes the superior efficiency and clustering accuracy of the clustering algorithm.

  17. Graph-based clustering and data visualization algorithms

    CERN Document Server

    Vathy-Fogarassy, Ágnes

    2013-01-01

    This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space. The application of graphs in clustering and visualization has several advantages. A graph of important edges (where edges characterize relations and weights represent similarities or distances) provides a compact representation of the entire complex data set. This text describes clustering and visualization methods that are able to utilize information hidden in these graphs, based on

  18. Clustering-Based Matrix Factorization

    OpenAIRE

    Mirbakhsh, Nima; Ling, Charles X.

    2013-01-01

    Recommender systems are emerging technologies that nowadays can be found in many applications such as Amazon, Netflix, and so on. These systems help users to find relevant information, recommendations, and their preferred items. A slight improvement in the accuracy of these recommenders can greatly affect the quality of recommendations. Matrix Factorization is a popular method in Recommendation Systems showing promising results in accuracy and complexity. In this paper we propose an extension ...

  19. Face Detection Method Based on Semi-supervised Clustering

    Institute of Scientific and Technical Information of China (English)

    王燕; 蒋正午

    2012-01-01

    The paper proposes a face detection method that combines skin color with the continuous AdaBoost algorithm. To establish the skin color model, a semi-supervised strategy is used to guide skin color clustering. A new semi-supervised algorithm, SKDK, is proposed for the clustering process, and the skin color model is established from the statistical distribution characteristics of each pixel cluster. On this basis, mathematical morphology is used to process the image and find face candidate regions, which serve as input to the continuous AdaBoost classifier for final face detection. Experimental results show that, especially in multi-face scenes, the method detects faces better than applying the continuous AdaBoost method directly.

  20. An analytic method to compute star cluster luminosity statistics

    Science.gov (United States)

    da Silva, Robert L.; Krumholz, Mark R.; Fumagalli, Michele; Fall, S. Michael

    2014-03-01

    The luminosity distribution of the brightest star clusters in a population of galaxies encodes critical pieces of information about how clusters form, evolve and disperse, and whether and how these processes depend on the large-scale galactic environment. However, extracting constraints on models from these data is challenging, in part because comparisons between theory and observation have traditionally required computationally intensive Monte Carlo methods to generate mock data that can be compared to observations. We introduce a new method that circumvents this limitation by allowing analytic computation of cluster order statistics, i.e. the luminosity distribution of the Nth most luminous cluster in a population. Our method is flexible and requires few assumptions, allowing for parametrized variations in the initial cluster mass function and its upper and lower cutoffs, variations in the cluster age distribution, stellar evolution and dust extinction, as well as observational uncertainties in both the properties of star clusters and their underlying host galaxies. The method is fast enough to make it feasible for the first time to use Markov chain Monte Carlo methods to search parameter space to find best-fitting values for the parameters describing cluster formation and disruption, and to obtain rigorous confidence intervals on the inferred values. We implement our method in a software package called the Cluster Luminosity Order-Statistic Code, which we have made publicly available.

  1. A Cluster-Based Secure Active Network Environment

    Institute of Scientific and Technical Information of China (English)

    CHEN Xiao-lin; ZHOU Jing-yang; DAI Han; LU Sang-lu; CHEN Gui-hai

    2005-01-01

    We introduce a cluster-based secure active network environment (CSANE) which separates the processing of IP packets from that of active packets in active routers. In this environment, active code authorized or trusted by privileged users is executed in the secure execution environment (EE) of the active router, while other code is executed in the secure EE of the nodes in the distributed shared memory (DSM) cluster. With the support of a multi-process Java virtual machine and KeyNote, untrusted active packets are controlled so that they consume resources securely. DSM consistency management ensures that active packets can be processed in parallel in the DSM cluster as if they were processed one by one in ANTS (Active Network Transport System). We demonstrate that CSANE has good security and scalability while requiring few changes to traditional routers.

  2. ENERGY OPTIMIZATION IN CLUSTER BASED WIRELESS SENSOR NETWORKS

    Directory of Open Access Journals (Sweden)

    T. SHANKAR

    2014-04-01

    Wireless sensor networks (WSNs) are made up of sensor nodes, which are usually battery-operated devices, and hence energy saving of sensor nodes is a major design issue. To prolong the network's lifetime, energy consumption should be minimized at all layers of the network protocol stack, from the physical layer to the application layer, including cross-layer optimization. Optimizing energy consumption is the main concern in designing and planning the operation of a WSN. Clustering is one of the methods used to extend the lifetime of the network by applying data aggregation and balancing energy consumption among the sensor nodes. This paper proposes new versions of the Low Energy Adaptive Clustering Hierarchy (LEACH) protocol, called Advanced Optimized Low Energy Adaptive Clustering Hierarchy (AOLEACH), Optimal Deterministic Low Energy Adaptive Clustering Hierarchy (ODLEACH), and Varying Probability Distance Low Energy Adaptive Clustering Hierarchy (VPDL), in combination with the Shuffled Frog Leap Algorithm (SFLA). Compared with the LEACH protocol, they select the best adaptive cluster heads using an improved threshold energy distribution and rotate the cluster head position based on energy levels for uniform energy dissipation. The proposed algorithm extends the lifetime of the network by increasing the first node death (FND) time and the number of alive nodes.
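
    For orientation, the sketch below shows cluster-head rotation in baseline LEACH only, using the threshold T(n) = p / (1 - p (r mod 1/p)) from the original LEACH protocol. It is not the AOLEACH, ODLEACH, or VPDL variants proposed in the paper, whose energy-aware thresholds are not given in the abstract. The function names, node count, and seed are illustrative assumptions.

```python
import random


def leach_threshold(p, round_number, eligible):
    """Baseline LEACH threshold T(n) for a node in round `round_number`.

    p is the desired fraction of cluster heads; `eligible` is True when the
    node has not yet served as cluster head in the current epoch of 1/p rounds.
    """
    if not eligible:
        return 0.0
    period = round(1 / p)
    return p / (1 - p * (round_number % period))


def run_epoch(node_ids, p, rng):
    """Simulate one epoch and return {round: [elected cluster heads]}."""
    period = round(1 / p)
    served = set()  # nodes that already acted as cluster head in this epoch
    schedule = {}
    for r in range(period):
        # A node elects itself when its random draw falls below the threshold.
        heads = [n for n in node_ids
                 if rng.random() < leach_threshold(p, r, n not in served)]
        served.update(heads)
        schedule[r] = heads
    return schedule


# Hypothetical network of 20 nodes with 10% cluster heads per round on average.
schedule = run_epoch(list(range(20)), p=0.1, rng=random.Random(0))
```

    Because the threshold rises to 1 in the last round of an epoch, every node serves as cluster head once per epoch, which is the rotation that spreads energy dissipation; the variants in the paper additionally take energy levels into account.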

  3. Cancer detection based on Raman spectra super-paramagnetic clustering

    Science.gov (United States)

    González-Solís, José Luis; Guizar-Ruiz, Juan Ignacio; Martínez-Espinosa, Juan Carlos; Martínez-Zerega, Brenda Esmeralda; Juárez-López, Héctor Alfonso; Vargas-Rodríguez, Héctor; Gallegos-Infante, Luis Armando; González-Silva, Ricardo Armando; Espinoza-Padilla, Pedro Basilio; Palomares-Anda, Pascual

    2016-08-01

    The clustering of Raman spectra of serum samples is analyzed using the super-paramagnetic clustering technique based on the Potts spin model. We investigated the clustering of biochemical networks by using Raman data that define edge lengths in the network, where the interactions are functions of the individual band intensities of the Raman spectra. For this study, we used two groups of 58 and 102 control Raman spectra and the intensities of 160, 150 and 42 Raman spectra of serum samples from breast cancer, cervical cancer and leukemia patients, respectively. The spectra were collected from patients at different hospitals in Mexico. Using the super-paramagnetic clustering technique, we identified the most natural and compact clusters, allowing us to discriminate between control and cancer patients. Of special interest was the leukemia case, where the nearly hierarchical structure observed allowed the identification of the patients' leukemia type. The goal of this study is to apply a model from statistical physics, super-paramagnetic clustering, to find the natural clusters that allow us to design a cancer detection method. To the best of our knowledge, this is the first report of preliminary results evaluating the usefulness of super-paramagnetic clustering in spectroscopy, where it is used for the classification of spectra.

  4. Semantic Based Cluster Content Discovery in Description First Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    MUHAMMAD WASEEM KHAN

    2017-01-01

    In the field of data analytics, grouping similar documents in textual data is a serious problem. A lot of work has been done in this field and many algorithms have been proposed. One category of algorithms first groups the documents on the basis of similarity and then assigns meaningful labels to those groups. Description-first clustering algorithms belong to the category in which a meaningful description is deduced first and the relevant documents are then assigned to that description. LINGO (Label Induction Grouping Algorithm) is a description-first clustering algorithm used for the automatic grouping of documents obtained from search results. It uses LSI (Latent Semantic Indexing), an IR (Information Retrieval) technique, to induce meaningful labels for clusters and VSM (Vector Space Model) for cluster content discovery. In this paper we present LINGO using LSI during both the cluster label induction and cluster content discovery phases. Finally, we compare the results obtained when the algorithm uses VSM and when it uses latent semantic analysis during the cluster content discovery phase.

  5. C60-based clustering scheme for sensor management in STSS

    Institute of Scientific and Technical Information of China (English)

    Yiyu Zhou

    2015-01-01

    Clustering-based sensor-management schemes have been widely used for various wireless sensor networks (WSNs), as they are well suited to the distributive and collaborative nature of WSN. In this paper, a C60-based clustering algorithm is proposed for the specific planned network of space tracking and surveillance system (STSS), where all the sensors are partitioned into 12 clusters according to the C60 (or football surface) architecture, and then a hierarchical sensor-management scheme is well designed. Finally, the algorithm is applied to a typical STSS constellation, and the simulation results show that the proposed method has better target-tracking performance than the nonclustering scheduling method.

  6. Cluster based Intrusion Detection System for Manets

    Directory of Open Access Journals (Sweden)

    Nisha Dang

    2012-07-01

    MANETs are ad hoc networks that are built on demand, instantly, when mobile nodes come within range of each other and decide to cooperate for data transfer and communication. There is therefore no defined topology for MANETs; they communicate over a dynamic topology that changes continuously because nodes are not stationary. Owing to this lack of infrastructure and their distributed nature, they are more vulnerable to attacks and give malicious users a good opportunity to become part of the network. To protect mobile ad hoc networks, many security measures have been designed, such as encryption algorithms and firewalls, but there is still some scope for malicious actions. Intrusion detection systems have therefore been proposed to detect any intruder in the network and its malicious activities. Cluster-based intrusion detection systems are also designed to restrict intruders' activities within clusters of mobile nodes. In a cluster, each node runs some intrusion detection code to detect local as well as global intrusions. In this paper we survey intrusion detection systems and different attacks on MANET security, and then propose how the overhead involved in cluster-based intrusion detection can be reduced.

  7. Fast Affinity Propagation Clustering based on Machine Learning

    OpenAIRE

    Shailendra Kumar Shrivastava; J. L. Rana; DR.R.C.JAIN

    2013-01-01

    Affinity propagation (AP) was recently introduced as an unsupervised learning algorithm for exemplar-based clustering. In this paper a novel Fast Affinity Propagation clustering Approach based on Machine Learning (FAPML) is proposed. FAPML tries to put data points into clusters based on the history of the data points belonging to clusters in early stages. In FAPML we introduce an affinity learning constant and a dispersion constant, which supervise the clustering process. FAPML also enforces...

  8. Association Rule Pruning based on Interestingness Measures with Clustering

    Directory of Open Access Journals (Sweden)

    R. Bhaskaran

    2009-11-01

    Association rule mining plays a vital part in knowledge mining. The difficult task is discovering knowledge or useful rules from the large number of rules generated at reduced support. Several techniques are used for pruning or grouping rules, such as rule structure cover methods, informative cover methods, and rule clustering. Another way of selecting association rules is based on interestingness measures such as support, confidence, and correlation. In this paper, we study how rule clusters of the pattern Xi -> Y are distributed over different interestingness measures.
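
    The interestingness measures named above can be computed directly from a transaction list. The sketch below uses the standard definitions of support and confidence, and uses lift as one common correlation-style measure; the abstract does not say which correlation measure the authors use, so that choice is an assumption, as are the function names and the toy transactions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Assumes both sides occur at least once, so no division by zero arises.
    """
    s_xy = support(transactions, set(antecedent) | set(consequent))
    s_x = support(transactions, antecedent)
    s_y = support(transactions, consequent)
    return {
        "support": s_xy,
        "confidence": s_xy / s_x,     # estimate of P(Y | X)
        "lift": s_xy / (s_x * s_y),   # > 1 suggests positive association
    }


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
measures = rule_measures(transactions, {"bread"}, {"milk"})
```

    Rules that share a consequent Y, such as Xi -> Y in the abstract, can then be compared or grouped by these values.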

  9. Core Business Selection Based on Ant Colony Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Yu Lan

    2014-01-01

    Core business is the most important business of an enterprise with diversified operations. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. To test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the company's current state of development, its core business is identified by the ant colony clustering algorithm. The results indicate that the proposed method is an effective way to determine a company's core business.

  10. Sparse representation-based spectral clustering for SAR image segmentation

    Science.gov (United States)

    Zhang, Xiangrong; Wei, Zhengli; Feng, Jie; Jiao, Licheng

    2011-12-01

    A new method, sparse representation-based spectral clustering (SC) with the Nyström method, is proposed for synthetic aperture radar (SAR) image segmentation. Unlike conventional SC, the proposed technique constructs the affinity matrix from the sparse coefficients obtained by solving an l1 minimization problem, and the Nyström method is applied to lighten the computational load of the segmentation process. The advantage of the proposed method is that the scaling parameter of the Gaussian kernel function does not need to be selected manually. We apply the proposed method, k-means, and the classic spectral clustering algorithm with the Nyström method to SAR image segmentation. The results show that the proposed method obtains much better segmentation results than the other two methods.

  11. Finding Within Cluster Dense Regions Using Distance Based Technique

    Directory of Open Access Journals (Sweden)

    Wesam Ashour

    2012-03-01

    One of the main categories in data clustering is density-based clustering. Density-based clustering techniques like DBSCAN are attractive because they can find arbitrarily shaped clusters along with noisy outliers. The main weakness of traditional density-based algorithms like DBSCAN is clustering data sets with different density levels: DBSCAN's calculations apply the given parameters to all points in a data set, while the densities of the clusters may be totally different. The proposed algorithm overcomes this weakness. It starts by partitioning the data within a cluster into units based on a user parameter and computes the density of each unit separately. The algorithm then compares the results and merges neighboring units with similar density values into a new cluster. The experimental results of the simulation show that the proposed algorithm gives good results in finding clusters in data sets whose clusters have different densities.
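
    To illustrate the weakness described above, here is a compact plain-Python sketch of standard DBSCAN, not the unit-based algorithm proposed in the paper. It shows how one global radius tuned for a dense cluster marks a sparser cluster as noise. The function name `dbscan`, the parameter names, and the toy points are assumptions made for this example.

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # Indices of all points within distance eps of point i (including i).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs)
    return labels


# Hypothetical data: a dense group (spacing 0.1) and a sparse group (spacing 1.0).
dense = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
sparse = [(10.0, 0.0), (11.0, 0.0), (12.0, 0.0), (13.0, 0.0)]
tight = dbscan(dense + sparse, eps=0.25, min_pts=3)
loose = dbscan(dense + sparse, eps=1.5, min_pts=3)
```

    With the radius tuned for the dense group, the sparse group is reported entirely as noise. A larger radius recovers it here only because the two groups are far apart, which is the situation a per-unit density estimate is meant to handle more generally.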

  12. Segmentation Based on Clustering and Maximum Entropy Method

    Institute of Scientific and Technical Information of China (English)

    陈秋红; 沈云琴

    2012-01-01

    The paper studies the image segmentation optimization problem. Because of computational complexity and other factors, many image segmentation algorithms produce segmentations with low resolution and low clarity, and when images contain a large amount of information, segmentation is very time-consuming. To segment images effectively, a method is proposed that combines spatial pattern clustering with the principle of the maximum entropy algorithm. First, the maximum entropy algorithm is used to segment the image, and feature quantities are defined for each entropy region. Based on the different features, the Euclidean distance and the spatial distance between similar regions are calculated to determine the distance between pixel cluster centers. The segmented image regions are then merged using the spatial pattern clustering scheme, and the image is binarized. Simulation results show that, compared with traditional image segmentation, the method improves segmentation efficiency and produces clear edges, which demonstrates the feasibility and effectiveness of the algorithm.

  13. New clustered regularly interspaced short palindromic repeat locus spacer pair typing method based on the newly incorporated spacer for Salmonella enterica.

    Science.gov (United States)

    Li, Hao; Li, Peng; Xie, Jing; Yi, Shengjie; Yang, Chaojie; Wang, Jian; Sun, Jichao; Liu, Nan; Wang, Xu; Wu, Zhihao; Wang, Ligui; Hao, Rongzhang; Wang, Yong; Jia, Leili; Li, Kaiqin; Qiu, Shaofu; Song, Hongbin

    2014-08-01

    A clustered regularly interspaced short palindromic repeat (CRISPR) typing method has recently been developed and used for typing and subtyping of Salmonella spp., but it is complicated and labor intensive because it has to analyze all spacers in two CRISPR loci. Here, we developed a more convenient and efficient method, namely, CRISPR locus spacer pair typing (CLSPT), which only needs to analyze the two newly incorporated spacers adjoining the leader array in the two CRISPR loci. We analyzed a CRISPR array of 82 strains belonging to 21 Salmonella serovars isolated from humans in different areas of China by using this new method. We also retrieved the newly incorporated spacers in each CRISPR locus of 537 Salmonella isolates which have definite serotypes in the Pasteur Institute's CRISPR Database to evaluate this method. Our findings showed that this new CLSPT method presents a high level of consistency (kappa = 0.9872, Matthews correlation coefficient = 0.9712) with the results of traditional serotyping, and thus, it can also be used to predict serotypes of Salmonella spp. Moreover, this new method has a considerable discriminatory power (discriminatory index [DI] = 0.8145), comparable to those of multilocus sequence typing (DI = 0.8088) and conventional CRISPR typing (DI = 0.8684). Because CLSPT only costs about $5 to $10 per isolate, it is a much cheaper and more attractive method for subtyping of Salmonella isolates. In conclusion, this new method will provide considerable advantages over other molecular subtyping methods, and it may become a valuable epidemiologic tool for the surveillance of Salmonella infections.
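
    The discriminatory index quoted above can be illustrated with a short sketch. It assumes the DI is Simpson's index of diversity in the Hunter-Gaston form, which the abstract does not state explicitly; the function name and the example typing result are hypothetical and are not data from the study.

```python
from collections import Counter


def discriminatory_index(type_labels):
    """Simpson's index of diversity (Hunter-Gaston form).

    DI = 1 - sum_j n_j * (n_j - 1) / (N * (N - 1)), where n_j is the number
    of isolates assigned to type j and N is the total number of isolates.
    """
    n = len(type_labels)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_labels).values()
    same_type_pairs = sum(c * (c - 1) for c in counts)
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical typing result: 6 isolates assigned to 3 types.
example_types = ["A", "A", "A", "B", "B", "C"]
example_di = discriminatory_index(example_types)
print(round(example_di, 4))
```

    A value near 1 means that two randomly chosen isolates are very likely to receive different types; a value of 0 means every isolate received the same type.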

  14. Towards semantically sensitive text clustering: a feature space modeling technology based on dimension extension.

    Science.gov (United States)

    Liu, Yuanchao; Liu, Ming; Wang, Xin

    2015-01-01

    The objective of text clustering is to divide document collections into clusters based on the similarity between documents. In this paper, an extension-based feature modeling approach towards semantically sensitive text clustering is proposed along with the corresponding feature space construction and similarity computation method. By combining the similarity in traditional feature space and that in extension space, the adverse effects of the complexity and diversity of natural language can be addressed and clustering semantic sensitivity can be improved correspondingly. The generated clusters can be organized using different granularities. The experimental evaluations on well-known clustering algorithms and datasets have verified the effectiveness of our approach.

  15. SAR image segmentation with entropy ranking based adaptive semi-supervised spectral clustering

    Science.gov (United States)

    Zhang, Xiangrong; Yang, Jie; Hou, Biao; Jiao, Licheng

    2010-10-01

    Spectral clustering has become one of the most popular modern clustering algorithms in recent years. In this paper, a new algorithm named entropy ranking based adaptive semi-supervised spectral clustering is proposed for SAR image segmentation. We focus not only on finding a suitable scaling parameter but also on automatically determining the number of clusters with entropy ranking theory. In addition, semi-supervised spectral clustering based on two kinds of constraints, must-link and cannot-link, is applied to obtain better segmentation results. Experimental results on SAR images show that the proposed method outperforms other spectral clustering algorithms.

  16. Prioritizing the risk of plant pests by clustering methods; self-organising maps, k-means and hierarchical clustering

    Directory of Open Access Journals (Sweden)

    Susan Worner

    2013-09-01

    For greater preparedness, pest risk assessors are required to prioritise long lists of pest species with potential to establish and cause significant impact in an endangered area. Such prioritisation is often qualitative, subjective, and sometimes biased, relying mostly on expert and stakeholder consultation. In recent years, cluster-based analyses have been used to investigate regional pest species assemblages or pest profiles to indicate the risk of new organism establishment. Such an approach is based on the premise that the co-occurrence of well-known global invasive pest species in a region is not random, and that the pest species profile or assemblage integrates complex functional relationships that are difficult to tease apart. In other words, the assemblage can help identify and prioritise species that pose a threat in a target region. A computational intelligence method called a Kohonen self-organizing map (SOM), a type of artificial neural network, was the first clustering method applied to analyse assemblages of invasive pests. The SOM is a well-known dimension reduction and visualization method especially useful for high-dimensional data that more conventional clustering methods may not analyse suitably. Like all clustering algorithms, the SOM can give details of clusters that identify regions with similar pest assemblages, possible donor and recipient regions. More importantly, however, SOM connection weights that result from the analysis can be used to rank the strength of association of each species within each regional assemblage. Species with high weights that are not already established in the target region are identified as high risk. However, the SOM analysis is only the first step in a process to assess risk, to be used alongside or incorporated within other measures. Here we illustrate the application of SOM analyses in a range of contexts in invasive species risk assessment, and discuss other clustering methods such as k

  17. ONTOLOGY BASED DOCUMENT CLUSTERING USING MAPREDUCE

    Directory of Open Access Journals (Sweden)

    Abdelrahman Elsayed

    2015-05-01

    Document clustering is now considered a data-intensive task because of the rapid increase in the number of available documents. Moreover, the feature sets that represent those documents are also very large. The most common method for representing documents is the vector space model, which represents document features as a bag of words and does not represent semantic relations between words. In this paper we introduce a distributed implementation of bisecting k-means using the MapReduce programming model. The aim of the proposed implementation is to solve the problem of clustering data-intensive document collections. In addition, we propose integrating the WordNet ontology with bisecting k-means in order to utilize the semantic relations between words to enhance document clustering results. Our experimental results show that using lexical categories for nouns only enhances internal evaluation measures of document clustering and decreases the number of document features from thousands to tens. Our experiments were conducted using Amazon Elastic MapReduce to deploy the bisecting k-means algorithm.
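
    As a rough illustration of the bisecting k-means idea named in this abstract, the sketch below repeatedly splits the cluster with the largest within-cluster squared error into two. It is a single-machine, pure-Python sketch, not the authors' MapReduce implementation, and it does not include the WordNet step. The farthest-point seeding in `two_means` and the sample points are assumptions made here for reproducibility.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def sse(points):
    """Sum of squared distances of the points to their centroid."""
    c = mean(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, iters=20):
    """Split one cluster in two with plain 2-means (deterministic seeding)."""
    c = mean(points)
    a = max(points, key=lambda p: sq_dist(p, c))   # farthest from centroid
    b = max(points, key=lambda p: sq_dist(p, a))   # farthest from a
    centers = (a, b)
    left, right = [], []
    for _ in range(iters):
        left = [p for p in points if sq_dist(p, centers[0]) <= sq_dist(p, centers[1])]
        right = [p for p in points if sq_dist(p, centers[0]) > sq_dist(p, centers[1])]
        if not left or not right:
            break
        new_centers = (mean(left), mean(right))
        if new_centers == centers:
            break
        centers = new_centers
    return left, right


def bisecting_kmeans(points, k):
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)      # cluster with the largest error
        if sse(worst) == 0.0:
            break                           # nothing left to split
        left, right = two_means(worst)
        if not left or not right:
            break
        clusters.remove(worst)
        clusters += [left, right]
    return clusters


# Hypothetical 2-D "document vectors" forming three obvious groups.
demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0)]
demo_clusters = bisecting_kmeans(demo_points, 3)
print(sorted(sorted(c) for c in demo_clusters))
```

    In a distributed setting, the assignment and centroid-update steps of each 2-means split are the parts that would naturally map to the map and reduce phases, but how the paper organizes this is not described in the abstract.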

  18. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    Science.gov (United States)

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster-based methods, the cluster size statistic (CSS) and cluster mass statistic (CMS), are introduced to control the family-wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster-based methods used in conventional task-based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operating characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family-wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.

  19. Word Clustering and Disambiguation Based on Co-occurrence Data

    CERN Document Server

    Li, H; Li, Hang; Abe, Naoki

    1998-01-01

    We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and using the acquired word classes to improve the accuracy of syntactic disambiguation. We view this problem as that of estimating a joint probability distribution specifying the joint probabilities of word pairs, such as noun verb pairs. We propose an efficient algorithm based on the Minimum Description Length (MDL) principle for estimating such a probability distribution. Our method is a natural extension of those proposed in (Brown et al 92) and (Li & Abe 96), and overcomes their drawbacks while retaining their advantages. We then combined this clustering method with the disambiguation method of (Li & Abe 95) to derive a disambiguation method that makes use of both automatically constructed thesauruses and a hand-made thesaurus. The overall disambiguation accuracy achieved by our method is 85.2%, which compares favorably against the accuracy (82.4%) obtained by the state-of-the-art disambiguation ...

  20. The smart cluster method - Adaptive earthquake cluster identification and analysis in strong seismic regions

    Science.gov (United States)

    Schaefer, Andreas M.; Daniell, James E.; Wenzel, Friedemann

    2017-03-01

    Earthquake clustering is an essential part of almost any statistical analysis of spatial and temporal properties of seismic activity. The nature of earthquake clusters and subsequent declustering of earthquake catalogues plays a crucial role in determining the magnitude-dependent earthquake return period and its respective spatial variation for probabilistic seismic hazard assessment. This study introduces the Smart Cluster Method (SCM), a new methodology to identify earthquake clusters, which uses an adaptive point process for spatio-temporal cluster identification. It utilises the magnitude-dependent spatio-temporal earthquake density to adjust the search properties, subsequently analyses the identified clusters to determine directional variation and adjusts its search space with respect to directional properties. In the case of rapid subsequent ruptures like the 1992 Landers sequence or the 2010-2011 Darfield-Christchurch sequence, a reclassification procedure is applied to disassemble subsequent ruptures using near-field searches, nearest neighbour classification and temporal splitting. The method is capable of identifying and classifying earthquake clusters in space and time. It has been tested and validated using earthquake data from California and New Zealand. A total of more than 1500 clusters have been found in both regions since 1980 with M_min = 2.0. Utilising the knowledge of cluster classification, the method has been adjusted to provide an earthquake declustering algorithm, which has been compared to existing methods. Its performance is comparable to established methodologies. The analysis of earthquake clustering statistics led to various new and updated correlation functions, e.g. for ratios between mainshock and strongest aftershock and general aftershock activity metrics.

  1. An Energy Efficient Unequal Cluster Based Routing Protocol For WSN With Non-Uniform Node Distribution

    Directory of Open Access Journals (Sweden)

    Dhanoop K Dhanpal

    2015-05-01

    Clustering is an efficient method for increasing the lifetime of wireless sensor network systems. Current clustering algorithms generate clusters of almost equal size, which causes the hot spot problem in multi-hop sensor networks. In this paper, an energy-efficient varying-sized clustering algorithm (EEVSCA) and a routing protocol are introduced for wireless sensor networks with non-uniform node distribution. EEVSCA constructs clusters of varying size, while the unequal cluster based routing algorithm forces each cluster head to choose a node with higher energy as its next hop. The unequal cluster sizes balance energy consumption among clusters. Theoretical analysis and simulation results show that EEVSCA balances energy consumption well among the cluster heads and effectively increases the network lifetime.

  2. Cluster Analysis as a Method of Recovering Types of Intraindividual Growth Trajectories: A Monte Carlo Study.

    Science.gov (United States)

    Dumenci, Levent; Windle, Michael

    2001-01-01

    Used Monte Carlo methods to evaluate the adequacy of cluster analysis to recover group membership based on simulated latent growth curve (LGC) models. Cluster analysis failed to recover growth subtypes adequately when the difference between growth curves was shape only. Discusses circumstances under which it was more successful. (SLD)

  3. River Health System Evaluation Based on Grey Clustering Method

    Institute of Scientific and Technical Information of China (English)

    李自明

    2014-01-01

    This study established a feasible and practical river health evaluation index system and evaluated river health using the grey clustering method. The analytic hierarchy process (AHP) was used to calculate the weights of the evaluation indicators, and the grey triangular whitenization weight function was used for the evaluation, yielding the final river health grade. The river health results obtained with this method are consistent with those of the fuzzy mathematics comprehensive evaluation method, so the method provides an effective approach to river health assessment.
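
    The evaluation step described here can be sketched as follows, assuming the common form of grey clustering in which each grade has a triangular whitenization weight function and the grade with the largest weighted clustering coefficient is selected. The grade breakpoints, indicator scores, and weights below are hypothetical; in the study the weights would come from AHP, and the actual functions used are not given in the abstract.

```python
def triangular_weight(x, low, peak, high):
    """Triangular whitenization weight: 0 outside (low, high), 1 at peak."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def grey_cluster(values, weights, grade_breaks):
    """Return the best grade and the clustering coefficient of every grade.

    values       -- indicator scores (hypothetically on a 0-100 scale)
    weights      -- indicator weights summing to 1 (e.g. from AHP)
    grade_breaks -- {grade: (low, peak, high)}, shared by all indicators here
    """
    coeffs = {
        grade: sum(w * triangular_weight(x, *abc) for x, w in zip(values, weights))
        for grade, abc in grade_breaks.items()
    }
    return max(coeffs, key=coeffs.get), coeffs


# Hypothetical example: three indicators, three health grades.
demo_breaks = {"poor": (0, 25, 50), "fair": (25, 50, 75), "good": (50, 75, 100)}
demo_grade, demo_coeffs = grey_cluster([80, 60, 70], [0.5, 0.3, 0.2], demo_breaks)
print(demo_grade, demo_coeffs)
```

    Using one set of breakpoints for every indicator keeps the sketch short; in practice each indicator usually has its own grade thresholds.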

  4. Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign

    Directory of Open Access Journals (Sweden)

    Maciej Kutera

    2010-10-01

    Clustering methods have become such advanced algorithms for analyzing large data collections that they are now counted among data mining methods. They form an ever larger group of methods that is evolving quickly and finding more and more applications. This article presents our research on the usefulness of clustering methods in customer segmentation for managing advertisement campaigns. We report results obtained with four methods, selected because their characteristics suggested they would suit our purposes. One of the analyzed methods, k-means clustering with randomly selected initial cluster seeds, gave very good results in customer segmentation for managing advertisement campaigns, and these results are presented in detail in the article. In contrast, one of the methods (hierarchical average linkage) was found to be useless for customer segmentation. Further investigation of the benefits of clustering methods in customer segmentation for managing advertisement campaigns is worthwhile, particularly because solutions in this field can bring measurable profits for marketing activity.

  5. A Method of Deep Web Clustering Based on SOM Neural Network

    Institute of Scientific and Technical Information of China (English)

    吴凌云

    2012-01-01

    To improve the efficiency of Deep Web data source clustering and reduce manual work, this paper proposes a Deep Web interface clustering method based on a self-organizing map (SOM) neural network. The method uses a pre-query approach and takes structural feature statistics of the interface forms as input. Tests on the UIUC dataset achieved the expected results.

  6. An Effective Method of Producing Small Neutral Carbon Clusters

    Institute of Scientific and Technical Information of China (English)

    XIA Zhu-Hong; CHEN Cheng-Chu; HSU Yen-Chu

    2007-01-01

    An effective method of producing small neutral carbon clusters Cn (n = 1-6) is described. Small carbon clusters (positively charged, negatively charged, or neutral) are formed in a plasma produced when a high-power 532 nm pulsed laser ablates the surface of a Mn metal rod and the plasma reacts with small hydrocarbons supplied by a pulsed valve. The neutral carbon clusters are then extracted and photo-ionized by another laser (266 nm or 355 nm) in the ionization region of a linear time-of-flight mass spectrometer. The distributions of the initial neutral carbon clusters are analysed from the ionic species that appear in the mass spectra. The yield of small carbon clusters with the present method is observed to be about 10 times that of the traditional, widely used technique of laser vaporization of graphite.

  7. Efficient nonparametric and asymptotic Bayesian model selection methods for attributed graph clustering

    KAUST Repository

    Xu, Zhiqiang

    2017-02-16

    Attributed graph clustering, also known as community detection on attributed graphs, has attracted much interest recently due to the ubiquity of attributed graphs in real life. Many algorithms have been proposed for this problem, which are either distance based or model based. However, model selection in attributed graph clustering has not been well addressed; that is, most existing algorithms assume the cluster number to be known a priori. In this paper, we propose two efficient approaches for attributed graph clustering with automatic model selection. The first approach is a popular Bayesian nonparametric method, while the second approach is an asymptotic method based on a recently proposed model selection criterion, the factorized information criterion. Experimental results on both synthetic and real datasets demonstrate that our approaches for attributed graph clustering with automatic model selection significantly outperform the state-of-the-art algorithm.

  8. Computing gene expression data with a knowledge-based gene clustering approach.

    Science.gov (United States)

    Rosa, Bruce A; Oh, Sookyung; Montgomery, Beronda L; Chen, Jin; Qin, Wensheng

    2010-01-01

    Computational analysis methods for gene expression data gathered in microarray experiments can be used to identify the functions of previously unstudied genes. While obtaining the expression data is not a difficult task, interpreting and extracting the information from the datasets is challenging. In this study, a knowledge-based approach which identifies and saves important functional genes before filtering based on variability and fold change differences was utilized to study light regulation. Two clustering methods were used to cluster the filtered datasets, and clusters containing a key light regulatory gene were located. The common genes to both of these clusters were identified, and the genes in the common cluster were ranked based on their coexpression to the key gene. This process was repeated for 11 key genes in 3 treatment combinations. The initial filtering method reduced the dataset size from 22,814 probes to an average of 1134 genes, and the resulting common cluster lists contained an average of only 14 genes. These common cluster lists scored higher gene enrichment scores than two individual clustering methods. In addition, the filtering method increased the proportion of light responsive genes in the dataset from 1.8% to 15.2%, and the cluster lists increased this proportion to 18.4%. The relatively short length of these common cluster lists compared to gene groups generated through typical clustering methods or coexpression networks narrows the search for novel functional genes while increasing the likelihood that they are biologically relevant.
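
    The "common cluster" step in this abstract can be sketched as a set intersection followed by a ranking. The sketch assumes that coexpression is measured by the Pearson correlation with the key gene, which the abstract does not specify; the gene names and expression profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def common_cluster(cluster_a, cluster_b, key_gene, expression):
    """Genes found with the key gene by both clusterings, ranked by
    correlation with the key gene (highest first)."""
    common = (set(cluster_a) & set(cluster_b)) - {key_gene}
    return sorted(
        common,
        key=lambda g: pearson(expression[g], expression[key_gene]),
        reverse=True,
    )


# Hypothetical expression profiles over four conditions.
demo_expression = {
    "KEY": [1.0, 2.0, 3.0, 4.0],
    "g1": [2.0, 4.0, 6.0, 8.5],
    "g2": [1.0, 3.0, 2.0, 4.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
demo_ranked = common_cluster(
    ["KEY", "g1", "g2", "g3"],   # cluster containing KEY from method A
    ["KEY", "g2", "g1"],         # cluster containing KEY from method B
    "KEY",
    demo_expression,
)
print(demo_ranked)
```

    Only genes that both clusterings place with the key gene survive, which is why the resulting lists are much shorter than either individual cluster.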

  9. Parallel Density-Based Clustering for Discovery of Ionospheric Phenomena

    Science.gov (United States)

    Pankratius, V.; Gowanlock, M.; Blair, D. M.

    2015-12-01

    Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real-time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community is heading towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem of machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold, (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electron content data quickly. Potential applications for cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments. We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).
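
    The two DBSCAN parameters mentioned in this abstract, a distance threshold and a minimum number of points within it, can be seen in the minimal sequential sketch below. It is a plain textbook-style DBSCAN with a brute-force neighbor search, not the batched manycore implementation the authors describe; the names `eps` and `min_pts` and the sample points are choices made here, and `min_pts` is assumed to count the point itself.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE               # may later become a border point
            continue
        labels[i] = cluster_id              # i is a core point: new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id      # noise reached from a core point
            if labels[j] is not None:
                continue                    # already processed
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)   # j is also core: keep expanding
        cluster_id += 1
    return labels


# Hypothetical 2-D points: two dense groups and one isolated point.
demo_points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
demo_labels = dbscan(demo_points, eps=1.5, min_pts=3)
print(demo_labels)
```

    The brute-force `region_query` makes the sketch quadratic in the number of points; the neighbor searches are the part that a spatial index or parallel hardware would speed up.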

  10. A Low Energy-Consuming Private Data Aggregation Method Based on Unequal Clustering

    Institute of Scientific and Technical Information of China (English)

    万润泽; 王海军; 张兴艳

    2013-01-01

    To protect the data privacy of sensor nodes and extend the network lifetime, this paper proposes a low-energy private data aggregation method based on unequal clustering (UCDA). First, tentative cluster heads construct clusters of unequal size by using uneven competition ranges according to their distance to the base station. Then a slicing and reassembling technique is adopted: to reduce communication overhead and protect data privacy, each node in the overlap area of different clusters slices its private data into pieces and sends them to the heads of those clusters, where they are mixed in the next step. Simulation results show that UCDA preserves data privacy, obtains accurate aggregation results, and incurs less communication and computation overhead than the SMART algorithm.
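
    The slicing idea can be illustrated with a toy additive scheme: a node splits its reading into random pieces that sum to the original value, sends one piece to each cluster head, and the heads' partial sums still add up to the exact total. This is only a sketch of the general slice-and-mix principle; the piece range, the function names, and the assumption that every node reaches every head are simplifications made here and are not details of UCDA.

```python
import random


def slice_value(value, n_pieces, rng):
    """Split an integer reading into n_pieces random integers summing to it."""
    pieces = [rng.randint(-1000, 1000) for _ in range(n_pieces - 1)]
    pieces.append(value - sum(pieces))
    return pieces


def aggregate(readings, n_heads, rng):
    """Each node sends one slice to each head; heads only see mixed sums."""
    head_sums = [0] * n_heads
    for value in readings:
        for head, piece in enumerate(slice_value(value, n_heads, rng)):
            head_sums[head] += piece
    return head_sums


# Hypothetical readings from four nodes lying in the overlap of two clusters.
demo_rng = random.Random(0)
demo_readings = [21, 19, 25, 30]
demo_head_sums = aggregate(demo_readings, 2, demo_rng)
print(demo_head_sums, sum(demo_head_sums))
```

    No single head receives a node's true reading, yet the sum over all heads equals the sum of the readings, which is why the aggregation result stays exact.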

  11. Relativistic extended coupled cluster method for magnetic hyperfine structure constant

    CERN Document Server

    Sasmal, Sudip; Nayak, Malaya K; Vaval, Nayana; Pal, Sourav

    2015-01-01

    This article deals with the general implementation of the 4-component spinor relativistic extended coupled cluster (ECC) method to calculate first-order properties of atoms and molecules in their open-shell ground state configuration. The implemented relativistic ECC is employed to calculate the hyperfine structure (HFS) constants of alkali metals (Li, Na, K, Rb and Cs), singly charged alkaline earth metal atoms (Be+, Mg+, Ca+ and Sr+) and molecules (BeH, MgF and CaH). We have compared our ECC results with calculations based on the restricted active space configuration interaction (RAS-CI) method. Our results are in better agreement with the available experimental values than the RAS-CI values.

  12. PARTIAL TRAINING METHOD FOR HEURISTIC ALGORITHM OF POSSIBLE CLUSTERIZATION UNDER UNKNOWN NUMBER OF CLASSES

    Directory of Open Access Journals (Sweden)

    D. A. Viattchenin

    2009-01-01

    A method for constructing a subset of labeled objects, which is used in a heuristic algorithm of possible clusterization with partial training, is proposed in the paper. The method is based on data preprocessing by the heuristic algorithm of possible clusterization using a transitive closure of a fuzzy tolerance. The efficiency of the method is demonstrated with an illustrative example.

  13. Region Based Energy Balanced Inter-cluster communication Protocol for Sensor networks

    OpenAIRE

    Sharma, Rohini; Loboyal, D. K.

    2015-01-01

    Wireless sensor networks face an unbalanced energy consumption problem over time. Clustering provides an energy-efficient method to improve the lifespan of a sensor network. A cluster head collects data from other nodes and transmits it towards the sink node. Cluster heads that are far from the sink consume more power in transmitting information towards the sink. We propose the Region Based Energy Balanced Inter-cluster communication Protocol (RBEBP) to improve the lifespan of the sensor network....

  14. Formation and evolution of MnNi clusters in neutron irradiated dilute Fe alloys modelled by a first principle-based AKMC method

    Energy Technology Data Exchange (ETDEWEB)

    Ngayam-Happy, R. [EDF-R and D, Departement Materiaux et Mecanique des Composants (MMC), Les Renardieres, F-77818 Moret sur Loing Cedex (France); Unite Materiaux et Transformations (UMET), UMR CNRS 8207, Universite de Lille 1, ENSCL, F-59655 Villeneuve d' Ascq Cedex (France); Laboratoire commun EDF-CNRS Etude et Modelisation des Microstructures pour le Vieillissement des Materiaux (EM2VM) (France); Becquart, C.S., E-mail: charlotte.becquart@univ-lille1.fr [Unite Materiaux et Transformations (UMET), UMR CNRS 8207, Universite de Lille 1, ENSCL, F-59655 Villeneuve d' Ascq Cedex (France); Laboratoire commun EDF-CNRS Etude et Modelisation des Microstructures pour le Vieillissement des Materiaux (EM2VM) (France); Domain, C. [EDF-R and D, Departement Materiaux et Mecanique des Composants (MMC), Les Renardieres, F-77818 Moret sur Loing Cedex (France); Unite Materiaux et Transformations (UMET), UMR CNRS 8207, Universite de Lille 1, ENSCL, F-59655 Villeneuve d' Ascq Cedex (France); Laboratoire commun EDF-CNRS Etude et Modelisation des Microstructures pour le Vieillissement des Materiaux (EM2VM) (France)

    2012-07-15

    An atomistic Monte Carlo model parameterised on electronic structure calculations data has been used to study the formation and evolution under irradiation of solute clusters in Fe-MnNi ternary and Fe-CuMnNi quaternary alloys. Two populations of solute rich clusters have been observed, which can be discriminated by whether or not the solute atoms are associated with self-interstitial clusters. Mn-Ni-rich clusters are observed at a very early stage of the irradiation in both modelled alloys, whereas the quaternary alloys contain also Cu-containing clusters. Mn-Ni-rich clusters nucleate very early via a self-interstitial-driven mechanism, earlier than Cu-rich clusters; the latter, however, which are likely to form via a vacancy-driven mechanism, grow in number much faster than the former, helped by the thermodynamic driving force to Cu precipitation in Fe, thereby becoming dominant in the low dose regime. The kinetics of the number density increase of the two populations is thus significantly different. Finally the main conclusion suggested by this work is that the so-called late blooming phases might as well be neither late, nor phases.

  15. Constructing storyboards based on hierarchical clustering analysis

    Science.gov (United States)

    Hasebe, Satoshi; Sami, Mustafa M.; Muramatsu, Shogo; Kikuchi, Hisakazu

    2005-07-01

    There is a growing need for quick previews of video content, both to improve the accessibility of video archives and to reduce network traffic. In this paper, a storyboard that contains a user-specified number of keyframes is produced from a given video sequence. It is based on hierarchical cluster analysis of feature vectors that are derived from wavelet coefficients of video frames. Consistent use of extracted feature vectors is the key to avoiding repeated, computationally intensive parsing of the same video sequence. Experimental results suggest that this strategy yields a significant reduction in computational time.

  16. Genetic algorithm based two-mode clustering of metabolomics data

    NARCIS (Netherlands)

    Hageman, J.A.; Berg, R.A. van den; Westerhuis, J.A.; Werf, M.J. van der; Smilde, A.K.

    2008-01-01

    Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods and more specifically two-mode clustering methods are excellent tools for analyzing this type of data. Two-mode clustering metho

  17. Structure and spectroscopic aspects of water-halide ion clusters: a study based on a conjunction of stochastic and quantum chemical methods.

    Science.gov (United States)

    Neogi, Soumya Ganguly; Chaudhury, Pinaki

    2013-03-05

    In this article, we propose a stochastic search based method, namely genetic algorithm in conjunction with density functional theory to evaluate structures of water-halide microclusters, with the halide ion being Cl(-), Br(-), and I(-). Once the structures are established, we evaluate the infrared spectroscopic modes, vertical detachment energies and natural population analysis based charges. We compare our results with available experimental and theoretical results.

  18. Knowledge based cluster ensemble for cancer discovery from biomolecular data.

    Science.gov (United States)

    Yu, Zhiwen; Wongb, Hau-San; You, Jane; Yang, Qinmin; Liao, Hongying

    2011-06-01

    The adoption of microarray techniques in biological and medical research provides a new way for cancer diagnosis and treatment. In order to perform successful diagnosis and treatment of cancer, discovering and classifying cancer types correctly is essential. Class discovery is one of the most important tasks in cancer classification using biomolecular data. Most existing works adopt single clustering algorithms to perform class discovery from biomolecular data. However, single clustering algorithms have limitations, which include a lack of robustness, stability, and accuracy. In this paper, we propose a new cluster ensemble approach called knowledge based cluster ensemble (KCE), which incorporates the prior knowledge of the data sets into the cluster ensemble framework. Specifically, KCE represents the prior knowledge of a data set in the form of pairwise constraints. Then, the spectral clustering algorithm (SC) is adopted to generate a set of clustering solutions. Next, KCE transforms pairwise constraints into confidence factors for these clustering solutions. After that, a consensus matrix is constructed by considering all the clustering solutions and their corresponding confidence factors. The final clustering result is obtained by partitioning the consensus matrix. Compared with single clustering algorithms and conventional cluster ensemble approaches, knowledge based cluster ensemble approaches are more robust, stable and accurate. The experiments on cancer data sets show that: 1) KCE works well on these data sets; 2) KCE not only outperforms most of the state-of-the-art single clustering algorithms, but also outperforms most of the state-of-the-art cluster ensemble approaches.
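
    The consensus-matrix step can be sketched as a confidence-weighted co-association matrix. The abstract does not say how pairwise constraints are turned into confidence factors, so the sketch assumes the simplest choice, the fraction of must-link and cannot-link constraints that a clustering solution satisfies; the function names and the toy partitions are hypothetical.

```python
def confidence(labels, must_link, cannot_link):
    """Assumed confidence factor: fraction of pairwise constraints satisfied."""
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / (len(must_link) + len(cannot_link))


def consensus_matrix(partitions, confidences):
    """Entry (i, j) is the confidence-weighted share of clustering solutions
    that place samples i and j in the same cluster."""
    n = len(partitions[0])
    total = sum(confidences)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, weight in zip(partitions, confidences):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += weight / total
    return matrix


# Hypothetical ensemble: three clustering solutions for four samples.
demo_partitions = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
demo_must, demo_cannot = [(0, 1)], [(0, 3)]
demo_conf = [confidence(p, demo_must, demo_cannot) for p in demo_partitions]
demo_consensus = consensus_matrix(demo_partitions, demo_conf)
print(demo_conf)
print(demo_consensus[0][1], demo_consensus[2][3])
```

    The final clustering would then come from partitioning this matrix, for example by treating it as a similarity matrix for another clustering pass; that step is omitted here.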

  19. Calculation of Dipole Transition Matrix Elements and Expectation Values by Vibrational Coupled Cluster Method.

    Science.gov (United States)

    Banik, Subrata; Pal, Sourav; Prasad, M Durga

    2010-10-12

    An effective operator approach based on the coupled cluster method is described and applied to calculate vibrational expectation values and absolute transition matrix elements. Coupled cluster linear response theory (CCLRT) is used to calculate excited states. The convergence pattern of these properties with the rank of the excitation operator is studied. The method is applied to a water molecule. Arponen-type double similarity transformation in extended coupled cluster (ECCM) framework is also used to generate an effective operator, and the convergence pattern of these properties is compared to the normal coupled cluster (NCCM) approach. It is found that the coupled cluster method provides an accurate description of these quantities for low lying vibrational excited states. The ECCM provides a significant improvement for the calculation of the transition matrix elements.

  20. A New Method for Medical Image Clustering Using Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    Akbar Shahrzad Khashandarag

    2013-01-01

    Segmentation is applied to medical images when image brightness becomes weak, which makes tissue borders difficult to recognize. Exact segmentation of medical images is therefore an essential step in recognizing and treating an illness, and the purpose of clustering in medical images is the recognition of damaged areas in tissues. Different clustering techniques have been introduced in fields such as engineering, medicine, and data mining. However, no standard clustering technique gives ideal results for all imaging applications. In this paper, a new method combining a genetic algorithm and the k-means algorithm is presented for clustering medical images. In this combined technique, a variable string length genetic algorithm (VGA) is used to determine the optimal cluster centers. The proposed algorithm has been compared with the k-means clustering algorithm. The advantage of the proposed method is its accuracy in selecting the optimal cluster centers compared with the k-means technique.

  1. A liquid drop model for embedded atom method cluster energies

    Science.gov (United States)

    Finley, C. W.; Abel, P. B.; Ferrante, J.

    1996-01-01

    Minimum energy configurations for homonuclear clusters containing from two to twenty-two atoms of six metals, Ag, Au, Cu, Ni, Pd, and Pt have been calculated using the Embedded Atom Method (EAM). The average energy per atom as a function of cluster size has been fit to a liquid drop model, giving estimates of the surface and curvature energies. The liquid drop model gives a good representation of the relationship between average energy and cluster size. As a test the resulting surface energies are compared to EAM surface energy calculations for various low-index crystal faces with reasonable agreement.
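
    One common way to write a liquid drop fit of this kind is shown below. The symbols $E_v$, $E_s$, and $E_c$ (volume, surface, and curvature coefficients) are notation assumed here for illustration; the abstract does not state the fitted functional form.

```latex
\frac{E(N)}{N} \approx E_v + E_s N^{-1/3} + E_c N^{-2/3}
```

    In this form the surface term scales with the number of surface atoms ($N^{2/3}$) and the curvature term with $N^{1/3}$, each divided by the cluster size $N$.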

  2. A New Combinational Electrical Load Analysis Method Based on Bilayer Clustering Analysis

    Institute of Scientific and Technical Information of China (English)

    王星华; 陈卓优; 彭显刚

    2016-01-01

    Unlike traditional methods for analyzing electricity customer behavior, this paper presents a two-layer combinational load profile recognition method based on an improved clustering algorithm. Traditional clustering algorithms are sensitive to the randomly selected initial cluster centers and tend to converge to local minima. To overcome this defect, selection rules for the initial cluster centers of the inner and outer layers are proposed, which change cluster center selection from random setting to purposeful selection. The outer layer uses cosine similarity (similarity of shape) as its clustering criterion, and the inner layer uses Euclidean distance. The proposed method does not require the load curve vectors to be normalized. The method is applied to the daily load curves of electricity customers in a certain area. The results show that the two-layer combined method accurately identifies different load shapes and distinguishes large and small customers, automatically recognizing the cluster number, load categories, and load distributions, which verifies the effectiveness and superiority of the proposed method.
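
    The reason for pairing the two criteria can be illustrated with a small sketch. The load curves below are made up and the function names are hypothetical; the point is only that cosine similarity groups curves by shape regardless of magnitude, while Euclidean distance separates large and small customers with the same shape.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two load vectors (shape similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two load vectors (magnitude-sensitive)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical 4-point daily load curves: same shape, 10x different size.
small_user = [1.0, 2.0, 4.0, 2.0]
large_user = [10.0 * x for x in small_user]

shape_match = cosine_similarity(small_user, large_user)
size_gap = euclidean_distance(small_user, large_user)
```

    Here `shape_match` is 1 (identical shape) while `size_gap` is large, so an outer layer based on shape and an inner layer based on distance would separate the two effects without normalizing the curves.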

  3. Segmentation method for SAR images based on unsupervised spectral clustering of multi-hierarchical region

    Institute of Scientific and Technical Information of China (English)

    田玲; 邓旌波; 廖紫纤; 石博; 何楚

    2013-01-01

    This paper proposes an unsupervised SAR image segmentation algorithm based on multi-space and multi-hierarchical region based spectral clustering (MSMHSC). The algorithm first computes distances in the feature and geometric spaces to quickly obtain an initial over-segmentation of the source image into many small regions. It then clusters these over-segmented regions in the spectral space to achieve unsupervised SAR image segmentation. The method has low computational complexity and needs no training samples, and its hierarchical design lets it make fuller use of the prior and likelihood information in SAR images. Experiments on the real MSTAR SAR data set show that the algorithm is fast and effective.

  4. Clustering-based redshift estimation: application to VIPERS/CFHTLS

    CERN Document Server

    Scottez, V; Granett, B R; Moutard, T; Kilbinger, M; Scodeggio, M; Garilli, B; Bolzonella, M; de la Torre, S; Guzzo, L; Abbas, U; Adami, C; Arnouts, S; Bottini, D; Branchini, E; Cappi, A; Cucciati, O; Davidzon, I; Fritz, A; Franzetti, P; Iovino, A; Krywult, J; Brun, V Le; Fèvre, O Le; Maccagni, D; Małek, K; Marulli, F; Polletta, M; Pollo, A; Tasca, L A M; Tojeiro, R; Vergani, D; Zanichelli, A; Bel, J; Coupon, J; De Lucia, G; Ilbert, O; McCracken, H J; Moscardini, L

    2016-01-01

    We explore the accuracy of the clustering-based redshift estimation proposed by Ménard et al. (2013) when applied to VIPERS and CFHTLS real data. This method enables us to reconstruct redshift distributions from measurement of the angular clustering of objects using a set of secure spectroscopic redshifts. We use state of the art spectroscopic measurements with iAB 0.5 which allows us to test the accuracy of the clustering-based redshift distributions. We show that this method enables us to reproduce the true mean color-redshift relation when both populations have the same magnitude limit. We also show that this technique allows the inference of redshift distributions for a population fainter than the one of reference and we give an estimate of the color-redshift mapping in this case. This last point is of great interest for future large redshift surveys which suffer from the need of a complete faint spectroscopic sample.

  5. Mapping Soil Texture of a Plain Area Using Fuzzy-c-Means Clustering Method Based on Land Surface Diurnal Temperature Difference

    Institute of Scientific and Technical Information of China (English)

    WANG De-Cai; ZHANG Gan-Lin; PAN Xian-Zhang; ZHAO Yu-Guo; ZHAO Ming-Song; WANG Gai-Fen

    2012-01-01

    The use of landscape covariates to estimate soil properties is not suitable for areas of low relief because soil properties vary widely under similar topographic and vegetation conditions. A new method was implemented to map regional soil texture (in terms of sand, silt and clay contents) by hypothesizing that the change in the land surface diurnal temperature difference (DTD) is related to soil texture in the case of a relatively homogeneous rainfall input. To examine this hypothesis, the DTDs from the Moderate Resolution Imaging Spectroradiometer (MODIS) during a selected time period, i.e., after a heavy rainfall between autumn harvest and autumn sowing, were classified using fuzzy-c-means (FCM) clustering. Six classes were generated, and for each class, the sand (> 0.05 mm), silt (0.002-0.05 mm) and clay (< 0.002 mm) contents at the location of maximum membership value were considered as the typical values of that class. A weighted average model was then used to digitally map soil texture. The results showed that the predicted map quite accurately reflected the regional soil variation. A validation dataset produced estimates of error for the predicted maps of sand, silt and clay contents at root mean squared error values of 8.4%, 7.8% and 2.3%, respectively, which is satisfactory in a practical context. This study thus provided a methodology that can help improve the accuracy and efficiency of soil texture mapping in plain areas using easily available data sources.
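
    A weighted average model of the kind mentioned in this abstract might look like the sketch below, assuming the prediction at a pixel is the membership-weighted mean of the class typical values. The function name, the memberships, and the sand percentages are all hypothetical; the abstract does not give the exact weighting used.

```python
def weighted_average(memberships, typical_values):
    """Membership-weighted mean of per-class typical values at one pixel."""
    total = sum(memberships)
    return sum(u * v for u, v in zip(memberships, typical_values)) / total


# Hypothetical pixel with memberships in three FCM classes and made-up
# typical sand contents (%) for those classes.
pixel_memberships = [0.5, 0.25, 0.25]
class_sand_percent = [40.0, 20.0, 60.0]
predicted_sand = weighted_average(pixel_memberships, class_sand_percent)
```

    Applying the same function to silt and clay typical values would give the other two components of the texture map.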

  6. Displacement of Building Cluster Using Field Analysis Method

    Institute of Scientific and Technical Information of China (English)

    Ai Tinghua

    2003-01-01

    This paper presents a field-based method for the displacement of building clusters driven by street widening. Compression of the street boundary produces a force that pushes buildings inward, and the propagation of this force is a decay process. To describe this phenomenon, field theory is introduced with an isoline representation model. On the basis of the skeleton of the Delaunay triangulation, a displacement field is built in which the propagated force is related to the degree of adjacency to the street boundary. The study provides the computation of displacement direction and offset distance for building displacement. The vector operation is performed on the basis of grade and other field concepts.

  7. A NEW METHOD TO QUANTIFY X-RAY SUBSTRUCTURES IN CLUSTERS OF GALAXIES

    Energy Technology Data Exchange (ETDEWEB)

    Andrade-Santos, Felipe; Lima Neto, Gastao B.; Lagana, Tatiana F. [Departamento de Astronomia, Instituto de Astronomia, Geofisica e Ciencias Atmosfericas, Universidade de Sao Paulo, Geofisica e Ciencias Atmosfericas, Rua do Matao 1226, Cidade Universitaria, 05508-090 Sao Paulo, SP (Brazil)

    2012-02-20

    We present a new method to quantify substructures in clusters of galaxies, based on the analysis of the intensity of structures. This analysis is done in a residual image that is the result of the subtraction of a surface brightness model, obtained by fitting a two-dimensional analytical model (β-model or Sersic profile) with elliptical symmetry, from the X-ray image. Our method is applied to 34 clusters observed by the Chandra Space Telescope that are in the redshift range z in [0.02, 0.2] and have a signal-to-noise ratio (S/N) greater than 100. We present the calibration of the method and the relations between the substructure level and physical quantities, such as the mass, X-ray luminosity, temperature, and cluster redshift. We use our method to separate the clusters into two sub-samples of high- and low-substructure levels. We conclude, using Monte Carlo simulations, that the method recovers very well the true amount of substructure for clusters with small angular core radii (with respect to the whole image size) and good S/N observations. We find no evidence of correlation between the substructure level and physical properties of the clusters such as gas temperature, X-ray luminosity, and redshift; however, the analysis suggests a trend between the substructure level and cluster mass. The scaling relations for the two sub-samples (high- and low-substructure level clusters) are different (they present an offset, i.e., given a fixed mass or temperature, low-substructure clusters tend to be more X-ray luminous), which is an important result for cosmological tests using the mass-luminosity relation to obtain the cluster mass function, since they rely on the assumption that clusters do not present different scaling relations according to their dynamical state.

  8. Motion estimation using point cluster method and Kalman filter.

    Science.gov (United States)

    Senesh, M; Wolf, A

    2009-05-01

    The most frequently used method in three-dimensional human gait analysis involves placing markers on the skin of the analyzed segment. This introduces a significant artifact, which strongly influences the bone position and orientation and joint kinematic estimates. In this study, we tested and evaluated the effect of adding a Kalman filter procedure to the previously reported point cluster technique (PCT) in the estimation of rigid body motion. We demonstrated the procedures by motion analysis of a compound planar pendulum from indirect opto-electronic measurements of markers attached to an elastic appendage that is restrained to slide along the rigid body long axis. The elastic frequency is close to the pendulum frequency, as in the biomechanical problem, where the soft tissue frequency content is similar to the actual movement of the bones. Comparison of the real pendulum angle to that obtained by several estimation procedures--PCT, Kalman filter followed by PCT, and low pass filter followed by PCT--enables evaluation of the accuracy of the procedures. When comparing the maximal amplitude, no effect was noted by adding the Kalman filter; however, a closer look at the signal revealed that the estimated angle based only on the PCT method was very noisy with fluctuation, while the estimated angle based on the Kalman filter followed by the PCT was a smooth signal. It was also noted that the instantaneous frequencies obtained from the estimated angle based on the PCT method are more dispersed than those obtained from the estimated angle based on the Kalman filter followed by the PCT method. Addition of a Kalman filter to the PCT method in the estimation procedure of rigid body motion results in a smoother signal that better represents the real motion, with less signal distortion than when using a digital low pass filter. Furthermore, it can be concluded that adding a Kalman filter to the PCT procedure substantially reduces the dispersion of the maximal and minimal

  9. Cluster parallel rendering based on encoded mesh

    Institute of Scientific and Technical Information of China (English)

    QIN Ai-hong; XIONG Hua; PENG Hao-yu; LIU Zhen; SHI Jiao-ying

    2006-01-01

    Use of compressed mesh in parallel rendering architecture is still an unexplored area, the main challenge of which is to partition and sort the encoded mesh in compression-domain. This paper presents a mesh compression scheme PRMC (Parallel Rendering based Mesh Compression) supplying encoded meshes that can be partitioned and sorted in parallel rendering system even in encoded-domain. First, we segment the mesh into submeshes and clip the submeshes' boundary into Runs, and then piecewise compress the submeshes and Runs respectively. With the help of several auxiliary index tables, compressed submeshes and Runs can serve as rendering primitives in parallel rendering system. Based on PRMC, we design and implement a parallel rendering architecture. Compared with uncompressed representation, experimental results showed that PRMC meshes applied in cluster parallel rendering system can dramatically reduce the communication requirement.

  10. A multi-sequential number-theoretic optimization algorithm using clustering methods

    Institute of Scientific and Technical Information of China (English)

    XU Qing-song; LIANG Yi-zeng; HOU Zhen-ting

    2005-01-01

    A multi-sequential number-theoretic optimization method based on clustering was developed and applied to the optimization of functions with many local extrema. Details of the procedure to generate the clusters and the sequential schedules were given. The algorithm was assessed by comparing its performance with a generalized simulated annealing algorithm on a difficult instructive example and a D-optimum experimental design problem. The two examples show that the presented algorithm is more effective and reliable.

  11. A Color Texture Image Segmentation Method Based on Fuzzy c-Means Clustering and Region-Level Markov Random Field Model

    Directory of Open Access Journals (Sweden)

    Guoying Liu

    2015-01-01

    This paper presents a variation of the fuzzy local information c-means clustering (FLICM) algorithm that provides color texture image clustering. The proposed algorithm incorporates region-level spatial, spectral, and structural information in a novel fuzzy way. The new algorithm, called RFLICM, combines FLICM and a region-level Markov random field model (RMRF) to make use of large-scale interactions between image patches instead of pixels. RFLICM overcomes the weakness of FLICM when dealing with textured images and at the same time enhances the clustering performance. The major characteristic of RFLICM is the use of a region-level fuzzy factor, aiming to guarantee texture homogeneity and preserve region boundaries. Experiments performed on synthetic and remote sensing images show that RFLICM clusters color texture images accurately.

  12. A statistical information-based clustering approach in distance space

    Institute of Scientific and Technical Information of China (English)

    YUE Shi-hong; LI Ping; GUO Ji-dong; ZHOU Shui-geng

    2005-01-01

    Clustering, as a powerful data mining technique for discovering interesting data distributions and patterns in the underlying database, is used in many fields, such as statistical data analysis, pattern recognition, image processing, and other business applications. Density-based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996) is a well-performing clustering method for spatial data, although it leaves many problems to be solved. For example, DBSCAN requires a user-specified threshold, and computing that threshold with current methods such as OPTICS (Ankerst et al., 1999) is extremely time-consuming; in addition, the performance of DBSCAN under different norms has yet to be examined. In this paper, we first developed a method based on statistical information of the distance space in the database to determine the necessary threshold. Then our examination of DBSCAN performance under different norms showed that there was a determinable relation between them. Finally, we used two artificial databases to verify the effectiveness and efficiency of the proposed methods.
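
    The role of the norm can be illustrated with a small sketch of the Minkowski family of distances. This is an illustration only, not the method of the paper: the function name and the sample points are hypothetical, and the example merely shows that the same pair of points lies at different distances under different norms, so a fixed DBSCAN threshold selects different neighborhoods.

```python
def minkowski_distance(a, b, p):
    """Distance between points a and b under the L_p norm.

    p may be any number >= 1, or float("inf") for the Chebyshev norm.
    """
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


origin, point = (0.0, 0.0), (3.0, 4.0)
l1 = minkowski_distance(origin, point, 1)
l2 = minkowski_distance(origin, point, 2)
linf = minkowski_distance(origin, point, float("inf"))
```

    With a threshold of 4.5, for instance, `point` would count as a neighbor of `origin` only under the Chebyshev norm, since the three distances here are 7, 5, and 4.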

  13. Lightweight and Distributed Connectivity-Based Clustering Derived from Schelling's Model

    Science.gov (United States)

    Tsugawa, Sho; Ohsaki, Hiroyuki; Imase, Makoto

    In the literature, two connectivity-based distributed clustering schemes exist: CDC (Connectivity-based Distributed node Clustering scheme) and SDC (SCM-based Distributed Clustering). While CDC and SDC have mechanisms for maintaining clusters against nodes joining and leaving, neither method assumes that frequent changes occur in the network topology. In this paper, we propose a lightweight distributed clustering method that we term SBDC (Schelling-Based Distributed Clustering) since this scheme is derived from Schelling's model — a popular segregation model in sociology. We evaluate the effectiveness of the proposed SBDC in an environment where frequent changes arise in the network topology. Our simulation results show that SBDC outperforms CDC and SDC under frequent changes in network topology caused by high node mobility.

  14. Analysis of historical and modern hard red spring wheat cultivars based on parentage and HPLC of gluten proteins using Ward's clustering method

    Science.gov (United States)

    There have been substantial breeding efforts in North Dakota to produce wheat cultivars that are well adapted to weather conditions and are disease resistant. In this study, 30 hard red spring (HRS) wheat cultivars released between 1910 and 2013 were analyzed with regard to how they cluster in terms...

  15. CRPCG—Clustering Routing Protocol based on Connected Graph

    Directory of Open Access Journals (Sweden)

    Feng Li

    2011-05-01

    To balance the load among cluster heads, reduce the energy consumption of inter-cluster routing, and enhance the reliability and flexibility of data transmission, the paper proposes a new clustering routing protocol based on connected graph (CRPCG). The protocol optimizes three aspects: cluster head election, cluster formation, and cluster routing. A connected graph is constituted by the base station and all cluster heads, and graph theory algorithms are used to guarantee network connectivity and reliability, improve link quality, balance node energy, and prolong the network life cycle. Simulation results show that the protocol significantly prolongs the network life cycle and balances the energy of network nodes, and that it improves the reliability and efficiency of data transmission, especially in the phase of inter-cluster data transmission.

  16. Methods for analyzing cost effectiveness data from cluster randomized trials

    Directory of Open Access Journals (Sweden)

    Clark Allan

    2007-09-01

    Background: Measurement of individuals' costs and outcomes in randomized trials allows uncertainty about cost effectiveness to be quantified. Uncertainty is expressed as probabilities that an intervention is cost effective, and confidence intervals of incremental cost effectiveness ratios. Randomizing clusters instead of individuals tends to increase uncertainty but such data are often analysed incorrectly in published studies. Methods: We used data from a cluster randomized trial to demonstrate five appropriate analytic methods: (1) joint modeling of costs and effects with two-stage non-parametric bootstrap sampling of clusters then individuals, (2) joint modeling of costs and effects with Bayesian hierarchical models, and (3) linear regression of net benefits at different willingness to pay levels using (a) a least squares regression with Huber-White robust adjustment of errors, (b) a least squares hierarchical model, and (c) a Bayesian hierarchical model. Results: All five methods produced similar results, with greater uncertainty than if cluster randomization was not accounted for. Conclusion: Cost effectiveness analyses alongside cluster randomized trials need to account for study design. Several theoretically coherent methods can be implemented with common statistical software.
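
    Two of the ingredients named in this abstract can be sketched briefly: the net benefit of an individual at a given willingness to pay, and two-stage bootstrap resampling of clusters and then individuals. This is a simplified sketch with made-up data and hypothetical function names; published two-stage bootstrap procedures may add corrections that are not shown here.

```python
import random


def net_benefit(effect, cost, willingness_to_pay):
    """Net monetary benefit of one individual: lambda * effect - cost."""
    return willingness_to_pay * effect - cost


def two_stage_bootstrap(clusters, rng):
    """Resample clusters with replacement, then individuals within each."""
    resampled = []
    for _ in range(len(clusters)):
        cluster = rng.choice(clusters)
        resampled.append([rng.choice(cluster) for _ in range(len(cluster))])
    return resampled


# Hypothetical trial arm: each cluster is a list of (effect, cost) pairs.
arm = [
    [(0.5, 1000.0), (0.75, 1200.0)],
    [(0.25, 800.0)],
    [(0.5, 900.0), (1.0, 1500.0)],
]
sample = two_stage_bootstrap(arm, random.Random(0))
benefits = [net_benefit(e, c, 20000.0) for cluster in sample for e, c in cluster]
```

    Repeating the resampling many times and recomputing the mean net benefit each time would give a distribution that reflects both between-cluster and within-cluster variation.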

  17. Fast optimization of binary clusters using a novel dynamic lattice searching method.

    Science.gov (United States)

    Wu, Xia; Cheng, Wen

    2014-09-28

    Global optimization of binary clusters remains a difficult task despite much effort and many efficient methods. To address the two types of elements in binary clusters (i.e., the homotop problem), two classes of virtual dynamic lattices are constructed and a modified dynamic lattice searching (DLS) method, i.e., the binary DLS (BDLS) method, is developed. However, it was found that BDLS can only be used to optimize small binary clusters because the homotop problem is hard to solve without an atomic exchange operation. Therefore, the iterated local search (ILS) method is adopted to solve the homotop problem, and an efficient method based on the BDLS method and ILS, named BDLS-ILS, is presented for global optimization of binary clusters. To assess the efficiency of the proposed method, binary Lennard-Jones clusters with up to 100 atoms are investigated. Results show that the method is efficient. Furthermore, the BDLS-ILS method is also adopted to study the geometrical structures of (AuPd)79 clusters with DFT-fit parameters of the Gupta potential.

  18. An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO

    Directory of Open Access Journals (Sweden)

    Jian Zhang

    2014-01-01

    To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes the vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses the Xie-Beni index as the cluster validity measure and automatically finds the optimal cluster number within a specific range, with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.
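
    For reference, the Xie-Beni index in its commonly cited form divides fuzzy within-cluster compactness by the number of points times the smallest squared distance between cluster centers, so smaller values indicate compact, well-separated clusters. The sketch below handles one-dimensional data only; the function name and the toy data are hypothetical and are not taken from the paper.

```python
def xie_beni_index(points, centers, memberships, m=2.0):
    """Xie-Beni validity index for 1-D data; smaller is better.

    memberships[i][k] is the membership of points[k] in cluster i,
    and m is the fuzzifier applied to each membership.
    """
    compactness = sum(
        (memberships[i][k] ** m) * (points[k] - centers[i]) ** 2
        for i in range(len(centers))
        for k in range(len(points))
    )
    separation = min(
        (centers[i] - centers[j]) ** 2
        for i in range(len(centers))
        for j in range(len(centers))
        if i != j
    )
    return compactness / (len(points) * separation)


# Toy data: two tight groups far apart, with crisp memberships.
points = [0.0, 1.0, 10.0, 11.0]
centers = [0.5, 10.5]
memberships = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
xb = xie_beni_index(points, centers, memberships)
```

    Evaluating the index for each candidate cluster number in a range and keeping the minimum is one way a method could select the cluster number automatically.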

  19. A Method of Network Traffic Identification Based on Improved Clustering Algorithms

    Institute of Scientific and Technical Information of China (English)

    王宇科; 黎文伟; 苏欣

    2011-01-01

    The automatic detection of applications associated with network traffic is very important for network security and traffic management. Unfortunately, because new applications such as peer-to-peer (P2P) and VoIP use dynamic port numbers, masquerading techniques, and encryption, it is difficult to identify them with simple port-based analysis or by analyzing characteristic fields in packet payloads. Many research works have proposed using clustering algorithms to identify network traffic, but these algorithms have defects in how they choose the cluster centers and the number of clusters. In this paper, we first use the Weighting D2 algorithm to improve the selection of the initial cluster centers and use the value of NMI (Normalized Mutual Information) to determine the number of clusters, which yields an improved clustering algorithm. We then propose an application-level traffic identification method based on this algorithm. The experimental results, particularly for P2P applications, show that this method reaches an accuracy of 90% or more, with low false positive and false rejection rates.

  20. An incremental clustering algorithm based on Mahalanobis distance

    Science.gov (United States)

    Aik, Lim Eng; Choon, Tan Wee

    2014-12-01

    The classical fuzzy c-means clustering algorithm is insufficient for clustering non-spherical or elliptically distributed datasets. The paper replaces the Euclidean distance of classical fuzzy c-means clustering with the Mahalanobis distance and applies the Mahalanobis distance to incremental learning for its merits. A Mahalanobis distance based fuzzy incremental clustering learning algorithm is proposed. Experimental results show that the algorithm not only remedies this defect of the fuzzy c-means algorithm but also increases training accuracy.
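
    The difference between the two distances can be shown with a two-dimensional sketch. The function name, the covariance matrix, and the sample points are hypothetical; the example only illustrates why a covariance-aware distance suits elongated clusters, and it is not the incremental algorithm of the paper.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from a cluster mean.

    cov is a 2x2 covariance matrix [[a, b], [c, d]]; it must be invertible.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    squared = (
        dx * (inv[0][0] * dx + inv[0][1] * dy)
        + dy * (inv[1][0] * dx + inv[1][1] * dy)
    )
    return math.sqrt(squared)


# Hypothetical elongated cluster: variance 4 along x, 1 along y.
cov = [[4.0, 0.0], [0.0, 1.0]]
along_x = mahalanobis_2d((2.0, 0.0), (0.0, 0.0), cov)
along_y = mahalanobis_2d((0.0, 2.0), (0.0, 0.0), cov)
```

    Both points are at Euclidean distance 2 from the mean, yet the point along the elongated axis is closer in the Mahalanobis sense, which is the behavior an elliptical cluster needs.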

  1. Performance Improvement of Cache Management In Cluster Based MANET

    Directory of Open Access Journals (Sweden)

    Abdulaziz Zam

    2013-08-01

    Caching is one of the most effective techniques used to improve the data access performance in wireless networks. Accessing data from a remote server imposes high latency and power consumption through forwarding nodes that guide the requests to the server and send data back to the clients. In addition, accessing data may be unreliable or even impossible due to erroneous wireless links and frequent disconnections. Because of the nature of MANETs, with their highly frequent topology changes, and the small cache size and constrained power supply of mobile nodes, cache management is a challenge. To maintain the MANET's stability and scalability, clustering is considered an effective approach. In this paper an efficient cache management method is proposed for the Cluster Based Mobile Ad-hoc NETwork (C-B-MANET). The performance of the method is evaluated in terms of packet delivery ratio, latency and overhead metrics.

  2. Internet Forensics Framework Based-on Clustering

    Directory of Open Access Journals (Sweden)

    Imam Riadi

    2013-01-01

    Internet network attacks are complicated and worth studying. These attacks include Denial of Service (DoS) attacks, which exploit vulnerabilities found in operating systems, network services, and applications. An indicator of a DoS attack is that legitimate users cannot access the system. This paper proposes a framework for Internet forensics based on logs that aims to assist the investigation process in revealing DoS attacks. The framework consists of several steps, including logging into a text file and a database and identifying an attack based on the packet header length. After the identification process, logs are grouped using the k-means clustering algorithm into three levels of attack (dangerous, rather dangerous, and not dangerous) based on the port numbers and TCP flags of the packets. The test results show that the proposed framework can group attacks into three levels and find the attacker with a success rate of 89.02%, so it can be concluded that the proposed framework meets the goals set in this research.

  3. Report of a Workshop on Parallelization of Coupled Cluster Methods

    Energy Technology Data Exchange (ETDEWEB)

    Rodney J. Bartlett; Erik Deumens

    2008-05-08

    The benchmark ab initio quantum mechanical methods for molecular structure and spectra are now recognized to be those of coupled-cluster theory. To benefit from the transition to tera- and petascale computers, such coupled-cluster methods must be created to run in a scalable fashion. This Workshop, held as a part of the 48th annual Sanibel meeting at St. Simons Island, GA, addressed that issue. Representatives of all the principal scientific groups who are addressing this topic were in attendance, to exchange information about the problem and to identify what needs to be done in the future. This report summarizes the conclusions of the workshop.

  4. Improved method for the feature extraction of laser scanner using genetic clustering

    Institute of Scientific and Technical Information of China (English)

    Yu Jinxia; Cai Zixing; Duan Zhuohua

    2008-01-01

    Feature extraction from range images provided by a ranging sensor is a key issue in pattern recognition. To automatically extract the environmental features sensed by a 2D ranging sensor (a laser scanner), an improved method based on genetic clustering (VGA-clustering) is presented. By integrating the spatial neighbouring information of range data into a fuzzy clustering algorithm, a weighted fuzzy clustering algorithm (WFCA) is introduced in place of the standard clustering algorithm to realize feature extraction for the laser scanner. Because the number of clusters is not known in advance, several validation index functions are used to estimate the validity of different clustering algorithms, and one validation index is selected as the fitness function of the genetic algorithm so as to determine the accurate number of clusters automatically. At the same time, an improved genetic algorithm (IVGA), built on VGA, is proposed to address the local optima of the clustering algorithm; it increases population diversity and improves the genetic operators of the elitist rule to enhance the local search capacity and speed up convergence. Comparison with other algorithms demonstrates the effectiveness of the proposed algorithm.

  5. Cluster-in-molecule local correlation method for large systems

    Institute of Scientific and Technical Information of China (English)

    LI Wei; LI ShuHua

    2014-01-01

    A linear scaling local correlation method, the cluster-in-molecule (CIM) method, was developed in the last decade for large systems. The basic idea of the CIM method is that the electron correlation energy of a large system, within Møller-Plesset perturbation theory (MP) or coupled cluster (CC) theory, can be approximately obtained by solving the corresponding MP or CC equations of various clusters. Each such cluster consists of a subset of localized molecular orbitals (LMOs) of the target system and can be treated independently at various levels of theory. In the present article, the main idea of the CIM method is reviewed, followed by brief descriptions of some recent developments, including its multilevel extension and different ways of constructing clusters. Then, some applications to large systems are illustrated. The CIM method is shown to be an efficient and reliable method for electron correlation calculations of large systems, including biomolecules and supramolecular complexes.

  6. Bayesian Analysis of Two Stellar Populations in Galactic Globular Clusters. I. Statistical and Computational Methods

    Science.gov (United States)

    Stenning, D. C.; Wagner-Kaiser, R.; Robinson, E.; van Dyk, D. A.; von Hippel, T.; Sarajedini, A.; Stein, N.

    2016-07-01

    We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations. Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties—age, metallicity, helium abundance, distance, absorption, and initial mass—are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and also show how model misspecification can potentially be identified. As a proof of concept, we analyze the two stellar populations of globular cluster NGC 5272 using our model and methods. (BASE-9 is available from GitHub: https://github.com/argiopetech/base/releases).

  7. A Satellite Beam Planning Method Based on Clustering

    Institute of Scientific and Technical Information of China (English)

    郝英川

    2014-01-01

    Satellite communication systems with movable spot beams can adjust beam direction to cover areas on Earth according to the distribution of clients and services. To improve the utilization of satellite resources, a clustering algorithm is introduced into beam planning. The area to be covered is first clustered into several candidate clusters according to statistics on the client distribution and throughput requirements, and beams are then assigned by associating candidate areas with specific beams. The efficiency of satellite resource use, as well as the throughput and QoS of the system, depends directly on the beam assignment. The feasibility and efficiency of the algorithm are verified by simulations of typical scenarios.

  8. Cluster Monte Carlo methods for the FePt Hamiltonian

    Science.gov (United States)

    Lyberatos, A.; Parker, G. J.

    2016-02-01

    Cluster Monte Carlo methods for the classical spin Hamiltonian of FePt with long range exchange interactions are presented. We use a combination of the Swendsen-Wang (or Wolff) and Metropolis algorithms that satisfies the detailed balance condition and ergodicity. The algorithms are tested by calculating the temperature dependence of the magnetization, susceptibility and heat capacity of L10-FePt nanoparticles in a range including the critical region. The cluster models yield numerical results in good agreement within statistical error with the standard single-spin flipping Monte Carlo method. The variation of the spin autocorrelation time with grain size is used to deduce the dynamic exponent of the algorithms. Our cluster models do not provide a more accurate estimate of the magnetic properties at equilibrium.

  9. A two-stage method for microcalcification cluster segmentation in mammography by deformable models

    Energy Technology Data Exchange (ETDEWEB)

    Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L., E-mail: costarid@upatras.gr [Department of Medical Physics, School of Medicine, University of Patras, Patras 26504 (Greece); Vassiou, K. [Department of Anatomy, School of Medicine, University of Thessaly, Larissa 41500 (Greece)

    2015-10-15

    Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method of MC clusters is investigated. The first stage is targeted to accurate and time efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage is targeted to shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter and intraobserver agreements was evaluated in a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists' segmentations quantitatively by two distance metrics (Hausdorff distance, HDIST_cluster, and average of minimum distance, AMINDIST_cluster) and the area overlap measure (AOM_cluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated in a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted and a correlation-based feature selection method yielded a feature subset to feed in a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under receiver operating characteristic curve (Az ± Standard Error) utilizing

  10. Dynamic access clustering selecting mechanism based on Markov decision process for MANET

    Institute of Scientific and Technical Information of China (English)

    WANG Dao-yuan; TIAN Hui

    2007-01-01

    Clustering is an important method in the mobile ad hoc network (MANET). Because nodes are mobile, they must inevitably select a cluster as they roam between different clusters. This study analyzes the cluster-selection problem in an environment containing multiple overlapping and intersecting clusters and proposes a novel dynamic selection mechanism to optimize cluster selection for nodes roaming between clusters in a MANET. The mechanism takes into account the stability of the communication system, the communication bandwidth, and the effect of cluster selection on communication, and it follows the Markov decision-making model.

  11. A Growing Self-organizing Feature Map Text-based Clustering Method

    Institute of Scientific and Technical Information of China (English)

    张颖超; 李继扬

    2012-01-01

    To help build a harmonious and civilized Internet environment and to improve the ability to recognize and respond to objectionable text on the network, this article applies a novel method that combines a growing self-organizing feature map (GSOFM) with latent semantic indexing (LSI) to cluster objectionable text. Combining the two algorithms makes it possible to find both global and local pattern features. Experiments under identical conditions compare the combined model with GSOFM alone. The results show that, relative to GSOFM alone, the combination improves the accuracy of the clustering results and reduces computation time, providing a better method for clustering objectionable network text.

  12. BioCluster: Tool for Identification and Clustering of Enterobacteriaceae Based on Biochemical Data

    Institute of Scientific and Technical Information of China (English)

    Ahmed Abdullah; S.M.Sabbir Alam; Munawar Sultana; M.Anwar Hossain

    2015-01-01

    Presumptive identification of different Enterobacteriaceae species is routinely achieved based on biochemical properties. Traditional practice includes manual comparison of each biochemical property of the unknown sample with known reference samples and inference of its identity based on the maximum similarity pattern with the known samples. This process is labor-intensive, time-consuming, error-prone, and subjective. Therefore, automation of sorting and similarity calculation would be advantageous. Here we present a MATLAB-based graphical user interface (GUI) tool named BioCluster. This tool was designed for automated clustering and identification of Enterobacteriaceae based on biochemical test results. In this tool, we used two types of algorithms, i.e., traditional hierarchical clustering (HC) and the Improved Hierarchical Clustering (IHC), a modified algorithm that was developed specifically for the clustering and identification of Enterobacteriaceae species. IHC takes into account the variability in result of 1–47 biochemical tests within this Enterobacteriaceae family. This tool also provides different options to optimize the clustering in a user-friendly way. Using computer-generated synthetic data and some real data, we have demonstrated that BioCluster has high accuracy in clustering and identifying enterobacterial species based on biochemical test data. This tool can be freely downloaded at http://microbialgen.du.ac.bd/biocluster/.

  13. Fuzzy support vector machines based on linear clustering

    Science.gov (United States)

    Xiong, Shengwu; Liu, Hongbing; Niu, Xiaoxiao

    2005-10-01

    A new fuzzy support vector machine (FSVM) based on linear clustering is proposed in this paper. The concept comes from linear clustering: data points near a preformed hyperplane are selected, where the hyperplane is formed on a training set containing one positive and one negative training sample. The more important samples, those near the preformed hyperplane, are selected by the linear clustering technique, and the new FSVMs are trained on this more important data set. The approach integrates the merits of two kinds of FSVMs. The membership functions are defined using the relative distance between the data points and the preformed hyperplane during training. The fuzzy membership decision functions of multi-class FSVMs take the minimum value of all the decision functions of the two-class FSVMs. To demonstrate the superiority of the method, benchmark data sets from machine learning databases are used to verify the proposed FSVMs. The experimental results indicate that, with a suitable linear clustering parameter, the proposed FSVMs reduce the training data and running time, and their recognition rate is greater than or equal to that of FSVMs.

  14. A COMPARATIVE STUDY TO FIND A SUITABLE METHOD FOR TEXT DOCUMENT CLUSTERING

    Directory of Open Access Journals (Sweden)

    Dr.M.Punithavalli

    2012-01-01

    Text mining is used in various text-related tasks such as information extraction, concept/entity extraction, document summarization, entity relation modeling (i.e., learning relations between named entities), categorization/classification, and clustering. This paper focuses on document clustering, a field of text mining that groups a set of documents into a list of meaningful categories. Its main aim is to present a performance analysis of various techniques available for document clustering. The results of this comparative study can be used to improve existing text data mining frameworks and the process of knowledge discovery. This paper considers six clustering techniques for document clustering. The techniques are grouped into three groups: Group 1, K-means and its variants (traditional K-means and K* Means algorithms); Group 2, Expectation Maximization and its variants (traditional EM, Spherical Gaussian EM algorithm, and Linear Partitioning and Reallocation clustering (LPR) using EM algorithms); and Group 3, semantic-based techniques (Hybrid method and Feature-based algorithms). A total of seven algorithms are considered, selected for their popularity in the text mining field. Several experiments were conducted to analyze the performance of the algorithms and to select the winner in terms of cluster purity, clustering accuracy, and speed of clustering.

  15. Non-hierarchical clustering methods on factorial subspaces

    OpenAIRE

    Tortora, Cristina

    2011-01-01

    Cluster analysis (CA) aims at finding homogeneous groups of individuals, where homogeneous refers to individuals that present similar characteristics. Many CA techniques already exist; among the non-hierarchical ones the best known, thanks to its simplicity and computational properties, is the k-means method. However, the method is unstable when the number of variables is large and when variables are correlated. This problem led to the development of two-step methods, which perform a linear tra...

  16. Clustering Analysis on E-commerce Transaction Based on K-means Clustering

    Directory of Open Access Journals (Sweden)

    Xuan HUANG

    2014-02-01

    Clustering algorithms based on density, increments, grids, and so on usually show shortcomings when facing a large volume of high-dimensional transaction data: poor scalability, weak handling of high-dimensional data, sensitivity to the order of the data, poor independence of parameters, and weak handling of noise. From experiments on samples of 300 mobile phones from Taobao, the following conclusions can be drawn: compared with the Single-pass clustering algorithm, the K-means clustering algorithm has a high intra-class dissimilarity and inter-class similarity when analyzing e-commerce transactions. In addition, the K-means clustering algorithm is highly efficient and scales well when dealing with a large number of data items. However, the clustering results of this algorithm are affected by the number of clusters and the initial positions of the cluster centers, so the results easily fall into a local optimum. How to determine the number of clusters and the initial positions of the cluster centers therefore remains an important problem for future research.
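
    The sensitivity of K-means to the initial cluster centers, noted in the abstract above, can be illustrated with a minimal sketch. The four-point data set, the function names `kmeans` and `sse`, and the two initializations are assumptions chosen for illustration; they are not taken from the paper.

```python
def kmeans(points, centers, max_iter=100):
    """Plain Lloyd-style K-means on tuples; returns (centers, labels)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels


def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(
        sum((p - c) ** 2 for p, c in zip(pt, centers[lab]))
        for pt, lab in zip(points, labels)
    )


# Hypothetical data: the corners of a wide rectangle, clustered with k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initial centers on the left and right sides recover the natural split.
good_centers, good_labels = kmeans(points, [(0, 0.5), (4, 0.5)])
sse_good = sse(points, good_centers, good_labels)

# Initial centers on the bottom and top edges are already a fixed point,
# so the algorithm stops in a worse local optimum.
bad_centers, bad_labels = kmeans(points, [(2, 0), (2, 1)])
sse_bad = sse(points, bad_centers, bad_labels)
```

    Both runs converge, but the second one stops at a much larger within-cluster error, which is why restarts or a more careful seeding strategy are commonly used in practice.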

  17. Clustering-based redshift estimation: application to VIPERS/CFHTLS

    Science.gov (United States)

    Scottez, V.; Mellier, Y.; Granett, B. R.; Moutard, T.; Kilbinger, M.; Scodeggio, M.; Garilli, B.; Bolzonella, M.; de la Torre, S.; Guzzo, L.; Abbas, U.; Adami, C.; Arnouts, S.; Bottini, D.; Branchini, E.; Cappi, A.; Cucciati, O.; Davidzon, I.; Fritz, A.; Franzetti, P.; Iovino, A.; Krywult, J.; Le Brun, V.; Le Fèvre, O.; Maccagni, D.; Małek, K.; Marulli, F.; Polletta, M.; Pollo, A.; Tasca, L. A. M.; Tojeiro, R.; Vergani, D.; Zanichelli, A.; Bel, J.; Coupon, J.; De Lucia, G.; Ilbert, O.; McCracken, H. J.; Moscardini, L.

    2016-10-01

    We explore the accuracy of the clustering-based redshift estimation proposed by Ménard et al. when applied to VIMOS Public Extragalactic Redshift Survey (VIPERS) and Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) real data. This method enables us to reconstruct redshift distributions from measurement of the angular clustering of objects using a set of secure spectroscopic redshifts. We use state-of-the-art spectroscopic measurements with iAB 0.5 which allows us to test the accuracy of the clustering-based redshift distributions. We show that this method enables us to reproduce the true mean colour-redshift relation when both populations have the same magnitude limit. We also show that this technique allows the inference of redshift distributions for a population fainter than the reference and we give an estimate of the colour-redshift mapping in this case. This last point is of great interest for future large-redshift surveys which require a complete faint spectroscopic sample.

  18. Covariance analysis of differential drag-based satellite cluster flight

    Science.gov (United States)

    Ben-Yaacov, Ohad; Ivantsov, Anatoly; Gurfil, Pini

    2016-06-01

    One possibility for satellite cluster flight is to control relative distances using differential drag. The idea is to increase or decrease the drag acceleration on each satellite by changing its attitude, and use the resulting small differential acceleration as a controller. The most significant advantage of the differential drag concept is that it enables cluster flight without consuming fuel. However, any drag-based control algorithm must cope with significant aerodynamical and mechanical uncertainties. The goal of the current paper is to develop a method for examination of the differential drag-based cluster flight performance in the presence of noise and uncertainties. In particular, the differential drag control law is examined under measurement noise, drag uncertainties, and initial condition-related uncertainties. The method used for uncertainty quantification is the Linear Covariance Analysis, which enables us to propagate the augmented state and filter covariance without propagating the state itself. Validation using a Monte-Carlo simulation is provided. The results show that all uncertainties have relatively small effect on the inter-satellite distance, even in the long term, which validates the robustness of the used differential drag controller.
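
    For orientation, the covariance recursion on which a linear covariance analysis relies can be written in a generic discrete-time form. The symbols below (state transition matrix \Phi_k, process noise covariance Q_k, measurement matrix H_k, measurement noise covariance R_k, filter gain K_k) are assumptions for illustration; the paper's own augmented-state formulation is not given in the abstract.

```latex
\begin{align}
P_{k+1}^{-} &= \Phi_k \, P_k^{+} \, \Phi_k^{\mathsf{T}} + Q_k, \\
P_k^{+} &= (I - K_k H_k) \, P_k^{-} \, (I - K_k H_k)^{\mathsf{T}} + K_k R_k K_k^{\mathsf{T}}.
\end{align}
```

    Neither equation involves the state estimate itself, which is what allows the uncertainty to be propagated without simulating individual trajectories.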

  19. Green Clustering Implementation Based on DPS-MOPSO

    Directory of Open Access Journals (Sweden)

    Yang Lu

    2014-01-01

    A green clustering implementation is proposed as the first method in the framework of an energy-efficient strategy for centralized enterprise high-density WLANs. Traditionally, all of the APs within the WLAN have to be powered on to maintain network coverage. The new algorithm, however, can power off a large proportion of APs while maintaining the same coverage as the always-on counterpart. The proposed algorithm is composed of two parallel and concurrent procedures: a faster procedure based on K-means and a more accurate procedure based on Dynamic Population Size Multiple Objective Particle Swarm Optimization (DPS-MOPSO). To implement green clustering efficiently and accurately, dynamic population size and mutational operators are introduced as complements to the classical MOPSO. In addition to AP selection, the new green clustering algorithm can serve as a reference and guide for AP deployment. This paper also presents simulations in scenarios modeled with the ray-tracing method and the FDTD technique; the results show that about 67% up to 90% of energy consumption can be saved while the original network coverage is maintained during periods when few users are online or when the traffic load is low.

  20. K-Means Clustering Based on Genetic Algorithm

    Institute of Scientific and Technical Information of China (English)

    王敞; 陈增强; 袁著祉

    2003-01-01

    This paper proposes a K-Means clustering method based on genetic algorithm. We compare our method with the traditional K-Means method and clustering method based on simple genetic algorithm. The comparison proves that our method achieves a better result than the other two. The drawback of this method is a comparably slower speed in clustering.

  1. Quantum Monte Carlo methods and lithium cluster properties

    Energy Technology Data Exchange (ETDEWEB)

    Owen, R.K.

    1990-12-01

    Properties of small lithium clusters with sizes ranging from n = 1 to 5 atoms were investigated using quantum Monte Carlo (QMC) methods. Cluster geometries were found from complete active space self consistent field (CASSCF) calculations. A detailed development of the QMC method leading to the variational QMC (V-QMC) and diffusion QMC (D-QMC) methods is shown. The many-body aspect of electron correlation is introduced into the QMC importance sampling electron-electron correlation functions by using density dependent parameters, which are shown to increase the amount of correlation energy obtained in V-QMC calculations. A detailed analysis of D-QMC time-step bias is made, and the bias is found to be at least linear with respect to the time-step. The D-QMC calculations determined the lithium cluster ionization potentials to be 0.1982(14) [0.1981], 0.1895(9) [0.1874(4)], 0.1530(34) [0.1599(73)], 0.1664(37) [0.1724(110)], 0.1613(43) [0.1675(110)] Hartrees for lithium clusters n = 1 through 5, respectively, in good agreement with the experimental results shown in brackets. Also, the binding energies per atom were computed to be 0.0177(8) [0.0203(12)], 0.0188(10) [0.0220(21)], 0.0247(8) [0.0310(12)], 0.0253(8) [0.0351(8)] Hartrees for lithium clusters n = 2 through 5, respectively. The lithium cluster one-electron density is shown to have charge concentrations corresponding to nonnuclear attractors. The overall shape of the electronic charge density also bears a remarkable similarity to the anisotropic harmonic oscillator model shape for the given number of valence electrons.

  2. Clustering in mobile ad hoc network based on neural network

    Institute of Scientific and Technical Information of China (English)

    CHEN Ai-bin; CAI Zi-xing; HU De-wen

    2006-01-01

    An on-demand distributed clustering algorithm based on a neural network was proposed. The system parameters and the combined weight for each node were computed, cluster-heads were chosen using the weighted clustering algorithm, and then a training set was created and a neural network was trained. The algorithm takes several system parameters into account, such as the ideal node-degree, the transmission power, the mobility, and the battery power of the nodes. The algorithm can be used directly to test whether a node is a cluster-head or not. Moreover, the re-creation of clusters can be sped up.

  3. Research and Implementation of Unsupervised Clustering-Based Intrusion Detection

    Institute of Scientific and Technical Information of China (English)

    Luo Min; Zhang Huan-guo; Wang Li-na

    2003-01-01

    An unsupervised clustering-based intrusion detection algorithm is discussed in this paper. The basic idea of the algorithm is to produce clusters by comparing the distances between unlabeled training data instances. Once the data instances are clustered, anomalous clusters can be identified by the normal cluster ratio, and the identified clusters can then be used to detect intrusions in real data. The benefit of the algorithm is that it doesn't need labeled training data sets. Experiments on the KDD99 data sets show that this approach can efficiently detect unknown intrusions in real network connections.
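
    The general idea described above (cluster unlabeled data by distance, then treat the largest clusters as normal up to a given ratio) can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the single-pass clustering rule, the `width` threshold, the `normal_ratio` parameter, and all function names are assumptions.

```python
import math


def cluster_by_distance(points, width):
    """Single pass: join the nearest cluster if within `width`, else start a new one."""
    clusters = []  # each cluster is {"center": tuple, "members": list}
    for pt in points:
        best = min(clusters, key=lambda c: math.dist(pt, c["center"]), default=None)
        if best is not None and math.dist(pt, best["center"]) <= width:
            best["members"].append(pt)
            n = len(best["members"])
            # Incremental update of the cluster mean.
            best["center"] = tuple(
                c + (p - c) / n for c, p in zip(best["center"], pt))
        else:
            clusters.append({"center": tuple(pt), "members": [pt]})
    return clusters


def label_clusters(clusters, normal_ratio):
    """Mark the largest clusters as normal until they cover `normal_ratio` of the data."""
    total = sum(len(c["members"]) for c in clusters)
    ordered = sorted(clusters, key=lambda c: len(c["members"]), reverse=True)
    labeled, covered = [], 0
    for c in ordered:
        label = "normal" if covered < normal_ratio * total else "anomaly"
        covered += len(c["members"])
        labeled.append((c["center"], label))
    return labeled


def detect(point, labeled):
    """Classify a new instance by the label of its nearest cluster center."""
    _, label = min(labeled, key=lambda cl: math.dist(point, cl[0]))
    return label


# Hypothetical two-feature training data: nine similar records and one outlier.
training = [(0, 0), (0.1, 0), (0, 0.1), (0.2, 0.1), (0.1, 0.2),
            (0, 0.3), (0.3, 0), (0.2, 0.2), (0.1, 0.1), (10, 10)]
model = label_clusters(cluster_by_distance(training, width=2.0), normal_ratio=0.8)
```

    With this toy model, `detect((0.2, 0.0), model)` falls in the large cluster and `detect((9.5, 10.2), model)` falls in the small one. The sketch rests on the usual assumption of such methods that normal traffic greatly outnumbers intrusions.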

  4. Method for discovering relationships in data by dynamic quantum clustering

    Energy Technology Data Exchange (ETDEWEB)

    Weinstein, Marvin; Horn, David

    2014-10-28

    Data clustering is provided according to a dynamical framework based on quantum mechanical time evolution of states corresponding to data points. To expedite computations, we can approximate the time-dependent Hamiltonian formalism by a truncated calculation within a set of Gaussian wave-functions (coherent states) centered around the original points. This allows for analytic evaluation of the time evolution of all such states, opening up the possibility of exploration of relationships among data-points through observation of varying dynamical-distances among points and convergence of points into clusters. This formalism may be further supplemented by preprocessing, such as dimensional reduction through singular value decomposition and/or feature filtering.

  5. Clustering Methods with Qualitative Data: a Mixed-Methods Approach for Prevention Research with Small Samples.

    Science.gov (United States)

    Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G

    2015-10-01

    Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities.

  6. Fast spectral color image segmentation based on filtering and clustering

    Science.gov (United States)

    Xing, Min; Li, Hongyu; Jia, Jinyuan; Parkkinen, Jussi

    2009-10-01

    This paper proposes a fast approach to spectral image segmentation. In the algorithm, two popular techniques are extended and applied to spectral color images: the mean-shift filtering and the kernel-based clustering. We claim that segmentation should be completed under illuminant F11 rather than directly using the original spectral reflectance, because such illumination can reduce data variability and expedite the following filtering. The modes obtained in the mean-shift filtering represent the local features of spectral images, and will be applied to segmentation in place of pixels. Since the modes are generally small in number, the eigendecomposition of kernel matrices, the crucial step in the kernel-based clustering, becomes much easier. The combination of these two techniques can efficiently enhance the performance of segmentation. Experiments show that the proposed segmentation method is feasible and very promising for spectral color images.

  7. Analysis of protein profiles using fuzzy clustering methods

    DEFF Research Database (Denmark)

    Karemore, Gopal Raghunath; Ukendt, Sujatha; Rai, Lavanya

    clustering methods for their classification followed by various validation measures. The clustering algorithms used for the study were K-means, K-medoid, Fuzzy C-means, Gustafson-Kessel, and Gath-Geva. The results presented in this study conclude that the protein profiles of tissue...... samples recorded by using the HPLC-LIF system and the data analyzed by clustering algorithms quite successfully classifies them as belonging from normal and malignant conditions....

  8. Method of Key Frame Extraction Based on Sub-Shot Clustering

    Institute of Scientific and Technical Information of China (English)

    罗森林; 马舒洁; 梁静; 潘丽敏; 冯杨

    2011-01-01

    By analyzing the mainstream key frame extraction techniques, a new method of key frame extraction (KFE) based on sub-shot clustering is proposed in this paper. After the start and end frames of each shot are relocated, key frames are extracted through sub-shot detection and clustering, using the color histogram features of successive frames. Experimental results show that the proposed method is adaptable, accurate, and effective: it reduces the computational complexity of the extraction process and effectively avoids redundant key frames.
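
    A rough sketch of histogram-based key frame selection in the spirit of the abstract above follows. It is a simplification under stated assumptions, not the paper's algorithm: frames are represented by already-normalized color histograms, sub-shots are cut wherever the L1 difference between successive histograms exceeds a hypothetical `threshold`, and the clustering step is reduced to discarding a candidate that resembles a key frame already kept.

```python
def hist_diff(h1, h2):
    """L1 distance between two normalized color histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def split_sub_shots(hists, threshold):
    """Start a new sub-shot wherever successive histograms differ by more than threshold."""
    sub_shots, current = [], [0]
    for i in range(1, len(hists)):
        if hist_diff(hists[i - 1], hists[i]) > threshold:
            sub_shots.append(current)
            current = []
        current.append(i)
    sub_shots.append(current)
    return sub_shots


def representative(hists, indices):
    """Index of the frame whose histogram is closest to the sub-shot mean."""
    mean = [sum(col) / len(indices) for col in zip(*(hists[i] for i in indices))]
    return min(indices, key=lambda i: hist_diff(hists[i], mean))


def key_frames(hists, threshold):
    """One candidate per sub-shot, dropped if it resembles a key frame already kept."""
    keys = []
    for sub in split_sub_shots(hists, threshold):
        cand = representative(hists, sub)
        if all(hist_diff(hists[cand], hists[k]) > threshold for k in keys):
            keys.append(cand)
    return keys


# Hypothetical 3-bin histograms: scene A (frames 0-2), scene B (frames 3-5),
# then a cut back to scene A (frame 6).
hists = [
    [1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.6, 0.4, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.2, 0.8], [0.0, 0.4, 0.6],
    [0.8, 0.2, 0.0],
]
selected = key_frames(hists, threshold=0.5)  # frame indices of the key frames
```

    In this toy sequence the return to scene A produces no additional key frame, which is the redundancy-avoidance effect the abstract describes.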

  9. Reliability analysis of cluster-based ad-hoc networks

    Energy Technology Data Exchange (ETDEWEB)

    Cook, Jason L. [Quality Engineering and System Assurance, Armament Research Development Engineering Center, Picatinny Arsenal, NJ (United States); Ramirez-Marquez, Jose Emmanuel [School of Systems and Enterprises, Stevens Institute of Technology, Castle Point on Hudson, Hoboken, NJ 07030 (United States)], E-mail: Jose.Ramirez-Marquez@stevens.edu

    2008-10-15

    The mobile ad-hoc wireless network (MAWN) is a new and emerging network scheme that is being employed in a variety of applications. The MAWN varies from traditional networks because it is a self-forming and dynamic network. The MAWN is free of infrastructure and, as such, only the mobile nodes comprise the network. Pairs of nodes communicate either directly or through other nodes. To do so, each node acts, in turn, as a source, destination, and relay of messages. The virtue of a MAWN is the flexibility this provides; however, the challenge for reliability analyses is also brought about by this unique feature. The variability and volatility of the MAWN configuration makes typical reliability methods (e.g. reliability block diagram) inappropriate because no single structure or configuration represents all manifestations of a MAWN. For this reason, new methods are being developed to analyze the reliability of this new networking technology. New published methods adapt to this feature by treating the configuration probabilistically or by inclusion of embedded mobility models. This paper joins both methods together and expands upon these works by modifying the problem formulation to address the reliability analysis of a cluster-based MAWN. The cluster-based MAWN is deployed in applications with constraints on networking resources such as bandwidth and energy. This paper presents the problem's formulation, a discussion of applicable reliability metrics for the MAWN, and illustration of a Monte Carlo simulation method through the analysis of several example networks.
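
    A Monte Carlo reliability estimate of the general kind mentioned above can be sketched as follows. This is a generic two-terminal example on a fixed topology, assuming independent link failures; it does not model mobility or the cluster-based structure treated in the paper, and the function names and parameters are hypothetical.

```python
import random


def connected(nodes, up_links, source, target):
    """Depth-first search over the links that are currently up."""
    adjacency = {n: [] for n in nodes}
    for a, b in up_links:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def two_terminal_reliability(nodes, links, link_prob, source, target,
                             trials=10000, seed=0):
    """Fraction of random network states in which source can reach target."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    successes = 0
    for _ in range(trials):
        # Each link is independently up with probability link_prob.
        up = [link for link in links if rng.random() < link_prob]
        if connected(nodes, up, source, target):
            successes += 1
    return successes / trials


# Hypothetical three-node relay chain a - b - c.
nodes = ["a", "b", "c"]
links = [("a", "b"), ("b", "c")]
estimate = two_terminal_reliability(nodes, links, 0.9, "a", "c")
```

    For this chain the exact value is 0.9 × 0.9 = 0.81, so the estimate should land close to that; a mobility model would replace the fixed `links` list with link sets sampled per trial.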

  10. Histological image segmentation using fast mean shift clustering method

    OpenAIRE

    Wu, Geming; Zhao, Xinyan; Luo, Shuqian; Shi, Hongli

    2015-01-01

    Background: Colour image segmentation is fundamental and critical for quantitative histological image analysis. The complexity of the microstructure and the way histological images are prepared result in variable staining and illumination. In addition, the ultra-high resolution of histological images makes it hard for image segmentation methods to achieve high-quality segmentation results and low computation cost at the same time. Methods: Mean Shift clustering approach is employed for histol...

  11. A Method for Determining the Experts’ Weights of Multi-Attribute Group Decision-Making Based on Clustering Analysis

    Institute of Scientific and Technical Information of China (English)

    何立华; 王栎绮; 张连营

    2014-01-01

    To address the problem of determining experts' weights in multi-attribute group decision-making, this paper proposes a clustering-based method for determining experts' weights. Each expert's weight is divided into a between-category weight and a within-category weight, and both the expert clustering steps and the calculation of the between-category weights are improved. A compatibility degree matrix is built from the judgment matrices given by the experts and is clustered using the principle of hierarchical (system) clustering, which yields a pedigree chart of the maximum compatibility degree. The experts are then classified by comparing the distance between maximum compatibility degrees with a given threshold, which overcomes the shortcoming of existing procedures that can only divide the experts into two categories. In addition, when determining the between-category weights, larger categories still receive larger between-category weight coefficients, and the consistency of the attribute weights of the experts' judgment matrices is also introduced to reflect differences between categories. This avoids assigning the same between-category weight coefficient to categories that contain equal numbers of experts. The proposed method has a clear structure and is simple to compute, and it makes the resulting expert weights more reasonable and accurate. Finally, a numerical example is used to verify, by comparison, the feasibility and effectiveness of the method.

  12. Comparison of Bayesian clustering and edge detection methods for inferring boundaries in landscape genetics

    Science.gov (United States)

    Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.

    2011-01-01

    Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness to detect genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.

  13. A Load Balance Routing Algorithm Based on Uneven Clustering

    Directory of Open Access Journals (Sweden)

    Liang Yuan

    2013-10-01

    To address the problem of uneven load in clustered Wireless Sensor Networks (WSNs), a load balance routing algorithm based on uneven clustering is proposed, which performs uneven clustering and calculates the optimal number of clusters. Through even node clustering, the algorithm prevents the number of common nodes under any one cluster head from becoming so large that the cluster head is overloaded and dies. It constructs an evaluation function that better reflects the residual energy distribution of nodes, and also constructs a routing evaluation function between cluster heads. The performance of the algorithm is simulated in MATLAB. Simulation results show that the routing established by this algorithm effectively improves the network's energy balance and lengthens the network's life cycle.

  14. An adaptive spatial clustering method for automatic brain MR image segmentation

    Institute of Scientific and Technical Information of China (English)

    Jingdan Zhang; Daoqing Dai

    2009-01-01

    In this paper, an adaptive spatial clustering method is presented for automatic brain MR image segmentation, which is based on a competitive learning algorithm, the self-organizing map (SOM). We use a pattern recognition approach in terms of feature generation and classifier design. Firstly, a multi-dimensional feature vector is constructed using local spatial information. Then, an adaptive spatial growing hierarchical SOM (ASGHSOM) is proposed as the classifier, which is an extension of SOM, fusing multi-scale segmentation with the competitive learning clustering algorithm to overcome the problem of overlapping grey-scale intensities on boundary regions. Furthermore, an adaptive spatial distance is integrated with ASGHSOM, in which local spatial information is considered in the clustering process to reduce the noise effect and the classification ambiguity. Our proposed method is validated by extensive experiments using both simulated and real MR data with varying noise level, and is compared with the state-of-the-art algorithms.

  15. Blind signal separation of underdetermined mixtures based on clustering algorithms on planes

    Institute of Scientific and Technical Information of China (English)

    Xie Shengli; Tan Beihai; Fu Yuli

    2007-01-01

    Blind signal separation (BSS) of underdetermined mixtures with three observed signals is discussed, based on a method of clustering on planes. When clustering on planes is used, the source signals need not be sufficiently sparse. In other words, it is not required that only one source signal dominates the others at any one time. The proposed method first clusters the normal lines of the planes. The mixing matrix can then be identified by determining the intersection lines of the planes. This method is an effective implementation of the new theory presented by Georgiev. Simulations illustrate the accuracy and restoring capability of the method in estimating the mixing matrix.

  16. Multi-face detection based on downsampling and modified subtractive clustering for color images

    Institute of Scientific and Technical Information of China (English)

    KONG Wan-zeng; ZHU Shan-an

    2007-01-01

    This paper presents a multi-face detection method for color images. The method is based on the assumption that faces are well separated from the background by skin color detection. These faces can be located by the proposed method which modifies the subtractive clustering. The modified clustering algorithm proposes a new definition of distance for multi-face detection, and its key parameters can be predetermined adaptively by statistical information of face objects in the image. Downsampling is employed to reduce the computation of clustering and speed up the process of the proposed method. The effectiveness of the proposed method is illustrated by three experiments.

  17. Distinguishing Functional DNA Words; A Method for Measuring Clustering Levels

    Science.gov (United States)

    Moghaddasi, Hanieh; Khalifeh, Khosrow; Darooneh, Amir Hossein

    2017-01-01

    Functional DNA sub-sequences and genome elements are spatially clustered through the genome just as keywords in literary texts. Therefore, some of the methods for ranking words in texts can also be used to compare different DNA sub-sequences. In analogy with the literary texts, here we claim that the distribution of distances between the successive sub-sequences (words) is q-exponential, which is the distribution function in non-extensive statistical mechanics. Thus the q-parameter can be used as a measure of word clustering levels. Here, we analyzed the distribution of distances between consecutive occurrences of 16 possible dinucleotides in human chromosomes to obtain their corresponding q-parameters. We found that CG, a biologically important two-letter word because of its methylation, has the highest clustering level. This finding shows the predictive ability of the method in biology. We also proposed that chromosome 18, with the largest value of q-parameter for promoters of genes, is more sensitive to diet and lifestyle. We extended our study to compare the genomes of some selected organisms and concluded that the clustering level of CGs increases in higher evolutionary organisms compared to lower ones. PMID:28128320
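
    The first step described in this abstract, collecting the distances between consecutive occurrences of a two-letter word, can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, overlapping occurrences are counted, and the subsequent fit of a q-exponential distribution to the distances is not shown.

```python
def occurrence_gaps(sequence, word):
    """Return the distances between consecutive occurrences of `word`.

    Occurrences may overlap (an assumption of this sketch); positions are
    0-based start indices within `sequence`.
    """
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return [later - earlier for earlier, later in zip(positions, positions[1:])]
```

    For example, calling `occurrence_gaps("ACGTTCGACG", "CG")` finds "CG" at positions 1, 5 and 8, so the returned gaps are 4 and 3. A histogram of such gaps over a whole chromosome would be the input to the distribution fit.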

  19. Cluster detection of diseases in heterogeneous populations: an alternative to scan methods

    Directory of Open Access Journals (Sweden)

    Rebeca Ramis

    2014-05-01

    Cluster detection has become an important part of the agenda of epidemiologists and public health authorities. The identification of high- and low-risk areas is fundamental in the definition of public health strategies and in the suggestion of potential risk factors. Currently, there are different cluster detection techniques available, the most popular being those using windows to scan the areas within the studied region. However, when these areas are heterogeneous in population size, scan window methods can lead to inaccurate conclusions. In order to perform cluster detection over heterogeneously populated areas, we developed a method not based on scanning windows but instead on standardised mortality ratios (SMRs) using irregular spatial aggregation (ISA). Its extension, irregular spatial aggregation with covariates (ISAC), includes covariates with residuals from Poisson regression. We compared the performance of the method with the flexible shaped spatial scan statistic (FlexScan) using mortality data for stomach and bladder cancer for 8,098 Spanish towns. The results show a collection of clusters for stomach and bladder cancer similar to that detected by ISA and FlexScan. However, in general, clusters detected by FlexScan were bigger and included towns whose SMRs were not statistically significant. For bladder cancer, clusters detected by ISAC differed from those detected by ISA and FlexScan in shape and location. The ISA and ISAC methods could be an alternative to the traditional scan window methods for cluster detection over aggregated data when the areas under study are heterogeneous in terms of population. The simplicity and flexibility of the methods make them more attractive than methods based on more complicated algorithms.
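
    The mortality ratio on which this method relies can be illustrated with a short sketch. It shows the textbook calculation only (observed deaths divided by expected deaths from indirect standardisation, plus an exact Poisson tail probability) and does not reproduce the ISA or ISAC aggregation itself. The function names and the stratum layout are hypothetical.

```python
import math


def expected_deaths(population_by_stratum, reference_rates):
    """Expected deaths: sum over strata of population * reference rate."""
    return sum(population_by_stratum[s] * reference_rates[s]
               for s in population_by_stratum)


def mortality_ratio(observed, expected):
    """Ratio of observed to expected deaths for one area."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), summed term by term."""
    term = math.exp(-expected)  # P(X = 0)
    cumulative = 0.0
    for k in range(observed):
        cumulative += term
        term *= expected / (k + 1)  # P(X = k + 1) from P(X = k)
    return 1.0 - cumulative
```

    A town with 9 observed deaths and 6 expected deaths has a ratio of 1.5; the tail probability indicates whether such an excess is plausible under the reference rates alone.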

  20. An Energy Efficient and Secure Clustering Protocol for Military based WSN

    Directory of Open Access Journals (Sweden)

    Prachi

    2017-01-01

    The less contiguous nature of military applications demands surveillance of widespread areas that are harder to monitor. Unlike traditional Wireless Sensor Networks (WSNs), a large military sensor network has unique requirements and challenges in terms of self-configuration, coverage, connectivity and energy dissipation. Taking this into consideration, this paper proposes a novel, efficient and secure clustering method for military applications. In any clustering-based approach, one of the prime concerns is the appropriate selection of Cluster Heads and the formation of balanced clusters. This paper proposes and analyzes two schemes, Average Energy based Clustering (AEC) and Threshold Energy based Clustering (TEC). In AEC, a node is elected as Cluster Head (CH) if its residual energy is above the average energy of its cluster, whereas in TEC a node is elected as Cluster Head if its residual energy is above the threshold energy. Further, both AEC and TEC choose nodes as CHs if their distance lies within the safety zone of the Base Station. The aim is a solution that not only conserves energy but also balances load while electing safe nodes as CHs. The performance of the proposed protocols was evaluated in terms of network lifetime, average residual energy of nodes and uniformity of energy dissipation across nodes. The results show that AEC succeeds in incorporating security while increasing the overall network lifetime, load balance and uniformity of energy dissipation.
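
    The AEC eligibility rule stated in this abstract can be sketched as below. The abstract does not define the safety zone, so this sketch assumes it is a distance interval around the Base Station; that interval, the field names and the function name are all hypothetical.

```python
from collections import defaultdict


def eligible_cluster_heads(nodes, safe_min, safe_max):
    """Return, per cluster, the ids of nodes eligible as Cluster Head.

    A node qualifies when its residual energy is above the average energy
    of its cluster and its distance to the Base Station lies inside the
    assumed safety zone [safe_min, safe_max].
    """
    by_cluster = defaultdict(list)
    for node in nodes:
        by_cluster[node["cluster"]].append(node)

    eligible = {}
    for cluster, members in by_cluster.items():
        average = sum(m["energy"] for m in members) / len(members)
        eligible[cluster] = [
            m["id"] for m in members
            if m["energy"] > average and safe_min <= m["dist_to_bs"] <= safe_max
        ]
    return eligible
```

    A TEC variant would replace the per-cluster average with a fixed threshold energy; how that threshold is chosen is not stated in the abstract.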

  1. PHC: A Fast Partition and Hierarchy-Based Clustering Algorithm

    Institute of Scientific and Technical Information of China (English)

    ZHOU HaoFeng(周皓峰); YUAN QingQing(袁晴晴); CHENG ZunPing(程尊平); SHI BaiLe(施伯乐)

    2003-01-01

    Cluster analysis is a process to classify data in a specified data set. In this field, much attention is paid to high-efficiency clustering algorithms. In this paper, the features in the current partition-based and hierarchy-based algorithms are reviewed, and a new hierarchy-based algorithm PHC is proposed by combining advantages of both algorithms, which uses the cohesion and the closeness to amalgamate the clusters. Compared with similar algorithms, the performance of PHC is improved, and the quality of clustering is guaranteed. And both the features were proved by the theoretic and experimental analyses in the paper.

  2. Monomer Basis Representation Method For Calculating The Spectra Of Molecular Clusters I. The Method And Qualitative Models

    CERN Document Server

    Ocak, Mahir E

    2012-01-01

    Firstly, a sequential symmetry adaptation procedure is derived for semidirect product groups. Then, this sequential symmetry adaptation procedure is used in the development of a new method named Monomer Basis Representation (MBR) for calculating the vibration-rotation-tunneling (VRT) spectra of molecular clusters. The method is based on generating optimized bases for each monomer in the cluster as linear combinations of some primitive basis functions, and then using the sequential symmetry adaptation procedure to generate a small symmetry-adapted basis for the solution of the full problem. It is seen that, given an optimized basis for each monomer, applying the sequential symmetry adaptation procedure as it is leads to a generalized eigenvalue problem instead of a standard eigenvalue problem. In this paper, the MBR method is developed as a solution to that problem, such that it leads to generation of an orthogonal optimized basis for the cluster being studied regardless of...

  3. Scalable Integrated Region-Based Image Retrieval Using IRM and Statistical Clustering.

    Science.gov (United States)

    Wang, James Z.; Du, Yanping

    Statistical clustering is critical in designing scalable image retrieval systems. This paper presents a scalable algorithm for indexing and retrieving images based on region segmentation. The method uses statistical clustering on region features and IRM (Integrated Region Matching), a measure developed to evaluate overall similarity between images…

  4. A Hierarchical Clustering Method Based on the Threshold of Semantic Feature in Big Data

    Institute of Scientific and Technical Information of China (English)

    罗恩韬; 王国军

    2015-01-01

    Emerging services such as cloud computing, health care, street-view map services and recommendation systems are driving the variety and scale of data to grow at an unprecedented rate, and the surge in data volume leads to many common problems, such as the representability, processability and reliability of data. How to effectively process and analyze the relationships among data, improve the efficiency of data partitioning, and build a cluster analysis model has become a problem that both academia and industry urgently need to solve. This paper proposes a hierarchical clustering method based on semantic features. First, training is performed according to the semantic features of the data; then hierarchical clustering is carried out on each subset using the training results; finally, the density center points of the whole data set are produced, which improves the efficiency and accuracy of data clustering. The method has low sampling complexity, analyzes data accurately, is easy to implement, and has good decidability.

  5. The Method of Data Aggregation for Wireless Sensor Network Based on Cluster Compressed Sensing of Multi-Sparsity Basis

    Institute of Scientific and Technical Information of China (English)

    朱路; 刘媛媛; 慈白山; 潘泽中

    2016-01-01

    A novel data fusion method for wireless sensor networks (WSNs) based on cluster compressed sensing (CCS) of multi-sparsity basis is presented to resolve the conflict between the accuracy of collected data and the energy consumption of sensor nodes. In the proposed method, an improved threshold is adopted to select cluster heads and form optimized clusters from the randomly deployed sensor nodes. Each cluster head uses a Bernoulli random matrix to linearly compress the sensor data in its cluster and then transmits the compressed information to the sink. This reduces data transmission and the energy consumed by communication, thus extending the lifetime of the network. Because the monitored signals are sparse in both the finite-difference and wavelet bases, the sink uses the OOMP algorithm to reconstruct the signal from the linear compression projections under the finite-difference and wavelet sparsity bases respectively. The least squares method is then used to combine the two reconstructed signals, which improves data accuracy. Simulation results show that the method guarantees the accuracy of collected data and at the same time extends the lifetime of the whole network, resolving the conflict between data accuracy and network lifetime.

  6. Data relationship degree-based clustering data aggregation for VANET

    Science.gov (United States)

    Kumar, Rakesh; Dave, Mayank

    2016-03-01

    Data aggregation is one of the major needs of vehicular ad hoc networks (VANETs) due to the constraints of resources. Data aggregation in VANET can reduce the data redundancy in the process of data gathering and thus conserve bandwidth. In realistic applications, it is always important to construct an effective route strategy that optimises not only communication cost but also the aggregation cost. Data aggregation at the cluster head by individual vehicles causes flooding of the data, which results in maximum latency and bandwidth consumption. Another approach to data aggregation in VANET is sending local representative data based on spatial correlation of sampled data. In this article, we emphasise the problem that recent spatial correlation data models of vehicles in VANET are not appropriate for measuring the correlation in a complex and composite environment. Moreover, the data represented by these models is generally inaccurate when compared to the real data. To minimise this problem, we propose a group-based data aggregation method that uses data relationship degree (DRD). In the proposed approach, DRD is a spatial relationship measurement parameter that measures the correlation between a vehicle's data and its neighbouring vehicles' data. The DRD clustering method, in which vehicles' data are grouped based on the available data and its correlation, is presented in detail. Results show that the representative data obtained with the proposed approach have low distortion and improve packet delivery ratio and throughput (by up to 10.84% and 24.82%, respectively) compared with other state-of-the-art solutions such as Cluster-Based Accurate Syntactic Compression of Aggregated Data in VANETs.

  7. Automatic Clustering Approaches Based On Initial Seed Points

    Directory of Open Access Journals (Sweden)

    G.V.S.N.R.V.Prasad

    2011-12-01

    Since clustering is applied in many fields, a number of clustering techniques and algorithms have been proposed and are available in the literature. This paper proposes a novel approach to address the major problems of partitional clustering algorithms, namely choosing an appropriate value of K and selecting the K initial seed points. The performance of any partitional clustering algorithm depends on the initial seed points, which are chosen at random in all existing partitional clustering algorithms. To overcome this problem, a novel algorithm called the Weighted Interior Clustering (WIC) algorithm is proposed to find approximate initial seed points, the number of clusters and the data points in the clusters. This paper also proposes another novel approach that combines the newly proposed WIC algorithm with K-means, named Weighted Interior K-means Clustering (WIKC). The novelty of WIKC is that it improves the quality and performance of the K-means clustering algorithm with reduced complexity. The experimental results on various datasets, with various instances, clearly indicate the efficacy of the proposed methods over the other methods.

  8. Cloud Computing Application for Hotspot Clustering Using Recursive Density Based Clustering (RDBC)

    Science.gov (United States)

    Santoso, Aries; Khiyarin Nisa, Karlina

    2016-01-01

    Indonesia has vast areas of tropical forest, but they are often burned, which causes extensive damage to property and human life. Monitoring hotspots can be one element of forest fire management. Each hotspot is recorded in a dataset so that it can be processed and analyzed. This research aims to build a cloud computing application that visualizes hotspot clustering. The application uses the R programming language with the Shiny web framework and implements the Recursive Density Based Clustering (RDBC) algorithm. Clustering is done on hotspot datasets of Kalimantan Island and South Sumatra Province to find the spread pattern of hotspots. The clustering results are evaluated using the Silhouette Coefficient (SC), which yields a best value of 0.3220798 for the Kalimantan dataset. Clustering patterns are displayed as web pages so that they can be widely accessed and serve as a reference for fire occurrence prediction.

  9. Use of multiple cluster analysis methods to explore the validity of a community outcomes concept map.

    Science.gov (United States)

    Orsi, Rebecca

    2017-02-01

    Concept mapping is now a commonly-used technique for articulating and evaluating programmatic outcomes. However, research regarding validity of knowledge and outcomes produced with concept mapping is sparse. The current study describes quantitative validity analyses using a concept mapping dataset. We sought to increase the validity of concept mapping evaluation results by running multiple cluster analysis methods and then using several metrics to choose from among solutions. We present four different clustering methods based on analyses using the R statistical software package: partitioning around medoids (PAM), fuzzy analysis (FANNY), agglomerative nesting (AGNES) and divisive analysis (DIANA). We then used the Dunn and Davies-Bouldin indices to assist in choosing a valid cluster solution for a concept mapping outcomes evaluation. We conclude that the validity of the outcomes map is high, based on the analyses described. Finally, we discuss areas for further concept mapping methods research.
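
    One of the validity indices named in this abstract, the Dunn index, can be sketched in a few lines. The study itself used R; the version below is an independent Python illustration that assumes Euclidean distance, the smallest point-to-point distance between clusters as separation, and the largest within-cluster distance as diameter. The function name is hypothetical.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index: minimum between-cluster distance / maximum cluster diameter.

    Higher values indicate compact, well-separated clusters. Requires at
    least two clusters and at least one cluster with two distinct points.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    if max_diameter == 0.0:
        raise ValueError("every cluster is a single point; index undefined")

    min_separation = min(
        math.dist(a, b)
        for (_, first), (_, second) in combinations(clusters.items(), 2)
        for a in first
        for b in second
    )
    return min_separation / max_diameter
```

    Computing the index for several candidate partitions of the same points and keeping the partition with the largest value mirrors the way such indices are used to choose among cluster solutions.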

  10. Remote sensing clustering analysis based on object-based interval modeling

    Science.gov (United States)

    He, Hui; Liang, Tianheng; Hu, Dan; Yu, Xianchuan

    2016-09-01

    In object-based clustering, image data are segmented into objects (groups of pixels) and then clustered based on the objects' features. This method can be used to automatically classify high-resolution, remote sensing images, but requires accurate descriptions of object features. In this paper, we ascertain that interval-valued data model is appropriate for describing clustering prototype features. With this in mind, we developed an object-based interval modeling method for high-resolution, multiband, remote sensing data. We also designed an adaptive interval-valued fuzzy clustering method. We ran experiments utilizing images from the SPOT-5 satellite sensor, for the Pearl River Delta region and Beijing. The results indicate that the proposed algorithm considers both the anisotropy of the remote sensing data and the ambiguity of objects. Additionally, we present a new dissimilarity measure for interval vectors, which better separates the interval vectors generated by features of the segmentation units (objects). This approach effectively limits classification errors caused by spectral mixing between classes. Compared with the object-based unsupervised classification method proposed earlier, the proposed algorithm improves the classification accuracy without increasing computational complexity.

  11. A new method for estimating insolation based on PV-module currents in a cluster of stand-alone solar systems

    NARCIS (Netherlands)

    Nieuwenhout, F; van der Borg, N; van Sark, W.G.J.H.M.; Turkenburg, W.C.

    2007-01-01

    In order to evaluate the performance of solar home systems (SHSs), data on local insolation is a prerequisite. We present a new method to estimate insolation if direct measurements are unavailable. This method comprises estimation of daily irradiation by correlating photovoltaic (PV) module currents

  12. A new method for estimating insolation based on PV-module currents in a cluster of stand-alone solar systems

    Energy Technology Data Exchange (ETDEWEB)

    Nieuwenhout, F.; Van der Borg, N. [Energy Research Centre of the Netherlands, Petten (Netherlands); Van Sark, W.; Turkenburg, W. [Utrecht University (Netherlands). Copernicus Institute for Sustainable Development and Innovation, Department of Science, Technology and Society

    2006-07-01

    In order to evaluate the performance of solar home systems (SHSs), data on local insolation is a prerequisite. We present a new method to estimate insolation if direct measurements are unavailable. This method comprises estimation of daily irradiation by correlating photovoltaic (PV) module currents from a number of SHSs, located a few kilometres apart. The method was tested with a 3-year time series for nine SHS in a remote area in Indonesia. Verification with reference cell measurements over a 2-month period showed that our method could determine average daily irradiation with a mean bias error of 1.3%. Daily irradiation figures showed a standard error of 5%. The systematic error in this method is estimated to be around 10%. Especially if calibration with measurements during a short period is possible, the proposed method provides more accurate monthly insolation figures compared with the readily available satellite data from the NASA SSE database. An advantage of the proposed method over satellite data is that irradiation figures can be calculated on a daily basis, while the SSE database only provides monthly averages. It is concluded that the new method is a valuable tool to obtain information on insolation when long-term measurements are absent. (author)
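
    The error figures quoted in this abstract are relative errors between estimated and reference irradiation. A minimal sketch of how such figures can be computed is given below. The normalisation by the mean reference value is an assumption of this sketch, the function names are hypothetical, and the root-mean-square error is included only as a common companion metric, not as the paper's standard error.

```python
import math


def relative_mbe(estimated, reference):
    """Mean bias error as a percentage of the mean reference value."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    bias = sum(e - r for e, r in zip(estimated, reference)) / len(reference)
    return 100.0 * bias / (sum(reference) / len(reference))


def relative_rmse(estimated, reference):
    """Root-mean-square error as a percentage of the mean reference value."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((e - r) ** 2 for e, r in zip(estimated, reference)) / len(reference)
    return 100.0 * math.sqrt(mse) / (sum(reference) / len(reference))
```

    A positive mean bias error means the estimates are on average higher than the reference measurements.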

  13. Estimating insolation based on PV-module currents in a cluster of stand-alone solar systems: Introduction of a new method

    NARCIS (Netherlands)

    Nieuwenhout, F; van den Borg, N.; van Sark, W.G.J.H.M.; Turkenburg, W.C.

    2006-01-01

    In order to evaluate the performance of solar home systems (SHS), data on local insolation is a prerequisite. We present the outline of a new method to estimate insolation if direct measurements are unavailable. This method comprises estimation of daily irradiation by correlating photovoltaic (PV)-m

  14. Quark-gluon plasma phase transition using cluster expansion method

    Science.gov (United States)

    Syam Kumar, A. M.; Prasanth, J. P.; Bannur, Vishnu M.

    2015-08-01

    This study investigates the phase transitions in QCD using Mayer's cluster expansion method. The interquark potential is a modified Cornell potential. The equation of state (EoS) is evaluated for a homogeneous system. The behaviour is studied by varying the temperature as well as the number of charm quarks. The results clearly show signs of a phase transition from hadrons to Quark-Gluon Plasma (QGP).

  15. Translationally-invariant coupled-cluster method for finite systems

    CERN Document Server

    Guardiola, R; Navarro, J; Portesi, M

    1998-01-01

    The translationally invariant formulation of the coupled-cluster method is presented here at the complete SUB(2) level for a system of nucleons treated as bosons. The correlation amplitudes are solutions of a non-linear coupled system of equations. These equations have been solved for light and medium systems, considering the central but still semi-realistic nucleon-nucleon S3 interaction.

  16. A Method for Group Decision-Making Based on Entropy Weight and Gray Cluster Analysis

    Institute of Scientific and Technical Information of China (English)

    蔡忠义; 陈云翔; 徐吉辉; 项华春

    2012-01-01

    In order to reasonably determine the weight of each expert in multi-attribute group decision-making, a method based on entropy weight and gray cluster analysis is proposed. Using the ranking vectors obtained by normalizing each expert's judgment matrix, cluster analysis is performed with the gray absolute correlation matrix and the between-class weights are determined. The within-class weights are determined by entropy weight theory. A numerical example demonstrates the feasibility and effectiveness of the method. The results show that the method can effectively improve the rationality of weight determination and contribute to scientific group decision-making.
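
    The entropy weight idea referred to in this abstract can be illustrated with the generic entropy weight calculation. This is a sketch of the standard formula only; how the paper applies it to within-class expert weights is not stated in the abstract, and the function name and matrix layout are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for the columns of a non-negative score matrix.

    matrix[i][j] is the score of row i (for example an expert or an
    alternative) on column j. Columns whose values are spread unevenly
    carry more information and receive larger weights.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    scale = 1.0 / math.log(n_rows)  # normalises entropy to [0, 1]

    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            if value > 0:
                share = value / total
                entropy -= scale * share * math.log(share)
        divergences.append(1.0 - entropy)

    total_divergence = sum(divergences)
    if total_divergence <= 1e-12:
        raise ValueError("all columns are uniform; weights are undefined")
    return [d / total_divergence for d in divergences]
```

    A column in which every row has the same value has maximal entropy and therefore receives a weight close to zero.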

  17. An Efficient Semantic Model For Concept Based Clustering And Classification

    Directory of Open Access Journals (Sweden)

    SaiSindhu Bandaru

    2012-03-01

    In text mining, basic measures such as the frequency of a term (word or phrase) are usually computed to estimate the importance of the term in a document. With purely statistical analysis, however, the original semantics of the term may be lost. To overcome this problem, a new framework is introduced that relies on a concept-based model and a synonym-based approach. The proposed model can efficiently find significant matching and related concepts between documents according to concept-based and synonym-based approaches. Large sets of experiments using the proposed model on different data sets in clustering and classification are conducted. Experimental results demonstrate a substantial enhancement of clustering quality using sentence-based, document-based, corpus-based and combined-approach concept analysis. A new similarity measure is proposed to find the similarity between a document and the existing clusters, which can be used to classify the document against existing clusters.

  18. Semi-supervised weighted kernel clustering based on gravitational search for fault diagnosis.

    Science.gov (United States)

    Li, Chaoshun; Zhou, Jianzhong

    2014-09-01

    Supervised learning methods such as the support vector machine (SVM) have been widely applied in diagnosing known faults; however, such methods fail when new or unknown faults occur. Traditional unsupervised kernel clustering can be used for unknown fault diagnosis, but it cannot make use of historical classification information to improve diagnosis accuracy. In this paper, a semi-supervised kernel clustering model is designed to diagnose known and unknown faults. First, a novel semi-supervised weighted kernel clustering algorithm based on gravitational search (SWKC-GS) is proposed for clustering datasets composed of labeled and unlabeled fault samples. The clustering model of SWKC-GS is defined based on the misclassification rate of labeled samples and a fuzzy clustering index on the whole dataset. The gravitational search algorithm (GSA) is used to solve the clustering model, with the cluster centers, feature weights and kernel function parameter selected as optimization variables. Then, new fault samples are identified and diagnosed by calculating the weighted kernel distance between them and the fault cluster centers. If the fault samples are unknown, they are added to the historical dataset and SWKC-GS is used to partition the mixed dataset and update the clustering results for diagnosing new faults. In experiments, the proposed method has been applied to fault diagnosis for rotatory bearings, and SWKC-GS has been compared not only with traditional clustering methods but also with SVM and neural networks for known fault diagnosis. In addition, the proposed method has been applied to unknown fault diagnosis. The results show the effectiveness of the proposed method in achieving the expected diagnosis accuracy for both known and unknown faults of rotatory bearings.

  19. A fast density-based clustering algorithm for real-time Internet of Things stream.

    Science.gov (United States)

    Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams. They can detect arbitrarily shaped clusters and handle outliers, and they do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable to real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.
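
    The properties listed in this abstract (arbitrary cluster shapes, outlier handling, no preset number of clusters) can be illustrated with the classic batch DBSCAN procedure. The sketch below is not the authors' stream algorithm, which the abstract does not describe; the names `dbscan`, `region_query`, `eps` and `min_pts` are illustrative.

```python
import math

NOISE = -1


def region_query(points, index, eps):
    """Indices of all points within eps of points[index] (itself included)."""
    return [j for j, q in enumerate(points)
            if math.dist(points[index], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

    With these toy points the two dense groups become clusters 0 and 1 and the isolated point is labeled as noise, without the number of clusters being supplied.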

  20. A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream

    Directory of Open Access Journals (Sweden)

    Amineh Amini

    2014-01-01

    Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class in clustering data streams. They can detect arbitrarily shaped clusters and handle outliers, and they do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable to real-time applications of IoT devices. Experimental results show that the proposed approach obtains high quality results with low computation time on real and synthetic datasets.

  1. APPECT: An Approximate Backbone-Based Clustering Algorithm for Tags

    DEFF Research Database (Denmark)

    Zong, Yu; Xu, Guandong; Jin, Pin

    2011-01-01

    algorithm for Tags (APPECT). The main steps of APPECT are: (1) we execute the K-means algorithm on a tag similarity matrix for M times and collect a set of tag clustering results Z={C1,C2,…,Cm}; (2) we form the approximate backbone of Z by executing a greedy search; (3) we fix the approximate backbone...... resulting from the severe difficulty of ambiguity, redundancy and less semantic nature of tags. Clustering method is a useful tool to address the aforementioned difficulties. Most of the researches on tag clustering are directly using traditional clustering algorithms such as K-means or Hierarchical...

  2. COMBINING FUZZY AND CELLULAR LEARNING AUTOMATA METHODS FOR CLUSTERING WIRELESS SENSOR NETWORK TO INCREASE LIFE OF THE NETWORK

    Directory of Open Access Journals (Sweden)

    Javad Aramideh

    2014-11-01

    Wireless sensor networks have attracted the attention of researchers because of their many applications. One of the important issues in these networks is limited energy, which directly determines the lifetime of the network. Clustering is one of the main approaches recently used to address this problem. This paper presents a clustering method that works in two stages. In the first stage, candidate cluster-head nodes are selected with a fuzzy method; in the second stage, the cluster head is chosen among the candidates with cellular learning automata. The advantage of the method is that clustering is based on three main parameters (the number of neighbors, the energy level of each node, and the distance between each node and the sink node), which results in the selection of the best nodes as candidate cluster heads. Network connectivity is also evaluated in the second stage of cluster-head determination. By determining suitable cluster heads and creating balanced clusters, more energy is conserved and, consequently, the lifetime of the network increases.

  3. Research and Implementation of Unsupervised Clustering-Based Intrusion Detection

    Institute of Scientific and Technical Information of China (English)

    LuoMin; ZhangHuan-guo; WangLi-na

    2003-01-01

    An unsupervised clustering-based intrusion detection algorithm is discussed in this paper. The basic idea of the algorithm is to produce clusters by comparing the distances between unlabeled training data instances. With the classified data instances, anomalous data clusters can be easily identified by the normal cluster ratio, and the identified clusters can be used in real data detection. The benefit of the algorithm is that it does not need labeled training data sets. The experiment concludes that this approach can detect unknown intrusions efficiently in real network connections, using the KDD99 data sets.
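
    One way to read the steps in this abstract is as a fixed-width clustering pass followed by size-based labeling. The sketch below follows that reading; it is an assumption, since the abstract gives no algorithmic detail. The `width` and `normal_ratio` parameters and all function names are hypothetical.

```python
import math


def fixed_width_clusters(samples, width):
    """Single pass: join the nearest centre within `width`, else start a cluster.

    Each centre stays at the first sample of its cluster (a simplification).
    """
    centres, counts = [], []
    for x in samples:
        best, best_d = None, width
        for c, centre in enumerate(centres):
            d = math.dist(x, centre)
            if d <= best_d:
                best, best_d = c, d
        if best is None:
            centres.append(tuple(x))
            counts.append(1)
        else:
            counts[best] += 1
    return centres, counts


def label_clusters(counts, normal_ratio):
    """Mark the largest clusters as normal until they cover `normal_ratio` of the data."""
    total = sum(counts)
    order = sorted(range(len(counts)), key=lambda c: counts[c], reverse=True)
    normal, covered = set(), 0
    for c in order:
        if covered >= normal_ratio * total:
            break
        normal.add(c)
        covered += counts[c]
    return normal


def is_anomalous(x, centres, normal):
    """A new record is anomalous if its nearest centre is not a normal cluster."""
    nearest = min(range(len(centres)), key=lambda c: math.dist(x, centres[c]))
    return nearest not in normal


training = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
centres, counts = fixed_width_clusters(training, width=1.0)
normal = label_clusters(counts, normal_ratio=0.8)
```

    No labels are used at any point: the only prior knowledge is the assumed share of normal traffic, which is why small, distant clusters end up flagged.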

  4. Estimating insolation based on PV-module currents in a cluster of stand-alone solar systems: Introduction of a new method

    Energy Technology Data Exchange (ETDEWEB)

    Nieuwenhout, Frans; Van der Borg, Nico [Energy Research Centre of the Netherlands, Petten (Netherlands); Van Sark, Wilfried; Turkenburg, Wim [Copernicus Institute for Sustainable Development and Innovation, Utrecht University (Netherlands). Department of Science, Technology and Society

    2006-09-15

    In order to evaluate the performance of solar home systems (SHS), data on local insolation is a prerequisite. We present the outline of a new method to estimate insolation if direct measurements are unavailable. This method comprises estimation of daily irradiation by correlating photovoltaic (PV)-module currents from a number of solar home systems, located a few kilometres apart. The objective is to obtain reliable daily and monthly insolation figures that are representative for an area of a few square kilometres. (author)

  5. Fast Affinity Propagation Clustering based on Machine Learning

    Directory of Open Access Journals (Sweden)

    Shailendra Kumar Shrivastava

    2013-01-01

    Affinity propagation (AP) was recently introduced as an unsupervised learning algorithm for exemplar-based clustering. In this paper a novel Fast Affinity Propagation clustering Approach based on Machine Learning (FAPML) is proposed. FAPML tries to put data points into clusters based on the history of the data points belonging to clusters in early stages. In FAPML we introduce an affinity learning constant and a dispersion constant, which supervise the clustering process. FAPML also enforces the exemplar consistency and one-of-N constraints. Experiments conducted on many data sets, such as Olivetti faces, Mushroom, Documents summarization, Thyroid, Yeast, Wine quality Red and Balance, show that FAPML is up to 54% faster than the original AP with better Net Similarity.

  6. Intelligent Hybrid Cluster Based Classification Algorithm for Social Network Analysis

    Directory of Open Access Journals (Sweden)

    S. Muthurajkumar

    2014-05-01

    In this paper, we propose a hybrid clustering-based classification algorithm, based on a mean approach, to effectively mine the ordered sequences (paths) from weblog data in order to perform social network analysis. In the proposed system for social pattern analysis, sequences of human activities are typically analyzed by switching behaviors, which are likely to produce overlapping clusters. A robust Modified Boosting algorithm is proposed for hybrid clustering-based classification to cluster the data. This work helps connect the aggregated features from the network data with traditional indices used in social network analysis. Experimental results show that the proposed algorithm improves the decision results from data clustering when combined with the proposed classification algorithm, and hence provides better classification accuracy when tested with a weblog dataset. In addition, the algorithm improves predictive performance, especially for multiclass datasets, which can increase accuracy.

  7. Summarizing Relational Data Using Semi-Supervised Genetic Algorithm-Based Clustering Techniques

    Directory of Open Access Journals (Sweden)

    Rayner Alfred

    2010-01-01

    Problem statement: In solving a classification problem in relational data mining, traditional methods, for example C4.5 and its variants, usually require data transformations from datasets stored in multiple tables into a single table. Unfortunately, we may lose some information when we join tables with a high degree of one-to-many association. Data transformation therefore becomes tedious trial-and-error work, and the classification result is often not very promising, especially when the number of tables and the degree of one-to-many association are large. Approach: We proposed a genetic semi-supervised clustering technique as a means of aggregating data stored in multiple tables to facilitate the task of solving a classification problem in a relational database. This algorithm is suitable for classification of datasets with a high degree of one-to-many associations. It can be used in two ways. One is user-controlled clustering, where the user may control the result of clustering by varying the compactness of the spherical cluster. The other is automatic clustering, where a non-overlap clustering strategy is applied. In this study, we use the latter method to dynamically cluster multiple instances as a means of aggregating them, and illustrate the effectiveness of this method using the semi-supervised genetic algorithm-based clustering technique. Results: The experimental results showed that using the reciprocal of the Davies-Bouldin Index for cluster dispersion and the reciprocal of the Gini Index for cluster purity as the fitness function in the Genetic Algorithm (GA) finds solutions with much greater accuracy. The results obtained in this study showed that automatic clustering (seeding), by optimizing the cluster dispersion or cluster purity alone using GA, provides good results compared to traditional k-means clustering. However, the best result can be achieved by optimizing the combination values of both the cluster

  8. Result diversification based on query-specific cluster ranking

    NARCIS (Netherlands)

    He, J.; Meij, E.; de Rijke, M.

    2011-01-01

    Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification

  9. Adapted G-mode Clustering Method applied to Asteroid Taxonomy

    Science.gov (United States)

    Hasselmann, Pedro H.; Carvano, Jorge M.; Lazzaro, D.

    2013-11-01

    The original G-mode was a clustering method developed by A. I. Gavrishin in the late 1960s for the geochemical classification of rocks, but it was also applied to asteroid photometry, cosmic rays, lunar samples and planetary science spectroscopy data. In this work, we used an adapted version to classify the asteroid photometry from the SDSS Moving Objects Catalog. The method works by identifying normal distributions in a multidimensional space of variables. The identification starts by locating a set of points with the smallest mutual distance in the sample, which is a problem when the data are not planar. Here we present a modified version of the G-mode algorithm, previously written in FORTRAN 77, now in Python 2.7 using the NumPy, SciPy and Matplotlib packages. NumPy was used for array and matrix manipulation and Matplotlib for plot control. SciPy played an important role in speeding up G-mode: scipy.spatial.distance.mahalanobis was chosen as the distance estimator and numpy.histogramdd was applied to find the initial seeds from which clusters evolve. SciPy was also used to quickly produce dendrograms showing the distances among clusters. Finally, results for asteroid taxonomy and tests for different sample sizes and implementations are presented.
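
    The distance estimator named in this abstract is the Mahalanobis distance. As a dependency-free illustration of what that estimator computes, the sketch below evaluates it for two variables using the closed-form inverse of a 2x2 covariance matrix. It is not the authors' code (which, per the abstract, calls SciPy); the helper names are hypothetical.

```python
import math


def mean_and_covariance(points):
    """Sample mean and sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """sqrt(d^T S^-1 d), where d = point - mean and S is the covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy  # must be non-zero (non-degenerate data)
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # S^-1 = (1 / det) * [[syy, -sxy], [-sxy, sxx]]
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


sample = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean, cov = mean_and_covariance(sample)
distance = mahalanobis_2d((3.0, 1.0), mean, cov)
```

    Unlike the Euclidean distance, this measure rescales each direction by the spread of the data, so a point is judged against the shape of the distribution it is compared with.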

  10. A New Method to Quantify X-ray Substructures in Clusters of Galaxies

    CERN Document Server

    Andrade-Santos, Felipe; Laganá, Tatiana Ferraz

    2011-01-01

    We present a new method to quantify substructures in clusters of galaxies, based on the analysis of the intensity of structures. This analysis is done in a residual image that is the result of the subtraction of a surface brightness model, obtained by fitting a two-dimensional analytical model (beta-model or Sérsic profile) with elliptical symmetry, from the X-ray image. Our method is applied to 34 clusters observed by the Chandra Space Telescope that are in the redshift range 0.02 [...] method and the relations between the substructure level with physical quantities, such as the mass, X-ray luminosity, temperature, and cluster redshift. We use our method to separate the clusters in two sub-samples of high and low substructure levels. We conclude, using Monte Carlo simulations, that the method recuperates very well the true amount of substructure for small angular core radii clusters (with respect to the whole image s...

  11. Sonar Image Detection Algorithm Based on Two-Phase Manifold Partner Clustering

    Institute of Scientific and Technical Information of China (English)

    Xingmei Wang; Zhipeng Liu; Jianchuang Sun; Shu Liu

    2015-01-01

    Based on the manifold features of sonar image data, a sonar image detection method based on a two-phase manifold partner clustering algorithm is proposed. First, K-means block clustering based on Euclidean distance is used to reduce the data set. Mean value, standard deviation, and gray minimum value are taken as three features based on the relationship between the clustering model and the data structure. Then a K-means clustering algorithm based on manifold distance is applied to the reduced data set to improve detection efficiency. In the K-means clustering algorithm based on manifold distance, line segment length on the manifold is analyzed, and a new power-function line segment length is proposed to decrease the computational complexity. To calculate the manifold distance quickly, a new all-source shortest path algorithm is proposed as an efficient preprocessing step. On this basis, the spatial feature of the image block is added to the three features to obtain the final precise partner clustering algorithm. Comparison with other typical clustering algorithms demonstrates that the proposed algorithm gives good detection results, and experiments on different real sonar images show that it has better adaptability.

  12. Perturbative universal state-selective correction for state-specific multi-reference coupled cluster methods

    Energy Technology Data Exchange (ETDEWEB)

    Brabec, Jiri; Banik, Subrata; Kowalski, Karol; Pittner, Jiří

    2016-10-28

    The implementation details of the universal state-selective (USS) multi-reference coupled cluster (MRCC) formalism with singles and doubles (USS(2)) are discussed on the example of several benchmark systems. We demonstrate that the USS(2) formalism is capable of improving accuracies of state specific multi-reference coupled-cluster (MRCC) methods based on the Brillouin-Wigner and Mukherjee’s sufficiency conditions. Additionally, it is shown that the USS(2) approach significantly alleviates problems associated with the lack of invariance of MRCC theories upon the rotation of active orbitals. We also discuss the perturbative USS(2) formulations that significantly reduce numerical overhead of the full USS(2) method.

  13. Cluster-based global firms' use of local capabilities

    DEFF Research Database (Denmark)

    Andersen, Poul Houman; Bøllingtoft, Anne

    2011-01-01

    knowledge base as a mediating variable, the purpose of this paper is to examine how globalization affected the studied firms’ use of local cluster-based knowledge, integration of local and global knowledge, and networking capabilities. Design/methodology/approach – Qualitative case studies of nine firms...... knowledge were highly active in local knowledge use, whereas CBFs characterized by a more implicit knowledge base did not use localized knowledge. Research limitations/implications – The study is exploratory and covers three clusters in one small and open developed economy. Further corroboration through...... takes a micro-oriented perspective and focus on clusters in Denmark, a small and mature economy...

  14. A Survey on the Taxonomy of Cluster-Based Routing Protocols for Homogeneous Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Hiroshi Ishii

    2012-05-01

    The past few years have witnessed increased interest among researchers in cluster-based protocols for homogeneous networks because of their better scalability and higher energy efficiency than other routing protocols. Given the limited capabilities of sensor nodes in terms of energy resources, processing and communication range, cluster-based protocols should be compatible with these constraints in both the setup state and the steady data transmission state. With a focus on these constraints, we classify routing protocols according to their objectives and methods for addressing the shortcomings of the clustering process at each stage: cluster head selection, cluster formation, data aggregation and data communication. We summarize the techniques and methods used in these categories, and the weaknesses and strengths of each protocol are pointed out in detail. Furthermore, a taxonomy of the protocols in each phase is given to provide a deeper understanding of current clustering approaches. Finally, based on the existing research, we provide a summary of the issues and solutions of the attributes and characteristics of clustering approaches and some open research areas in cluster-based routing protocols that can be further pursued.

  15. A Survey on the Taxonomy of Cluster-Based Routing Protocols for Homogeneous Wireless Sensor Networks

    Science.gov (United States)

    Naeimi, Soroush; Ghafghazi, Hamidreza; Chow, Chee-Onn; Ishii, Hiroshi

    2012-01-01

    The past few years have witnessed increased interest among researchers in cluster-based protocols for homogeneous networks because of their better scalability and higher energy efficiency than other routing protocols. Given the limited capabilities of sensor nodes in terms of energy resources, processing and communication range, cluster-based protocols should be compatible with these constraints in both the setup state and the steady data transmission state. With a focus on these constraints, we classify routing protocols according to their objectives and methods for addressing the shortcomings of the clustering process at each stage: cluster head selection, cluster formation, data aggregation and data communication. We summarize the techniques and methods used in these categories, and the weaknesses and strengths of each protocol are pointed out in detail. Furthermore, a taxonomy of the protocols in each phase is given to provide a deeper understanding of current clustering approaches. Finally, based on the existing research, we provide a summary of the issues and solutions of the attributes and characteristics of clustering approaches and some open research areas in cluster-based routing protocols that can be further pursued. PMID:22969350

  16. Perceptual Object Extraction Based on Saliency and Clustering

    Directory of Open Access Journals (Sweden)

    Qiaorong Zhang

    2010-08-01

    Object-based visual attention has received increasing interest in recent years. The perceptual object is the basic attention unit of object-based visual attention. The definition and extraction of perceptual objects is one of the key technologies in object-based visual attention computation models. A novel perceptual object definition and extraction method is proposed in this paper. Based on Gestalt theory and visual feature integration theory, a perceptual object is defined using homogeneity regions, salient regions and edges. An improved saliency map generating algorithm is employed first. Based on the saliency map, salient edges are extracted. Then a graph-based clustering algorithm is introduced to obtain homogeneity regions in the image. Finally, an integration strategy is adopted to combine salient edges and homogeneity regions to extract perceptual objects. The proposed perceptual object extraction method has been tested on many natural images. Experimental results and analysis are also presented in this paper. The results show that the proposed method is reasonable and valid.

  17. Cluster randomized trial of an active, multifaceted information dissemination intervention based on The WHO Reproductive health library to change obstetric practices: methods and design issues [ISRCTN14055385]

    OpenAIRE

    Lumbiganon Pisake; Grimshaw Jeremy; Piaggio Gilda; Villar José; Gülmezoglu A; Langer Ana

    2004-01-01

    Background: Effective strategies for implementing best practices in low and middle income countries are needed. RHL is an annually updated electronic publication containing Cochrane systematic reviews, commentaries and practical recommendations on how to implement evidence-based practices. We are conducting a trial to evaluate the improvement in obstetric practices using an active dissemination strategy to promote uptake of recommendations in The WHO Reproductive Health Library (RHL)....

  18. An Ontology-based Knowledge Management System for Industry Clusters

    CERN Document Server

    Sureephong, Pradorn; Ouzrout, Yacine; Bouras, Abdelaziz

    2008-01-01

    The knowledge-based economy forces companies in a nation to group together as a cluster in order to maintain their competitiveness in the world market. Cluster development relies on two key success factors: knowledge sharing and collaboration between the actors in the cluster. Our study therefore proposes a knowledge management system to support knowledge management activities within the cluster. Ontology plays an important role in the knowledge management process in several ways, such as building reusable knowledge bases faster and representing knowledge explicitly. However, creating and representing an ontology is difficult for organizations because the sources of knowledge are ambiguous and unstructured. Therefore, the objective of this paper is to propose a methodology for creating and representing an ontology for organizational development using a knowledge engineering approach. The handicraft cluster in Thailand is used as a case stu...

  19. CFSBC:Clustering in High-Dimensional Space Based on Closed Frequent Item Set

    Institute of Scientific and Technical Information of China (English)

    NI Wei-wei; SUN Zhi-hui

    2004-01-01

    Clustering in high-dimensional space is an important domain in data mining. It is the process of discovering groups in a high-dimensional dataset in such a way that the similarity between the elements of the same cluster is maximal and between different clusters is minimal. Many clustering algorithms are not applicable to high-dimensional space because of its sparseness and decline properties. Dimensionality reduction is an effective method to solve this problem. The paper proposes a novel clustering algorithm, CFSBC, based on closed frequent itemsets derived from association rule mining, which can get the clustering attributes with high efficiency. The algorithm has several advantages. First, it deals effectively with the problem of dimensionality reduction. Second, it is applicable to different kinds of attributes. Third, it is suitable for very large data sets. Experiment shows that the proposed algorithm is effective and efficient.

  20. Neural network based cluster creation in the ATLAS silicon Pixel Detector

    CERN Document Server

    Perez Cavalcanti, T; The ATLAS collaboration

    2012-01-01

    The hit signals read out from pixels on planar semiconductor sensors are grouped into clusters to reconstruct the location where a charged particle passed through. The resolution given by the individual pixel sizes can be improved significantly using the information from the cluster of adjacent pixels. Such analog cluster creation techniques have been used by the ATLAS experiment for many years, giving excellent performance. However, in dense environments, such as those inside high-energy jets, it is likely that the charge deposited by two or more close-by tracks merges into one single cluster. A new pattern recognition algorithm based on neural network methods has been developed for the ATLAS Pixel Detector. This can identify the shared clusters, split them if necessary, and estimate the positions of all particles traversing the cluster. The algorithm significantly reduces ambiguities in the assignment of pixel detector measurements to tracks within jets, and improves the positional accuracy with respect to stand...

  1. Segmentation of color images based on the gravitational clustering concept

    Science.gov (United States)

    Lai, Andrew H.; Yung, H. C.

    1998-03-01

    A new clustering algorithm derived from the Markovian model of the gravitational clustering concept is proposed that works in the RGB measurement space for color image. To enable the model to be applicable in image segmentation, the new algorithm imposes a clustering constraint at each clustering iteration to control and determine the formation of multiple clusters. Using such constraint to limit the attraction between clusters, a termination condition can be easily defined. The new clustering algorithm is evaluated objectively and subjectively on three different images against the K-means clustering algorithm, the recursive histogram clustering algorithm for color, the Hedley-Yan algorithm, and the widely used seed-based region growing algorithm. From the evaluation, it is observed that the new algorithm exhibits the following characteristics: (1) its objective measurement figures are comparable with the best in this group of segmentation algorithms; (2) it generates smoother region boundaries; (3) the segmented boundaries align closely with the original boundaries; and (4) it forms a meaningful number of segmented regions.

  2. Multiple Imputation based Clustering Validation (MIV) for Big Longitudinal Trial Data with Missing Values in eHealth.

    Science.gov (United States)

    Zhang, Zhaoyang; Fang, Hua; Wang, Honggang

    2016-06-01

    Web-delivered trials are an important component in eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Building on our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods as well as the specific one (Xie and Beni) for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie and Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services.
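
    For readers unfamiliar with the Xie and Beni index mentioned here, the sketch below computes its standard form: membership-weighted compactness divided by the number of points times the smallest squared distance between centres (lower is better). It is a plain single-dataset version and does not implement the multiple-imputation framework of the paper; the function names and the fuzzifier default `m=2.0` are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centres, memberships, m=2.0):
    """Xie-Beni validity index for a fuzzy partition (needs >= 2 centres).

    memberships[i][k] is the degree to which points[k] belongs to centres[i].
    """
    compactness = sum(
        memberships[i][k] ** m * sq_dist(points[k], centres[i])
        for i in range(len(centres))
        for k in range(len(points))
    )
    separation = min(
        sq_dist(centres[i], centres[j])
        for i in range(len(centres))
        for j in range(i + 1, len(centres))
    )
    return compactness / (len(points) * separation)


points = [(0.0,), (2.0,), (8.0,), (10.0,)]
memberships = [[1, 1, 0, 0], [0, 0, 1, 1]]  # crisp memberships as a special case
good = xie_beni(points, [(1.0,), (9.0,)], memberships)
poor = xie_beni(points, [(4.0,), (6.0,)], memberships)
```

    Choosing the number of clusters then amounts to running the clustering for several candidate values and keeping the one with the smallest index.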

  3. Improving Tensor Based Recommenders with Clustering

    DEFF Research Database (Denmark)

    Leginus, Martin; Dolog, Peter; Zemaitis, Valdas

    2012-01-01

    Social tagging systems (STS) model three types of entities (i.e. tag-user-item) and relationships between them are encoded into a 3-order tensor. Latent relationships and patterns can be discovered by applying tensor factorization techniques like Higher Order Singular Value Decomposition (HOSVD......), Canonical Decomposition etc. STS accumulate large amount of sparse data that restricts factorization techniques to detect latent relations and also significantly slows down the process of a factorization. We propose to reduce tag space by exploiting clustering techniques so that the quality...... of the recommendations and execution time are improved and memory requirements are decreased. The clustering is motivated by the fact that many tags in a tag space are semantically similar thus the tags can be grouped. Finally, promising experimental results are presented...

  4. Cluster-based spectrum sensing for cognitive radios with imperfect channel to cluster-head

    KAUST Repository

    Ben Ghorbel, Mahdi

    2012-04-01

    Spectrum sensing is considered as the first and main step for cognitive radio systems to achieve an efficient use of spectrum. Cooperation and clustering among cognitive radio users are two techniques that can be employed with spectrum sensing in order to improve the sensing performance by reducing miss-detection and false alarm. In this paper, within the framework of a clustering-based cooperative spectrum sensing scheme, we study the effect of errors in transmitting the local decisions from the secondary users to the cluster heads (or the fusion center), while considering non-identical channel conditions between the secondary users. Closed-form expressions for the global probabilities of detection and false alarm at the cluster head are derived. © 2012 IEEE.

  5. Optimization of Stacking Method Based on Cluster Analysis and Decision Tree

    Institute of Scientific and Technical Information of China (English)

    高昊江; 张宜生; 肖田元

    2011-01-01

    Under the just-in-time (JIT) production model, storing large steel coils is a multi-objective optimization problem, and traditional methods that rely on manual experience no longer meet production needs. A method is proposed for calculating the in-stock quantity of goods from the production plan and a safety stock coefficient. A cluster analysis algorithm is designed for shelf allocation, and a decision tree is constructed to solve the multi-objective optimization problem. Experimental results show that the method improves outbound efficiency and storage space utilization, and meets the requirements of safe production, high quality and efficiency, and reduced waste.

  6. Fuzzy clustering-based segmented attenuation correction in whole-body PET

    CERN Document Server

    Zaidi, H; Boudraa, A; Slosman, DO

    2001-01-01

    Segmented-based attenuation correction is now a widely accepted technique to reduce noise contribution of measured attenuation correction. In this paper, we present a new method for segmenting transmission images in positron emission tomography. This reduces the noise on the correction maps while still correcting for differing attenuation coefficients of specific tissues. Based on the Fuzzy C-Means (FCM) algorithm, the method segments the PET transmission images into a given number of clusters to extract specific areas of differing attenuation such as air, the lungs and soft tissue, preceded by a median filtering procedure. The reconstructed transmission image voxels are therefore segmented into populations of uniform attenuation based on the human anatomy. The clustering procedure starts with an over-specified number of clusters followed by a merging process to group clusters with similar properties and remove some undesired substructures using anatomical knowledge. The method is unsupervised, adaptive and a...
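
    The Fuzzy C-Means step mentioned above alternates two updates, sketched here in their textbook form with a fuzzifier m. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, voxel values are treated as one-element tuples, and the median filtering and anatomy-based cluster merging are not shown.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][k], the membership of point k in cluster i."""
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        dists = [math.dist(x, c) for c in centers]
        zeros = [i for i, d in enumerate(dists) if d == 0.0]
        if zeros:
            # The point sits exactly on a center: give it full membership there.
            for i in zeros:
                u[i][k] = 1.0 / len(zeros)
            continue
        for i, d_i in enumerate(dists):
            u[i][k] = 1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
    return u


def fcm_centers(points, u, m=2.0):
    """Return each center as the mean of the points weighted by u**m."""
    dims = len(points[0])
    centers = []
    for row in u:
        weights = [w ** m for w in row]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dims)
        ))
    return centers


voxels = [(0.0,), (1.0,), (3.0,)]
u = fcm_memberships(voxels, [(0.0,), (4.0,)])
new_centers = fcm_centers(voxels, u)
```

    Repeating the two updates until the centers stop moving gives the soft segmentation; a hard label per voxel can then be taken as the cluster with the largest membership.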

  7. Clustering of attitudes towards obesity: a mixed methods study of Australian parents and children

    Science.gov (United States)

    2013-01-01

    Background Current population-based anti-obesity campaigns often target individuals based on either weight or socio-demographic characteristics, and give a ‘mass’ message about personal responsibility. There is a recognition that attempts to influence attitudes and opinions may be more effective if they resonate with the beliefs that different groups have about the causes of, and solutions for, obesity. Limited research has explored how attitudinal factors may inform the development of both upstream and downstream social marketing initiatives. Methods Computer-assisted face-to-face interviews were conducted with 159 parents and 184 of their children (aged 9–18 years old) in two Australian states. A mixed methods approach was used to assess attitudes towards obesity, and elucidate why different groups held various attitudes towards obesity. Participants were quantitatively assessed on eight dimensions relating to the severity and extent, causes and responsibility, possible remedies, and messaging strategies. Cluster analysis was used to determine attitudinal clusters. Participants were also able to qualify each answer. Qualitative responses were analysed both within and across attitudinal clusters using a constant comparative method. Results Three clusters were identified. Concerned Internalisers (27% of the sample) judged that obesity was a serious health problem, that Australia had among the highest levels of obesity in the world and that prevalence was rapidly increasing. They situated the causes and remedies for the obesity crisis in individual choices. Concerned Externalisers (38% of the sample) held similar views about the severity and extent of the obesity crisis. However, they saw responsibility and remedies as a societal rather than an individual issue. The final cluster, the Moderates, which contained significantly more children and males, believed that obesity was not such an important public health issue, and judged the extent of obesity to be

  8. Likelihood-based inference for clustered line transect data

    DEFF Research Database (Denmark)

    Waagepetersen, Rasmus Plenge; Schweder, Tore

    is implemented using Markov Chain Monte Carlo methods to obtain efficient estimates of spatial clustering parameters. Uncertainty is addressed using parametric bootstrap or by consideration of posterior distributions in a Bayesian setting. Maximum likelihood estimation and Bayesian inference is compared...

  9. Efficient Cluster Head Selection Methods for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Jong-Shin Chen

    2010-08-01

    The past few years have witnessed an increase in the potential uses of wireless sensor networks (WSNs), such as disaster management, combat field reconnaissance, border protection and security surveillance. Sensors in these applications are expected to be remotely deployed in large numbers and to operate autonomously in unattended environments. Since a WSN is composed of nodes with nonreplenishable energy resources, extending the network lifetime is the main concern. To support scalability, nodes are often grouped into disjoint clusters. Each cluster has a leader, often referred to as the cluster head (CH). A CH is responsible not only for general requests but also for assisting the general nodes in routing the sensed data to the target nodes. The power consumption of a CH is higher than that of a general (non-CH) node. Therefore, CH selection affects the lifetime of a WSN. However, the application scenarios of WSNs determine how lifetime is defined, which in turn affects how the objective of extending lifetime can be achieved. In this study, we classify lifetime into different types and give the corresponding CH selection method to achieve the lifetime extension objective. Simulation results demonstrate that our approach can extend the lifetime for different requests of the sensor networks.

  10. Microphone Clustering and BP Network based Acoustic Source Localization in Distributed Microphone Arrays

    Directory of Open Access Journals (Sweden)

    CHEN, Z.

    2013-11-01

    A microphone clustering and back propagation (BP) neural network based acoustic source localization method using distributed microphone arrays in an intelligent meeting room is proposed. In the proposed method, a novel clustering algorithm is first used to divide all microphones into several clusters, each of which corresponds to a specified BP network. Afterwards, an energy-based cluster selection scheme is applied to select clusters which are small and close to the source. In each chosen cluster, the time difference of arrival of each microphone pair is estimated, and all estimated time delays then act as input to the corresponding BP network for position estimation. Finally, all estimated positions from the chosen clusters are fused for global position estimation. Only subsets of the microphones, rather than all of them, are responsible for acoustic source localization, which leads to lower computational cost; moreover, the local estimation in each selected cluster can be processed in parallel, which is expected to improve the localization speed. Simulation results from comparison with other related localization approaches confirm the validity of the proposed method.
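
    The abstract does not say how the time difference of arrival is estimated for a microphone pair. One common choice, assumed here purely for illustration, is to take the lag that maximizes the cross-correlation of the two signals; the function name and the toy signals are hypothetical.

```python
def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a.

    Plain time-domain cross-correlation; an assumed estimator,
    not necessarily the one used in the paper.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            sig_a[t] * sig_b[t + lag]
            for t in range(len(sig_a))
            if 0 <= t + lag < len(sig_b)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]  # same pulse, three samples later
lag = estimate_delay(mic_a, mic_b, max_lag=5)
```

    Dividing the lag by the sampling rate gives the delay in seconds; the delays of all pairs in a cluster would then form the input vector of that cluster's network.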

  11. Improving Energy Efficient Clustering Method for Wireless Sensor Network

    Directory of Open Access Journals (Sweden)

    Md. Imran Hossain

    2013-08-01

    Wireless sensor networks have recently emerged as an important computing platform. These sensors are power-limited and have limited computing resources. Therefore the sensor energy has to be managed wisely in order to maximize the lifetime of the network. Simply speaking, LEACH requires knowledge of the energy of every node in the network topology used. In LEACH, the threshold that selects the cluster head is fixed, so the protocol does not consider the network topology environment. We propose the IELP algorithm, which selects cluster heads using different thresholds. The new cluster head selection probability takes into account the initial energy and the number of neighbor nodes. On a rotation basis, a head-set member receives data from the neighboring nodes and transmits the aggregated results to the distant base station. For a given number of data collecting sensor nodes, the number of control and management nodes can be systematically adjusted to reduce the energy consumption, which increases the network lifetime. The simulation results show that IELP improves on LEACH by 39% and on SEP by 20% in an area of 100 m × 100 m for m = 0.1 and α = 2, where m refers to the advanced nodes and α is the additional energy factor between advanced and normal nodes.
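
    For context, the fixed threshold that the abstract attributes to LEACH is usually written as T(n) = p / (1 - p (r mod 1/p)) for nodes that have not recently served as cluster head, and 0 otherwise. The sketch below implements only that standard formula; the modified IELP thresholds are not specified in the abstract and are not reproduced here. Function and parameter names are hypothetical.

```python
import random


def leach_threshold(p, round_number, was_recent_head):
    """Standard LEACH threshold for one node in one round.

    p is the desired fraction of cluster heads; was_recent_head is True
    if the node already served as head in the current cycle of 1/p rounds.
    """
    if was_recent_head:
        return 0.0
    period = round(1 / p)
    return p / (1 - p * (round_number % period))


def elects_itself(p, round_number, was_recent_head, rng=random):
    """A node becomes cluster head if its random draw falls below the threshold."""
    return rng.random() < leach_threshold(p, round_number, was_recent_head)


first_round = leach_threshold(0.1, 0, False)
last_round = leach_threshold(0.1, 9, False)
```

    With p = 0.1 the threshold rises from 0.1 in the first round of a cycle to 1 in the last, so every node that has not yet served is elected by the end of the cycle. Nothing in the formula depends on residual energy or neighborhood, which is the limitation the abstract points to.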

  12. The Integral- and Intermediate-Screened Coupled-Cluster Method

    CERN Document Server

    Sørensen, L K

    2016-01-01

    We present the formulation and implementation of the integral- and intermediate-screened coupled-cluster method (IISCC). The IISCC method gives a simple and rigorous integral and intermediate screening (IIS) of the coupled-cluster method and will significantly reduce the scaling for all orders of the CC hierarchy, exactly as seen for the integral-screened configuration-interaction method (ISCI). The rigorous IIS in the IISCC gives a robust and adjustable error control which should allow for the possibility of converging the energy without any loss of accuracy while retaining low or linear scaling at the same time. The derivation of the IISCC is performed in a similar fashion as in the ISCI, where we show that the tensor contractions for the nested commutators are separable up to an overall sign and that this separability can lead to a rigorous IIS. In the nested commutators the integrals are screened in the first tensor contraction and the intermediates are screened in all successive tensor contractions. The...

  13. Comparing Methods for segmentation of Microcalcification Clusters in Digitized Mammograms

    CERN Document Server

    Moradmand, Hajar; Targhi, Hossein Khazaei

    2012-01-01

    The appearance of microcalcifications in mammograms is one of the early signs of breast cancer. Early detection of microcalcification clusters (MCCs) in mammograms can therefore be helpful for cancer diagnosis and better treatment of breast cancer. In this paper a computer method is proposed to support radiologists in detecting MCCs in digital mammography. First, in order to facilitate and improve the detection step, mammogram images are enhanced with wavelet transformation and morphology operations. Then, for segmentation of suspicious MCCs, two methods are investigated: adaptive threshold and watershed segmentation. Finally, the MCC areas detected by the different algorithms are compared to find out which segmentation method is more appropriate for extracting MCCs in mammograms.

  14. An Efficient Initialization Method for K-Means Clustering of Hyperspectral Data

    Science.gov (United States)

    Alizade Naeini, A.; Jamshidzadeh, A.; Saadatseresht, M.; Homayouni, S.

    2014-10-01

    K-means is definitely the most frequently used partitional clustering algorithm in the remote sensing community. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of cluster centers. This problem worsens for high-dimensional data such as hyperspectral remotely sensed imagery. To tackle this problem, in this paper, the spectral signatures of the endmembers in the image scene are extracted and used as the initial positions of the cluster centers. For this purpose, in the first step, a Neyman-Pearson detection theory based eigen-thresholding method (i.e., the HFC method) has been employed to estimate the number of endmembers in the image. Afterwards, the spectral signatures of the endmembers are obtained using the Minimum Volume Enclosing Simplex (MVES) algorithm. Eventually, these spectral signatures are used to initialize the k-means clustering algorithm. The proposed method is implemented on a hyperspectral dataset acquired by the ROSIS sensor with 103 spectral bands over the Pavia University campus, Italy. For comparative evaluation, two other commonly used initialization methods (i.e., Bradley & Fayyad (BF) and Random methods) are implemented and compared. The confusion matrix, overall accuracy and Kappa coefficient are employed to assess the methods' performance. The evaluations demonstrate that the proposed solution outperforms the other initialization methods and can be applied for unsupervised classification of hyperspectral imagery for landcover mapping.
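
    The sensitivity to initial centers described above is easy to reproduce. The sketch below is a plain Lloyd-style k-means that accepts its starting centers as an argument, which is where endmember signatures would be supplied in the proposed method. The HFC and MVES steps are not shown, and the function name and toy data are hypothetical.

```python
def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's k-means started from caller-supplied centers (sketch)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [tuple(float(v) for v in c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda i: sqdist(p, centers[i]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_centers, good_labels = kmeans(points, [(0, 0), (10, 0)])
poor_centers, poor_labels = kmeans(points, [(5, 0), (5, 1)])
```

    On this toy data the first start recovers the two obvious groups, while the second converges immediately to a partition that splits them the wrong way, so the final result depends entirely on the starting centers.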

  15. Clustering and rule-based classifications of chemical structures evaluated in the biological activity space.

    Science.gov (United States)

    Schuffenhauer, Ansgar; Brown, Nathan; Ertl, Peter; Jenkins, Jeremy L; Selzer, Paul; Hamon, Jacques

    2007-01-01

    Classification methods for data sets of molecules according to their chemical structure were evaluated for their biological relevance, including rule-based, scaffold-oriented classification methods and clustering based on molecular descriptors. Three data sets resulting from uniformly determined in vitro biological profiling experiments were classified according to their chemical structures, and the results were compared in a Pareto analysis with the number of classes and their average spread in the profile space as two concurrent objectives which were to be minimized. It has been found that no classification method is overall superior to all other studied methods, but there is a general trend that rule-based, scaffold-oriented methods are the better choice if classes with homogeneous biological activity are required, but a large number of clusters can be tolerated. On the other hand, clustering based on chemical fingerprints is superior if fewer and larger classes are required, and some loss of homogeneity in biological activity can be accepted.

  16. Literature Analysis of Traditional Chinese Medicine Syndrome Types in Chronic Heart Failure Based on the Grey Clustering Method

    Institute of Scientific and Technical Information of China (English)

    刘宾; 王付; 黄明宜

    2011-01-01

    Objective: To collate and analyze literature reports on syndrome differentiation of chronic heart failure (CHF) and to explore the objective patterns of traditional Chinese medicine (TCM) syndrome typing of CHF. Methods: Literature reports on CHF syndrome differentiation from the past 10 years were collected, and grey system theory was used to compute the grey absolute degree of incidence of the syndrome-type data series, which were then clustered. Results: The 13 syndrome types obtained were clustered into deficiency of both heart qi and yin, yang deficiency with water flooding, yin exhaustion with yang collapse, blood stasis, and phlegm obstruction. Conclusion: Exploring the objective patterns of TCM syndrome differentiation of CHF provides a reference and basis for the further development of TCM syndrome diagnostic criteria and efficacy evaluation criteria for CHF.

  17. Clustering economies based on multiple criteria decision making techniques

    Directory of Open Access Journals (Sweden)

    Mansour Momeni

    2011-10-01

    One of the primary concerns of many countries is to determine the important factors affecting economic growth. In this paper, we study factors such as unemployment rate, inflation rate, population growth, average annual income, etc. to cluster different countries. The proposed model uses the analytic hierarchy process (AHP) to prioritize the criteria and then uses the K-means technique to cluster 59 countries into four groups based on the ranked criteria. The first group includes countries with high standards such as Germany and Japan. In the second cluster, there are some developing countries with relatively good economic growth such as Saudi Arabia and Iran. The third cluster contains countries with faster rates of growth than those in the second group, such as China, India and Mexico. Finally, the fourth cluster includes countries with relatively very low rates of growth such as Jordan, Mali, Niger, etc.
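
    The AHP step turns pairwise judgments about the criteria into priority weights, conventionally taken as the principal eigenvector of the comparison matrix. A minimal sketch using power iteration follows. The three-criterion matrix is made up for illustration and is not taken from the paper, and the abstract does not state how the weights enter the K-means step.

```python
def ahp_weights(pairwise, iterations=100):
    """Approximate the principal eigenvector of an AHP comparison matrix.

    pairwise[i][j] states how much more important criterion i is than j.
    The result is normalized to sum to 1.
    """
    n = len(pairwise)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [
            sum(pairwise[i][j] * weights[j] for j in range(n))
            for i in range(n)
        ]
        total = sum(product)
        weights = [value / total for value in product]
    return weights


# Hypothetical judgments for three criteria (illustrative only).
comparison = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
weights = ahp_weights(comparison)
```

    Because this example matrix is perfectly consistent, the weights come out as 0.6, 0.3 and 0.1. One plausible use, stated here as an assumption, is to scale each standardized criterion by its weight before running K-means.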

  18. Cluster-based distributed face tracking in camera networks.

    Science.gov (United States)

    Yoder, Josiah; Medeiros, Henry; Park, Johnny; Kak, Avinash C

    2010-10-01

    In this paper, we present a distributed multicamera face tracking system suitable for large wired camera networks. Unlike previous multicamera face tracking systems, our system does not require a central server to coordinate the entire tracking effort. Instead, an efficient camera clustering protocol is used to dynamically form groups of cameras for in-network tracking of individual faces. The clustering protocol includes cluster propagation mechanisms that allow the computational load of face tracking to be transferred to different cameras as the target objects move. Furthermore, the dynamic election of cluster leaders provides robustness against system failures. Our experimental results show that our cluster-based distributed face tracker is capable of accurately tracking multiple faces in real-time. The overall performance of the distributed system is comparable to that of a centralized face tracker, while presenting the advantages of scalability and robustness.

  19. Neural network based cluster creation in the ATLAS Pixel Detector

    CERN Document Server

    Andreazza, A; The ATLAS collaboration

    2012-01-01

    Read-outs from individual pixels on planar semiconductor sensors are grouped into clusters to reconstruct the location where a charged particle passed through the sensor. The resolution given by individual pixel sizes is significantly improved by using the information from the charge sharing between pixels. Such analog cluster creation techniques have been used by the ATLAS experiment for many years to obtain excellent performance. However, in dense environments, such as those inside high-energy jets, clusters have an increased probability of merging the charge deposited by multiple particles. Recently, a neural network based algorithm which estimates both the cluster position and whether a cluster should be split has been developed for the ATLAS Pixel Detector. The algorithm significantly reduces ambiguities in the assignment of pixel detector measurements to tracks and improves the position accuracy with respect to standard techniques by taking into account the 2-dimensional charge distribution.

  20. Relation Based Mining Model for Enhancing Web Document Clustering

    Directory of Open Access Journals (Sweden)

    M.Reka

    2014-05-01

    The design of a web information management system is becoming more complex, with greater time complexity. Information retrieval is a difficult task due to the huge volume of web documents. Clustering makes retrieval easier and less time consuming. This paper introduces a web document clustering approach that uses the semantic relations between documents, which reduces the time complexity. It identifies the relations and concepts in a document and computes the relation score between documents. The algorithm extracts the key concepts from the web documents by preprocessing, stemming, and stop word removal. The identified concepts and the domain ontology are used to compute the document relation score and the cluster relation score. Based on these two scores, the web document cluster is identified. The algorithm is evaluated on 200,000 web documents, with 60 percent used as the training set and 40 percent as the testing set.

  1. Semi-supervised segmentation of multispectral remote sensing image based on spectral clustering

    Science.gov (United States)

    Zhang, Xiangrong; Wang, Ting; Jiao, Licheng; Yang, Chun

    2009-10-01

    In this paper, a new multi-spectral remote sensing image segmentation method based on multi-parameter semi-supervised spectral clustering (STS3C) is proposed. Two types of instance-level constraints, must-link and cannot-link, are incorporated into spectral clustering to construct a semi-supervised spectral clustering in which a self-tuning parameter is applied to avoid the selection of the scaling parameter. Further, when STS3C is applied to multi-spectral remote sensing image segmentation, a uniform sampling technique combined with the nearest neighbor rule is used to reduce the computational complexity. Segmentation results show that STS3C outperforms semi-supervised spectral clustering with a fixed parameter and well-known clustering methods including k-means and FCM in multi-spectral remote sensing image segmentation.

  2. A Cluster-based Approach Towards Detecting and Modeling Network Dictionary Attacks

    Directory of Open Access Journals (Sweden)

    A. Tajari Siahmarzkooh

    2016-12-01

    In this paper, we provide an approach to detect network dictionary attacks using a data set collected as flows, from which a clustered graph is built. These flows provide an aggregated view of the network traffic in which the packets exchanged in the network are considered, so that more internally connected nodes are clustered together. We show that dictionary attacks can be detected through parameters such as the number and the weight of clusters in time series and their evolution over time. Additionally, a Markov model based on the average weight of clusters is created. Finally, by means of our suggested model, we demonstrate that artificial clusters of the flows are created for normal and malicious traffic. The results of the proposed approach on the CAIDA 2007 data set suggest a high accuracy for the model; it therefore provides a suitable method for detecting dictionary attacks.

  3. Energy Band Based Clustering Protocol for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Prabhat Kumar

    2012-07-01

    Clustering is one of the widely used techniques to prolong the lifetime of wireless sensor networks (WSNs) in environments where battery replacement of individual sensor nodes is not an option after their deployment. However, clustering overheads such as cluster formation, cluster size, and cluster head selection and rotation directly affect the lifetime of a WSN. This paper introduces and analyzes a new single-hop Energy Band Based Clustering Protocol (EBBCP), which tries to minimize these overheads, resulting in a prolonged life for the WSN. EBBCP works on static clusters formed on the basis of energy bands in the setup phase. The protocol reduces the per-round overhead of cluster formation, as shown by simulation results in MATLAB. The paper contains an in-depth analysis of the results obtained during simulation and compares EBBCP with LEACH. Unlike LEACH, EBBCP achieves evenly distributed cluster heads throughout the target area. The protocol also produces evenly distributed dead nodes. EBBCP beats LEACH in total data packets received and produces a better network lifetime. EBBCP uses the concept of a grid node to eliminate the need for a position finding system such as GPS to estimate the transmission signal strength.

  4. Nonparametric Cognitive Diagnosis: A Cluster Diagnostic Method Based on Grade Response Items

    Institute of Scientific and Technical Information of China (English)

    康春花; 任平; 曾平飞

    2015-01-01

    Examinations help students learn more efficiently by filling their learning gaps. To achieve this goal, cognitive diagnostic assessment must differentiate students who have mastered a set of attributes measured by the test from those who have not. K-means cluster analysis, a nonparametric cognitive diagnosis method, requires only the Q-matrix, which reflects the relationship between attributes and items. It does not require parameter estimation, so it is independent of sample size, simple to operate, and easy to understand. Previous research used sum-score vectors or capability-score vectors as the clustering objects. These methods are suitable only for dichotomous data. Structural response items are, however, the main type used in examinations, particularly as required in recent reforms. On the basis of previous research, this paper puts forward a method to calculate a capability matrix that reflects the mastery level of skills and is applicable to graded response items. Our study included four parts. First, we introduced the K-means cluster diagnosis method that has been adapted for dichotomous data. Second, we expanded the K-means cluster diagnosis method to graded response data (GRCDM). Third, we investigated the performance of the method introduced in Part Two using a simulation study. Fourth, we investigated the performance of the method in an empirical study. The simulation study focused on three factors. First, the sample size was set to 100, 500, and 1000. Second, the percentage of random errors was manipulated to be 5%, 10%, and 20%. Third, four attribute hierarchies, as proposed by Leighton, were used. All experimental conditions comprised seven attributes, with different items according to the hierarchies. Simulation results showed that: (1) GRCDM had a high pattern match ratio (PMR) and a high marginal match ratio (MMR). This method was shown to be feasible in cognitive diagnostic assessment. (2) The classification accuracy (MMR and PMR

  5. An Efficient Initialization Method for K-Means Clustering of Hyperspectral Data

    Directory of Open Access Journals (Sweden)

    A. Alizade Naeini

    2014-10-01

    K-means is definitely the most frequently used partitional clustering algorithm in the remote sensing community. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of cluster centers. This problem worsens for high-dimensional data such as hyperspectral remotely sensed imagery. To tackle this problem, in this paper, the spectral signatures of the endmembers in the image scene are extracted and used as the initial positions of the cluster centers. For this purpose, in the first step, a Neyman–Pearson detection theory based eigen-thresholding method (i.e., the HFC method) has been employed to estimate the number of endmembers in the image. Afterwards, the spectral signatures of the endmembers are obtained using the Minimum Volume Enclosing Simplex (MVES) algorithm. Eventually, these spectral signatures are used to initialize the k-means clustering algorithm. The proposed method is implemented on a hyperspectral dataset acquired by the ROSIS sensor with 103 spectral bands over the Pavia University campus, Italy. For comparative evaluation, two other commonly used initialization methods (i.e., Bradley & Fayyad (BF) and Random methods) are implemented and compared. The confusion matrix, overall accuracy and Kappa coefficient are employed to assess the methods’ performance. The evaluations demonstrate that the proposed solution outperforms the other initialization methods and can be applied for unsupervised classification of hyperspectral imagery for landcover mapping.

  6. Genetic Diversity among Parents of Hybrid Rice Based on Cluster Analysis of Morphological Traits and Simple Sequence Repeat Markers

    Institute of Scientific and Technical Information of China (English)

    WANG Sheng-jun; LU Zuo-mei; WAN Jian-min

    2006-01-01

    The genetic diversity of 41 parental lines popularized in commercial hybrid rice production in China was studied by using cluster analysis of morphological traits and simple sequence repeat (SSR) markers. The 41 entries were assigned into two clusters (i.e., an early or medium-maturing cluster and a medium or late-maturing cluster) and further assigned into six sub-clusters based on morphological trait cluster analysis. The early or medium-maturing cluster was composed of 15 maintainer lines, four early-maturing restorer lines and two thermo-sensitive genic male sterile lines, and the medium or late-maturing cluster included 16 restorer lines and 4 medium or late-maturing maintainer lines. Moreover, the SSR cluster analysis classified the 41 entries into two clusters (i.e., a maintainer line cluster and a restorer line cluster) and seven sub-clusters. The maintainer line cluster consisted of all 19 maintainer lines and two thermo-sensitive genic male sterile lines, while the restorer line cluster was composed of all 20 restorer lines. The SSR analysis fitted better with the pedigree information. From the viewpoint of hybrid rice breeding, the results suggested that SSR analysis might be a better method to study the diversity of parental lines in indica hybrid rice.

  7. Evidence-based treatments for cluster headache

    Directory of Open Access Journals (Sweden)

    Gooriah R

    2015-11-01

    Rubesh Gooriah, Alina Buture, Fayyaz Ahmed. Department of Neurology, Hull Royal Infirmary, Kingston upon Hull, UK. Abstract: Cluster headache (CH), one of the most painful syndromes known to man, is managed with acute and preventive medications. The brief duration and severity of the attacks command the use of rapid-acting pain relievers. Inhalation of oxygen and subcutaneous sumatriptan are the two most effective acute therapeutic options for sufferers of CH. Several preventive medications are available, the most effective of which is verapamil. However, most of these agents are not backed by strong clinical evidence. In some patients, these options can be ineffective, especially in those who develop chronic CH. Surgical procedures for the chronic refractory form of the disorder should then be contemplated, the most promising of which is hypothalamic deep brain stimulation. We hereby review the pathogenesis of CH and the evidence behind the treatment options for this debilitating condition. Keywords: cluster headache, pathogenesis, vasoactive intestinal peptide, suprachiasmatic nucleus

  8. A Study of Video Scenes Clustering Based on Shot Key Frames

    Institute of Scientific and Technical Information of China (English)

    CAI Bo; ZHANG Lu; ZHOU Dong-ru

    2005-01-01

    In digital video analysis, browsing, retrieval and querying, the shot alone cannot meet users' needs. A scene is a cluster of a series of shots and partially meets these demands. In this paper, an algorithm for video scene clustering based on shot key frame sets is proposed. We use χ² histogram matching and twin histogram comparison for shot detection. A method is presented for key frame set extraction based on the distance between non-adjacent frames; furthermore, the minimum distance between key frame sets is computed as the distance between shots, and scenes are finally clustered according to the distance between shots. Experiments show that the algorithm performs satisfactorily in both correctness and computing speed.
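    The shot-distance idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say which frame distance is used between key frames, so the χ² histogram distance is reused here, and the greedy threshold grouping, the toy histograms and the names `chi_square_distance`, `shot_distance` and `cluster_scenes` are all hypothetical.

```python
from itertools import product


def chi_square_distance(h1, h2):
    """Chi-square distance between two equal-length histograms."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return total


def shot_distance(keyframes_a, keyframes_b):
    """Minimum distance over all key-frame pairs of two shots."""
    return min(chi_square_distance(f, g)
               for f, g in product(keyframes_a, keyframes_b))


def cluster_scenes(shots, threshold):
    """Greedy grouping (an assumption): a shot joins the first scene that
    already contains a shot within `threshold`, else it starts a new scene."""
    scenes = []
    for idx, shot in enumerate(shots):
        for scene in scenes:
            if any(shot_distance(shot, shots[j]) <= threshold for j in scene):
                scene.append(idx)
                break
        else:
            scenes.append([idx])
    return scenes


# Toy data: each shot is a list of key-frame colour histograms (3 bins).
shots = [
    [[8, 2, 0], [7, 3, 0]],
    [[7, 2, 1]],
    [[0, 1, 9], [1, 1, 8]],
]
scenes = cluster_scenes(shots, threshold=2.0)
print(scenes)
```

    With these toy histograms the first two shots fall into one scene and the third forms its own; a real system would tune the threshold on labelled video.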

  9. Clustering in Very Large Databases Based on Distance and Density

    Institute of Scientific and Technical Information of China (English)

    QIAN WeiNing(钱卫宁); GONG XueQing(宫学庆); ZHOU AoYing(周傲英)

    2003-01-01

    Clustering in very large databases or data warehouses, with many applications in areas such as spatial computation, web information collection, pattern recognition and economic analysis, is a huge task that challenges data mining research. Current clustering methods often have the following problems: 1) scanning the whole database leads to high I/O cost and expensive maintenance (e.g., R*-tree); 2) the uncertain parameter k must be pre-specified, so the clustering can only be refined by repeated trial and error; 3) arbitrarily shaped clusters are handled inefficiently on very large data sets. In this paper, we present a new hybrid clustering algorithm to solve these problems. The new algorithm, which combines distance and density strategies, can handle arbitrarily shaped clusters effectively. It makes full use of statistical information gathered during mining to greatly reduce the time complexity while keeping good clustering quality. Furthermore, the algorithm can easily eliminate noise and identify outliers. An experimental evaluation is performed on a spatial database with this method and other popular clustering algorithms (CURE and DBSCAN). The results show that our algorithm outperforms them in terms of efficiency and cost, and that the speedup grows as the data size scales up.

  10. Cluster Development of Zhengzhou Urban Agriculture Based on Diamond Model

    Institute of Scientific and Technical Information of China (English)

    2012-01-01

    Based on the basic theory of the Diamond Model, this paper analyzes the competitive power of Zhengzhou urban agriculture in terms of production factors, demand conditions, related and supporting industries, business strategies and structure, and horizontal competition. In light of this analysis, it argues that cluster development is an effective approach to raising the competitive power of Zhengzhou urban agriculture. Finally, it presents the following countermeasures and suggestions: optimize the spatial distribution for cluster development of urban agriculture; cultivate leading enterprises and optimize the organizational form of urban agriculture; and energetically develop low-carbon agriculture to create a favorable ecological environment for cluster development of urban agriculture.

  11. MSClust: A Multi-Seeds Based Clustering Algorithm for microbiome profiling using 16S rRNA Sequence

    Science.gov (United States)

    Chen, Wei; Cheng, Yongmei; Zhang, Clarence; Zhang, Shaowu; Zhao, Hongyu

    2013-01-01

    Recent developments in next-generation sequencing technologies have led to a rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inference, a major challenge is the balance between inference accuracy and computational efficiency, where inference accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seeds based heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multiple seeds instead of one seed for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust uses less memory and achieves better biological accuracy than existing heuristic clustering methods while preserving efficiency and scalability. PMID:23899776

  12. A fast quad-tree based two dimensional hierarchical clustering.

    Science.gov (United States)

    Rajadurai, Priscilla; Sankaranarayanan, Swamynathan

    2012-01-01

    Recently, microarray technologies have become a robust technique in the area of genomics. An important step in the analysis of gene expression data is the identification of groups of genes disclosing analogous expression patterns. Cluster analysis partitions a given dataset into groups based on specified features. Euclidean distance is a widely used similarity measure for gene expression data that considers the amount of change in gene expression. However, the huge number of genes and the intricacy of biological networks have greatly increased the challenges of comprehending and interpreting the resulting groups of data, increasing processing time. The proposed technique focuses on a quad-tree (QT) based fast two-dimensional hierarchical clustering algorithm. The construction of the closest-pair data structure at each level is an important time factor, which determines the processing time of clustering. The proposed model reduces the processing time and improves the analysis of gene expression data.

  13. Composition Principles of Prescriptions for Diarrhoea Caused by Spleen Deficiency Based on Entropy Clustering and Apriori Method

    Institute of Scientific and Technical Information of China (English)

    田茸; 段永强; 马雪娇; 张琦; 李斌; 李杰; 陈雅雯; 杨晓轶; 杜娟

    2014-01-01

    Objective: To explore the composition principles of prescriptions for the treatment of diarrhoea caused by spleen deficiency. Methods: Prescriptions for spleen-deficiency diarrhoea recorded in the Dictionary of Traditional Chinese Medicine Prescriptions were collected, sorted and entered into a database built with the TCM Inheritance Support System (V1.3), and their composition principles were analyzed with entropy clustering and the apriori algorithm. Results: The 1185 selected prescriptions, containing 815 herbs, were analyzed. There were 33 herbs used 50 or more times, 50 commonly used herb pairs, and 29 herb combinations used 40 or more times. Association rule analysis of the prescriptions (support ≥ 20%, confidence ≥ 0.9) yielded 19 core herb combinations for clustering new prescriptions and 19 candidate new prescriptions. Matching prescriptions against the composition of Sijunzi decoction with a similarity threshold of 0.5 gave 17 matching prescriptions. Conclusion: Prescriptions for diarrhoea caused by spleen deficiency mainly strengthen the spleen and replenish qi, supplemented by warming yang, draining dampness and stopping diarrhoea.
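    The support and confidence thresholds quoted in this abstract can be made concrete with a small Python sketch. It is not the TCM Inheritance Support System's implementation; the herb labels "A" to "D", the five toy prescriptions and the helper names `support` and `confidence` are hypothetical, and only the two thresholds (20% and 0.9) come from the abstract.

```python
def support(itemset, prescriptions):
    """Fraction of prescriptions that contain every herb in `itemset`."""
    hits = sum(1 for p in prescriptions if itemset <= p)
    return hits / len(prescriptions)


def confidence(antecedent, consequent, prescriptions):
    """Estimated P(consequent | antecedent) over the prescription set."""
    base = support(antecedent, prescriptions)
    if base == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    return support(antecedent | consequent, prescriptions) / base


# Toy prescriptions; each is a set of placeholder herb labels.
prescriptions = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "B", "D"}),
    frozenset({"C", "D"}),
    frozenset({"A", "B", "C", "D"}),
]

rule_support = support(frozenset({"A", "B"}), prescriptions)
rule_confidence = confidence(frozenset({"A"}), frozenset({"B"}), prescriptions)

# Keep the rule A -> B only if it passes both thresholds from the abstract.
keep_rule = rule_support >= 0.20 and rule_confidence >= 0.9
print(rule_support, rule_confidence, keep_rule)
```

    The apriori algorithm proper adds level-wise candidate generation on top of these two measures, pruning any itemset whose subsets already fall below the support threshold.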

  14. Expanding Comparative Literature into Comparative Sciences Clusters with Neutrosophy and Quad-stage Method

    Directory of Open Access Journals (Sweden)

    Fu Yuhua

    2016-08-01

    By using Neutrosophy and Quad-stage Method, the expansions of comparative literature include: comparative social sciences clusters, comparative natural sciences clusters, comparative interdisciplinary sciences clusters, and so on. Among them, comparative social sciences clusters include: comparative literature, comparative history, comparative philosophy, and so on; comparative natural sciences clusters include: comparative mathematics, comparative physics, comparative chemistry, comparative medicine, comparative biology, and so on.

  15. Evaluation of sliding baseline methods for spatial estimation for cluster detection in the biosurveillance system

    Directory of Open Access Journals (Sweden)

    Leuze Michael

    2009-07-01

    Background: The Centers for Disease Control and Prevention's (CDC's) BioSense system provides near-real-time situational awareness for public health monitoring through analysis of electronic health data. Determination of anomalous spatial and temporal disease clusters is a crucial part of the daily disease monitoring task. Our study focused on finding useful anomalies at manageable alert rates according to available BioSense data history. Methods: The study dataset included more than 3 years of daily counts of military outpatient clinic visits for respiratory and rash syndrome groupings. We applied four spatial estimation methods in implementations of space-time scan statistics cross-checked in Matlab and C. We compared the utility of these methods according to the resultant background cluster rate (a false alarm surrogate) and sensitivity to injected cluster signals. The comparison runs used a spatial resolution based on the facility zip code in the patient record and a finer resolution based on the residence zip code. Results: Simple estimation methods that account for day-of-week (DOW) data patterns yielded a clear advantage both in background cluster rate and in signal sensitivity. A 28-day baseline gave the most robust results for this estimation; the preferred baseline is long enough to remove daily fluctuations but short enough to reflect recent disease trends and data representation. Background cluster rates were lower for the rash syndrome counts than for the respiratory counts, likely because of seasonality and the large scale of the respiratory counts. Conclusion: The spatial estimation method should be chosen according to characteristics of the selected data streams. In this dataset with strong day-of-week effects, the overall best detection performance was achieved using subregion averages over a 28-day baseline stratified by weekday or weekend/holiday behavior. Changing the estimation method for particular scenarios involving
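    A stratified sliding baseline of the kind described in this abstract might be sketched as follows. This is a simplified illustration, not the BioSense implementation: it handles a single subregion, treats only Saturday and Sunday as the weekend/holiday stratum (no holiday calendar), and the function names and toy counts are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def is_weekend(day):
    """Saturday and Sunday form the weekend stratum (holidays are ignored)."""
    return day.weekday() >= 5


def expected_count(history, target_day, baseline_days=28):
    """Average the counts from the `baseline_days` days before `target_day`,
    using only days of the same type (weekday vs weekend) as the target."""
    window = [target_day - timedelta(days=k) for k in range(1, baseline_days + 1)]
    same_type = [history[d] for d in window
                 if d in history and is_weekend(d) == is_weekend(target_day)]
    return mean(same_type) if same_type else None


# Toy history for one subregion: 10 visits on weekdays, 4 on weekends.
start = date(2009, 1, 5)  # a Monday
history = {}
for k in range(28):
    day = start + timedelta(days=k)
    history[day] = 4 if is_weekend(day) else 10

monday = start + timedelta(days=28)
saturday = start + timedelta(days=33)
print(expected_count(history, monday), expected_count(history, saturday))
```

    An observed count would then be compared with this expectation inside the scan statistic; stratifying by day type keeps a routine weekend dip from being read as a deficit or masking a real cluster.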

  16. WORMHOLE ATTACK MITIGATION IN MANET: A CLUSTER BASED AVOIDANCE TECHNIQUE

    Directory of Open Access Journals (Sweden)

    Subhashis Banerjee

    2014-01-01

    A Mobile Ad-Hoc Network (MANET) is a self-configuring, infrastructureless network of mobile devices connected by wireless links. Weaknesses such as the wireless medium, the lack of a fixed infrastructure, dynamic topology, rapid deployment practices, and the hostile environments in which they may be deployed make MANETs vulnerable to a wide range of security attacks, of which the wormhole attack is one. During this attack a malicious node captures packets from one location in the network and tunnels them to another colluding malicious node at a distant point, which replays them locally. This paper presents a cluster-based wormhole attack avoidance technique. Hierarchical clustering with a novel hierarchical 32-bit node addressing scheme is used to avoid the attacking path during the route discovery phase of the DSR protocol, which is taken as the underlying routing protocol. The method also pinpoints the location of the wormhole nodes in the case of an exposed attack.

  17. Detecting and extracting clusters in atom probe data: A simple, automated method using Voronoi cells

    Energy Technology Data Exchange (ETDEWEB)

    Felfer, P., E-mail: peter.felfer@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Ceguerra, A.V., E-mail: anna.ceguerra@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Ringer, S.P., E-mail: simon.ringer@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Cairney, J.M., E-mail: julie.cairney@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia)

    2015-03-15

    The analysis of the formation of clusters in solid solutions is one of the most common uses of atom probe tomography. Here, we present a method where we use the Voronoi tessellation of the solute atoms and its geometric dual, the Delaunay triangulation to test for spatial/chemical randomness of the solid solution as well as extracting the clusters themselves. We show how the parameters necessary for cluster extraction can be determined automatically, i.e. without user interaction, making it an ideal tool for the screening of datasets and the pre-filtering of structures for other spatial analysis techniques. Since the Voronoi volumes are closely related to atomic concentrations, the parameters resulting from this analysis can also be used for other concentration based methods such as iso-surfaces. - Highlights: • Cluster analysis of atom probe data can be significantly simplified by using the Voronoi cell volumes of the atomic distribution. • Concentration fields are defined on a single atomic basis using Voronoi cells. • All parameters for the analysis are determined by optimizing the separation probability of bulk atoms vs clustered atoms.

  18. A Comparison of Methods for Player Clustering via Behavioral Telemetry

    DEFF Research Database (Denmark)

    Drachen, Anders; Thurau, Christian; Sifa, Rafet;

    2013-01-01

    The analysis of user behavior in digital games has been aided by the introduction of user telemetry in game development, which provides unprecedented access to quantitative data on user behavior from the installed game clients of the entire population of players. Player behavior telemetry datasets...... can be exceptionally complex, with features recorded for a varying population of users over a temporal segment that can reach years in duration. Categorization of behaviors, whether through descriptive methods (e.g. segmentation) or unsupervised/supervised learning techniques, is valuable for finding...... patterns in the behavioral data, and developing profiles that are actionable to game developers. There are numerous methods for unsupervised clustering of user behavior, e.g. k-means/c-means, Nonnegative Matrix Factorization, or Principal Component Analysis. Although all yield behavior categorizations...

  19. An Incremental Algorithm of Text Clustering Based on Semantic Sequences

    Institute of Scientific and Technical Information of China (English)

    FENG Zhonghui; SHEN Junyi; BAO Junpeng

    2006-01-01

    This paper proposes an incremental text clustering algorithm based on semantic sequences. Using the similarity relation between semantic sequences and calculating the cover of the set of similar semantic sequences, the algorithm selects the candidate cluster with the minimum entropy overlap value as a result cluster at each step. A comparison of experimental results shows that the precision of the algorithm is higher than that of other algorithms under the same conditions, especially on sets of long documents.

  20. Chemically induced morphology change in cluster-based nanostructures

    Science.gov (United States)

    Lando, A.; Kébaïli, N.; Cahuzac, Ph.; Colliex, C.; Couillard, M.; Masson, A.; Schmidt, M.; Bréchignac, C.

    2007-07-01

    Preformed clusters carrying surfactant are used as primary building blocks for nanostructures. Self-assembly of silver-atom-based clusters, soft-landed on a HOPG surface, generates a large variety of new architectures depending on the nature and concentration of the impurities. Fractal shapes fragmented into multiple compact-like islands, as well as chain-like structures, may be formed. A strong local enhancement of silver atom mobility at the surface of the islands is responsible for these morphology changes.

  1. Clustering economies based on multiple criteria decision making techniques

    OpenAIRE

    2011-01-01

    One of the primary concerns in many countries is to determine the important factors affecting economic growth. In this paper, we use factors such as unemployment rate, inflation rate, population growth and average annual income to cluster different countries. The proposed model uses the analytic hierarchy process (AHP) to prioritize the criteria and then uses the K-means technique to cluster 59 countries into four groups based on the ranked criteria. The first group i...

  2. NCUBE - A clustering algorithm based on a discretized data space

    Science.gov (United States)

    Eigen, D. J.; Northouse, R. A.

    1974-01-01

    Cluster analysis involves the unsupervised grouping of data. The process provides an automatic procedure for generating known training samples for pattern classification. NCUBE, the clustering algorithm presented, is based upon the concept of imposing a gridwork on the data space. The NCUBE computer implementation of this concept provides an easily derived form of piecewise linear discrimination. This piecewise linear discrimination permits the separation of some types of data groups that are not linearly separable.
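    The general idea of clustering on a gridwork can be illustrated with a short Python sketch. The abstract does not describe NCUBE's actual procedure, so this is a generic grid-based stand-in under stated assumptions: points are binned into cells of a fixed, hypothetical `cell_size`, and occupied cells that touch (including diagonally) are merged into one cluster.

```python
from collections import defaultdict
from itertools import product


def grid_cluster(points, cell_size):
    """Label points by connected groups of occupied grid cells."""
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // cell_size) for coord in point)
        cells[key].append(idx)

    labels = [None] * len(points)
    seen = set()
    next_label = 0
    for start in cells:
        if start in seen:
            continue
        # Flood-fill over neighbouring occupied cells.
        stack = [start]
        seen.add(start)
        while stack:
            cell = stack.pop()
            for idx in cells[cell]:
                labels[idx] = next_label
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in cells and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        next_label += 1
    return labels


points = [(0.1, 0.2), (0.9, 0.8), (1.2, 1.1), (5.0, 5.1), (5.6, 5.9)]
labels = grid_cluster(points, cell_size=1.0)
print(labels)
```

    Because cluster boundaries follow cell faces, the resulting decision regions are piecewise linear, which is consistent with the piecewise linear discrimination mentioned in the abstract.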

  3. Neuro-fuzzy system modeling based on automatic fuzzy clustering

    Institute of Scientific and Technical Information of China (English)

    Yuangang TANG; Fuchun SUN; Zengqi SUN

    2005-01-01

    A neuro-fuzzy system model based on automatic fuzzy clustering is proposed. A hybrid model identification algorithm is also developed to decide the model structure and model parameters. The algorithm mainly includes three parts: 1) automatic fuzzy C-means (AFCM), which is applied to generate fuzzy rules automatically and then fix the size of the neuro-fuzzy network, by which the complexity of system design is reduced greatly at the price of fitting capability; 2) recursive least squares estimation (RLSE), which is used to update the parameters of the Takagi-Sugeno model employed to describe the behavior of the system; 3) a gradient descent algorithm, which is proposed for the fuzzy values according to the back-propagation algorithm of neural networks. Finally, modeling the dynamical equation of a two-link manipulator with the proposed approach is illustrated to validate the feasibility of the method.

  4. Rank Based Clustering For Document Retrieval From Biomedical Databases

    CERN Document Server

    Manicassamy, Jayanthi

    2009-01-01

    Nowadays, search engines are widely used for extracting information from various resources throughout the world, and a majority of searches in the biomedical field aim to retrieve related documents from various biomedical databases. Current search engines lack document clustering and do not represent the degree of relatedness of the documents extracted from the databases. To overcome these pitfalls, a text-based search engine has been developed for retrieving documents from the Medline and PubMed biomedical databases. The search engine incorporates a page-ranking-based clustering concept that automatically represents relatedness on a cluster basis. In addition, a graph tree is constructed to represent the level of relatedness of the documents that are networked together. Incorporating this advanced functionality into a biomedical document search engine was found to provide better results in reviewing related documents based on relatedness.

  5. Rank Based Clustering For Document Retrieval From Biomedical Databases

    Directory of Open Access Journals (Sweden)

    Jayanthi Manicassamy

    2009-09-01

    Nowadays, search engines are widely used for extracting information from various resources throughout the world, and a majority of searches in the biomedical field aim to retrieve related documents from various biomedical databases. Current search engines lack document clustering and do not represent the degree of relatedness of the documents extracted from the databases. To overcome these pitfalls, a text-based search engine has been developed for retrieving documents from the Medline and PubMed biomedical databases. The search engine incorporates a page-ranking-based clustering concept that automatically represents relatedness on a cluster basis. In addition, a graph tree is constructed to represent the level of relatedness of the documents that are networked together. Incorporating this advanced functionality into a biomedical document search engine was found to provide better results in reviewing related documents based on relatedness.

  6. Authentication Based on Multilayer Clustering in Ad Hoc Networks

    Directory of Open Access Journals (Sweden)

    Suh Heyi-Sook

    2005-01-01

    In this paper, we describe a secure cluster-routing protocol based on a multilayer scheme in ad hoc networks. This work provides a scalable, threshold authentication scheme in ad hoc networks. We present detailed security threats against ad hoc routing protocols, specifically examining cluster-based routing. Our proposed protocol, called "authentication based on multilayer clustering for ad hoc networks" (AMCAN), designs an end-to-end authentication protocol that relies on mutual trust between nodes in other clusters. The AMCAN strategy takes advantage of a multilayer architecture that is designed for an authentication protocol in a cluster head (CH) using a new concept of control cluster head (CCH) scheme. We propose an authentication protocol that uses certificates containing an asymmetric key and a multilayer architecture so that the CCH is achieved using the threshold scheme, thereby reducing the computational overhead and successfully defeating all identified attacks. We also use a more extensive area, such as a CCH, using an identification protocol to build a highly secure, highly available authentication service, which forms the core of our security framework.

  7. Methods of regional innovative clusters forming and development programs elaboration

    OpenAIRE

    Marchuk, Olha

    2013-01-01

    The aim of the article is to select programmes for the formation and development of innovative cluster structures. The preconditions for the formation of innovative clusters in the regions of Ukraine are analysed. Two types of programmes are suggested for the implementation of cluster policy at the regional level.

  8. DSN Beowulf Cluster-Based VLBI Correlator

    Science.gov (United States)

    Rogstad, Stephen P.; Jongeling, Andre P.; Finley, Susan G.; White, Leslie A.; Lanyi, Gabor E.; Clark, John E.; Goodhart, Charles E.

    2009-01-01

    The NASA Deep Space Network (DSN) requires a broadband VLBI (very long baseline interferometry) correlator to process data routinely taken as part of the VLBI source Catalogue Maintenance and Enhancement task (CAT M&E) and the Time and Earth Motion Precision Observations task (TEMPO). The data provided by these measurements are a crucial ingredient in the formation of precision deep-space navigation models. In addition, a VLBI correlator is needed to provide support for other VLBI related activities for both internal and external customers. The JPL VLBI Correlator (JVC) was designed, developed, and delivered to the DSN as a successor to the legacy Block II Correlator. The JVC is a full-capability VLBI correlator that uses software processes running on multiple computers to cross-correlate two-antenna broadband noise data. Components of this new system (see Figure 1) consist of Linux PCs integrated into a Beowulf Cluster, an existing Mark5 data storage system, a RAID array, an existing software correlator package (SoftC) originally developed for Delta DOR Navigation processing, and various custom-developed software processes and scripts. Parallel processing on the JVC is achieved by assigning slave nodes of the Beowulf cluster to process separate scans in parallel until all scans have been processed. Due to the single stream sequential playback of the Mark5 data, some ramp-up time is required before all nodes can have access to required scan data. Core functions of each processing step are accomplished using optimized C programs. The coordination and execution of these programs across the cluster is accomplished using Perl scripts, PostgreSQL commands, and a handful of miscellaneous system utilities. Mark5 data modules are loaded on Mark5 Data systems playback units, one per station.
Data processing is started when the operator scans the Mark5 systems and runs a script that reads various configuration files and then creates an experiment-dependent status database

  9. An ant colony based resilience approach to cascading failures in cluster supply network

    Science.gov (United States)

    Wang, Yingcong; Xiao, Renbin

    2016-11-01

    Cluster supply chain network is a typical complex network and easily suffers cascading failures under disruption events, which is caused by the under-load of enterprises. Improving network resilience can increase the ability of recovery from cascading failures. Social resilience is found in ant colony and comes from ant's spatial fidelity zones (SFZ). Starting from the under-load failures, this paper proposes a resilience method to cascading failures in cluster supply chain network by leveraging on social resilience of ant colony. First, the mapping between ant colony SFZ and cluster supply chain network SFZ is presented. Second, a new cascading model for cluster supply chain network is constructed based on under-load failures. Then, the SFZ-based resilience method and index to cascading failures are developed according to ant colony's social resilience. Finally, a numerical simulation and a case study are used to verify the validity of the cascading model and the resilience method. Experimental results show that, the cluster supply chain network becomes resilient to cascading failures under the SFZ-based resilience method, and the cluster supply chain network resilience can be enhanced by improving the ability of enterprises to recover and adjust.

  10. Clustered iterative stochastic ensemble method for multi-modal calibration of subsurface flow models

    KAUST Repository

    Elsheikh, Ahmed H.

    2013-05-01

    A novel multi-modal parameter estimation algorithm is introduced. Parameter estimation is an ill-posed inverse problem that might admit many different solutions. This is attributed to the limited amount of measured data used to constrain the inverse problem. The proposed multi-modal model calibration algorithm uses an iterative stochastic ensemble method (ISEM) for parameter estimation. ISEM employs an ensemble of directional derivatives within a Gauss-Newton iteration for nonlinear parameter estimation. ISEM is augmented with a clustering step based on k-means algorithm to form sub-ensembles. These sub-ensembles are used to explore different parts of the search space. Clusters are updated at regular intervals of the algorithm to allow merging of close clusters approaching the same local minima. Numerical testing demonstrates the potential of the proposed algorithm in dealing with multi-modal nonlinear parameter estimation for subsurface flow models. © 2013 Elsevier B.V.
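    The clustering step that splits the ensemble into sub-ensembles can be sketched with a plain k-means in Python. This shows only that one step, under stated assumptions: the ensemble members are toy two-parameter vectors, the centres are initialised from the first `k` members to keep the run deterministic, and neither the Gauss-Newton update nor the periodic merging of close clusters from the abstract is reproduced.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; centres start at the first k points (an assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each member goes to its nearest centre.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy ensemble of parameter vectors gathered around two local minima.
ensemble = [(0.0, 0.1), (5.0, 5.2), (0.2, 0.0), (5.1, 4.9), (0.1, 0.2)]
labels, centers = kmeans(ensemble, k=2)

# Group members into sub-ensembles that can then be iterated separately.
sub_ensembles = {}
for member, lab in zip(ensemble, labels):
    sub_ensembles.setdefault(lab, []).append(member)
print(sub_ensembles)
```

    Each sub-ensemble would then run its own parameter update, which is what lets the method track several candidate solutions of the inverse problem at once.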

  11. A method for clustering of miRNA sequences using fragmented programming

    Science.gov (United States)

    Ivashchenko, Anatoly; Pyrkova, Anna; Niyazova, Raigul

    2016-01-01

    Clustering of miRNA sequences is an important problem in molecular genetics and the associated cellular biology. Thousands of such sequences are known today thanks to advances in sophisticated molecular tools, sequencing techniques, computational resources and rule-based mathematical models. Analysis of such large-scale miRNA sequence data for inferring patterns and deducing cellular function is a great challenge in modern molecular biology. Therefore, it is of interest to develop mathematical models specific to miRNA sequences. The process is to group (cluster) such miRNA sequences using well-defined known features. We describe a method for clustering miRNA sequences using fragmented programming. We then illustrate the utility of the model using a dendrogram (a tree diagram) for publicly known A. thaliana miRNA nucleotide sequences towards the inference of observed conserved patterns. PMID:27212839

  12. Development Strategy Research of Industrial Clusters Based on SWOT Analysis: The Pingxiang Bicycle Export Industry as an Example

    Institute of Scientific and Technical Information of China (English)

    杨金廷; 高敬

    2013-01-01

    The SWOT analysis method is widely used in strategic research; it analyzes a research subject according to its own established internal conditions. As carriers of the regional economy, industrial clusters have clear development advantages because they can build strong, sustained competitiveness. Based on SWOT, we objectively analyze the external opportunities and threats and the internal strengths and weaknesses facing the bicycle cluster industry of Pingxiang County, and discuss the industrial development strategies corresponding to the SO, WO, ST and WT combinations. Under the SO combination, market development, branding and product development strategies should be pursued. Under the WO combination, an intellectual property strategy should be developed, promoting development through innovation, raising the added value of products, and adapting to consumers' demand for high quality. Under the ST combination, an industry association collaboration strategy should be used to share resources through industry cooperation and to unite enterprises in building a successful model for Pingxiang bicycles. Under the WT combination, an industrial agglomeration upgrading strategy should be pursued to change the current situation of small workshops and lax management in Pingxiang County and to optimize the allocation of industrial resources. In short, making full use of the cluster effect provides a reference for the development of industrial clusters.

  13. A Novel Clustering-Based Feature Representation for the Classification of Hyperspectral Imagery

    Directory of Open Access Journals (Sweden)

    Qikai Lu

    2014-06-01

    In this study, a new clustering-based feature extraction algorithm is proposed for the spectral-spatial classification of hyperspectral imagery. The clustering approach is able to group the high-dimensional data into a subspace by mining the salient information and suppressing the redundant information. In this way, the relationship between neighboring pixels, which was hidden in the original data, can be extracted more effectively. Specifically, in the proposed algorithm, a two-step process is adopted to make use of the clustering-based information. A clustering approach is first used to produce the initial clustering map, and, subsequently, a multiscale cluster histogram (MCH) is proposed to represent the spatial information around each pixel. In order to evaluate the robustness of the proposed MCH, four clustering techniques are employed to analyze the influence of the clustering methods. Meanwhile, the performance of the MCH is compared to three other widely used spatial features: the gray-level co-occurrence matrix (GLCM), the 3D wavelet texture, and differential morphological profiles (DMPs). The experiments conducted on four well-known hyperspectral datasets verify that the proposed MCH can significantly improve the classification accuracy, and it outperforms other commonly used spatial features.
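    The second step, turning a clustering map into a per-pixel multiscale histogram feature, might look like the Python sketch below. The abstract does not give the exact MCH definition, so the square windows, the normalisation, the chosen scales and the function names are assumptions, and the 4x4 label map is toy data.

```python
def cluster_histogram(label_map, row, col, half_width, n_clusters):
    """Normalised histogram of cluster labels in a square window
    centred on (row, col); the window is clipped at the image border."""
    counts = [0] * n_clusters
    rows, cols = len(label_map), len(label_map[0])
    for r in range(max(0, row - half_width), min(rows, row + half_width + 1)):
        for c in range(max(0, col - half_width), min(cols, col + half_width + 1)):
            counts[label_map[r][c]] += 1
    total = sum(counts)
    return [n / total for n in counts]


def multiscale_cluster_histogram(label_map, row, col, half_widths, n_clusters):
    """Concatenate the window histograms over several scales."""
    feature = []
    for half_width in half_widths:
        feature.extend(
            cluster_histogram(label_map, row, col, half_width, n_clusters))
    return feature


# Toy clustering map with three cluster labels (0, 1, 2).
label_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 1],
]
feature = multiscale_cluster_histogram(
    label_map, row=1, col=1, half_widths=(0, 1), n_clusters=3)
print(feature)
```

    The resulting vector (one histogram per scale) would be fed to the classifier, typically alongside the pixel's spectral signature.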

  14. Internet2-based 3D PET image reconstruction using a PC cluster.

    Science.gov (United States)

    Shattuck, D W; Rapela, J; Asma, E; Chatzioannou, A; Qi, J; Leahy, R M

    2002-08-07

    We describe an approach to fast iterative reconstruction from fully three-dimensional (3D) PET data using a network of Pentium III PCs configured as a Beowulf cluster. To facilitate the use of this system, we have developed a browser-based interface using Java. The system compresses PET data on the user's machine, sends these data over a network, and instructs the PC cluster to reconstruct the image. The cluster implements a parallelized version of our preconditioned conjugate gradient method for fully 3D MAP image reconstruction. We report on the speed-up factors using the Beowulf approach and the impacts of communication latencies in the local cluster network and the network connection between the user's machine and our PC cluster.

  15. A fuzzy-clustering analysis based phonetic tied-mixture HMM

    Institute of Scientific and Technical Information of China (English)

    XU Xianghua; ZHU Jie; GUO Qiang

    2005-01-01

    To efficiently decrease the number of parameters and improve the robustness of parameter training, a fuzzy clustering based phonetic tied-mixture model, FPTM, is presented. The Gaussian codebook of FPTM is synthesized from Gaussian components belonging to the same root node in the phonetic decision tree. A fuzzy clustering method is further used for FPTM covariance sharing. Experimental results show that, compared with the conventional PTM with approximately the same parameter size, FPTM decreases the size of Gaussian weights by 77.59% and increases word accuracy by 7.92%, which proves that Gaussian fuzzy clustering is efficient. Compared with FPTM, covariance-shared FPTM decreases word error rate by 1.14%, which proves that combined fuzzy clustering for both Gaussians and covariances is superior to Gaussian fuzzy clustering alone.

  16. Multi-hop routing-based optimization of the number of cluster-heads in wireless sensor networks.

    Science.gov (United States)

    Nam, Choon Sung; Han, Young Shin; Shin, Dong Ryeol

    2011-01-01

    Wireless sensor networks require energy-efficient data transmission because the sensor nodes have limited power. A cluster-based routing method is more energy-efficient than a flat routing method as it can only send specific data for user requirements and aggregate similar data by dividing a network into a local cluster. However, previous clustering algorithms have some problems in that the transmission radius of sensor nodes is not realistic and multi-hop based communication is not used both inside and outside local clusters. As energy consumption based on clustering is dependent on the number of clusters, we need to know how many clusters are best. Thus, we propose an optimal number of cluster-heads based on multi-hop routing in wireless sensor networks. We observe that a local cluster made by a cluster-head influences the energy consumption of sensor nodes. We determined an equation for the number of packets to send and relay, and calculated the energy consumption of sensor networks using it. Through the process of calculating the energy consumption, we can obtain the optimal number of cluster-heads in wireless sensor networks.

  17. Combined Density-based and Constraint-based Algorithm for Clustering

    Institute of Scientific and Technical Information of China (English)

    CHEN Tung-shou; CHEN Rong-chang; LIN Chih-chiang; CHIU Yung-hsing

    2006-01-01

    We propose a new clustering algorithm that helps researchers analyze data quickly and accurately. We call this algorithm the Combined Density-based and Constraint-based Algorithm (CDC). CDC consists of two phases. In the first phase, CDC employs the idea of density-based clustering to split the original data into a number of fragmented clusters. At the same time, CDC cuts off noise and outliers. In the second phase, CDC employs the concept of the K-means clustering algorithm to select a greater cluster to be the center. Then, the greater cluster merges some smaller clusters which satisfy some constraint rules. Because the clusters are merged around the center cluster, the clustering results show high accuracy. Moreover, CDC reduces the calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with those of K-means, hierarchical clustering, and the genetic clustering algorithm (GCA) proposed in 2004. Experimental results show that CDC has better performance.

  18. A Data Cleansing Method for Clustering Large-scale Transaction Databases

    CERN Document Server

    Loh, Woong-Kee; Kang, Jun-Gyu

    2010-01-01

    In this paper, we emphasize the need for data cleansing when clustering large-scale transaction databases and propose a new data cleansing method that improves clustering quality and performance. We evaluate our data cleansing method through a series of experiments. As a result, the clustering quality and performance were significantly improved by up to 165% and 330%, respectively.

  19. Timing-Driven Nonuniform Depopulation-Based Clustering

    Directory of Open Access Journals (Sweden)

    Hanyu Liu

    2010-01-01

    hence improve routability by spreading the logic over the architecture. However, all depopulation-based clustering algorithms to this date increase critical path delay. In this paper, we present a timing-driven nonuniform depopulation-based clustering technique, T-NDPack, that targets critical path delay and channel width constraints simultaneously. T-NDPack adjusts the CLB capacity based on the criticality of the Basic Logic Element (BLE). Results show that T-NDPack reduces minimum channel width by 11.07% while increasing the number of CLBs by 13.28% compared to T-VPack. More importantly, T-NDPack decreases critical path delay by 2.89%.

  20. ANONYMIZATION BASED ON NESTED CLUSTERING FOR PRIVACY PRESERVATION IN DATA MINING

    Directory of Open Access Journals (Sweden)

    V.Rajalakshmi

    2013-07-01

    Privacy preservation in data mining protects data against unauthorized extraction of information. Data anonymization techniques implement this by modifying the data so that the original values cannot be acquired easily. Perturbation techniques are widely used, but they can greatly affect the quality of the data, since there is a trade-off between privacy preservation and information loss, which subsequently affects the results of data mining. The method proposed in this paper is based on nested clustering of the data and perturbation of each cluster. The size of the clusters is kept optimal to reduce the information loss. The paper explains the methodology, implementation and results of nested clustering. Various metrics are also provided to show that this method overcomes the disadvantages of other perturbation methods.

  1. A Data-origin Authentication Protocol Based on ONOS Cluster

    Directory of Open Access Journals (Sweden)

    Qin Hua

    2016-01-01

    This paper proposes a data-origin authentication protocol based on an ONOS cluster. ONOS is an SDN controller that can work in a distributed environment. However, the security of an ONOS cluster is seldom considered, and communication in an ONOS cluster may suffer from many security threats. In this paper, we use a two-tier self-renewable hash chain for identity authentication and data-origin authentication. We analyse the security and overhead of our proposal and compare it with the current security measure. The analysis shows that, with our proposal, communication in an ONOS cluster can be protected from identity forging, replay attacks, data tampering, MITM attacks and repudiation, and that the computational overhead decreases noticeably.
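
    The basic building block named in this abstract, a hash chain, can be sketched as below. This shows only a plain single-tier chain; the two-tier, self-renewable construction of the paper is not reproduced, and the helper names `build_hash_chain` and `verify_link` as well as the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib

def build_hash_chain(seed: bytes, length: int) -> list:
    """Return [seed, H(seed), H(H(seed)), ...] with `length` hash applications."""
    chain = [seed]
    for _ in range(length):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain

def verify_link(revealed: bytes, trusted: bytes) -> bool:
    """Accept a revealed value only if hashing it gives the currently trusted value."""
    return hashlib.sha256(revealed).digest() == trusted

# The sender keeps the chain secret and publishes only the last element (the anchor).
chain = build_hash_chain(b"secret-seed", 3)
anchor = chain[-1]

# Later the sender reveals the previous element; the receiver checks it against the anchor.
genuine_ok = verify_link(chain[-2], anchor)
forged_ok = verify_link(b"forged-value", anchor)
print(genuine_ok, forged_ok)
```

    After a successful check the receiver would replace the anchor with the revealed value, so each element authenticates exactly one message before the chain runs out, which is the limitation a self-renewable scheme is meant to address.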

  2. Taste Identification of Tea Through a Fuzzy Neural Network Based on Fuzzy C-means Clustering

    Institute of Scientific and Technical Information of China (English)

    ZHENG Yan; ZHOU Chun-guang

    2003-01-01

    In this paper, we present a fuzzy neural network model based on Fuzzy C-Means (FCM) clustering algorithm to realize the taste identification of tea. The proposed method can acquire the fuzzy subset and its membership function in an automatic way with the aid of FCM clustering algorithm. Moreover, we improve the fuzzy weighted inference approach. The proposed model is illustrated with the simulation of taste identification of tea.
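
    The way FCM produces membership functions automatically can be illustrated with a minimal sketch of the standard fuzzy C-means updates. The fuzzifier `m = 2`, the random initialization and the toy data are assumptions for illustration; the fuzzy neural network and the improved weighted inference of the paper are not shown.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Standard FCM: alternate center and membership updates for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Membership is proportional to distance^(-2/(m-1)), normalized per sample.
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centers, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)
print(labels)
```

    Each column of `U` plays the role of a sampled membership function for one fuzzy subset.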

  3. Richness-based masses of rich and famous galaxy clusters

    CERN Document Server

    Andreon, S

    2016-01-01

    We present a catalog of galaxy cluster masses derived by exploiting the tight correlation between mass and richness, i.e., a properly computed number of bright cluster galaxies. The richness definition adopted in this work is properly calibrated, shows a small scatter with mass, and has a known evolution, which means that we can estimate accurate ($0.16$ dex) masses more precisely than by adopting any other richness estimates or X-ray or SZ-based proxies based on survey data. We measured a few hundred galaxy clusters at $0.05clusters, that have a known X-ray emission, that are in the Abell catalog, or that are among the most cited in the literature. Diagnostic plots and direct images of clusters are individually inspected and we improved cluster centers and, when needed, we revised redshifts. Whenever possible, we also checked for indications of contamination from other clus...

  4. Mapping the Generator Coordinate Method to the Coupled Cluster Approach

    CERN Document Server

    Stuber, Jason L

    2015-01-01

    The generator coordinate method (GCM) casts the wavefunction as an integral over a weighted set of non-orthogonal single determinantal states. In principle this representation can be used like the configuration interaction (CI) or shell model to systematically improve the approximate wavefunction towards an exact solution. In practice applications have generally been limited to systems with less than three degrees of freedom. This bottleneck is directly linked to the exponential computational expense associated with the numerical projection of broken symmetry Hartree-Fock (HF) or Hartree-Fock-Bogoliubov (HFB) wavefunctions and to the use of a variational rather than a bi-variational expression for the energy. We circumvent these issues by choosing a hole-particle representation for the generator and applying algebraic symmetry projection, via the use of tensor operators and the invariant mean (operator average). The resulting GCM formulation can be mapped directly to the coupled cluster (CC) approach, leading...

  5. Voxel-based clustered imaging by multiparameter diffusion tensor images for glioma grading.

    Science.gov (United States)

    Inano, Rika; Oishi, Naoya; Kunieda, Takeharu; Arakawa, Yoshiki; Yamao, Yukihiro; Shibata, Sumiya; Kikuchi, Takayuki; Fukuyama, Hidenao; Miyamoto, Susumu

    2014-01-01

    Gliomas are the most common intra-axial primary brain tumour; therefore, predicting glioma grade would influence therapeutic strategies. Although several methods based on single or multiple parameters from diagnostic images exist, a definitive method for pre-operatively determining glioma grade remains unknown. We aimed to develop an unsupervised method using multiple parameters from pre-operative diffusion tensor images for obtaining a clustered image that could enable visual grading of gliomas. Fourteen patients with low-grade gliomas and 19 with high-grade gliomas underwent diffusion tensor imaging and three-dimensional T1-weighted magnetic resonance imaging before tumour resection. Seven features including diffusion-weighted imaging, fractional anisotropy, first eigenvalue, second eigenvalue, third eigenvalue, mean diffusivity and raw T2 signal with no diffusion weighting, were extracted as multiple parameters from diffusion tensor imaging. We developed a two-level clustering approach for a self-organizing map followed by the K-means algorithm to enable unsupervised clustering of a large number of input vectors with the seven features for the whole brain. The vectors were grouped by the self-organizing map as protoclusters, which were classified into the smaller number of clusters by K-means to make a voxel-based diffusion tensor-based clustered image. Furthermore, we also determined if the diffusion tensor-based clustered image was really helpful for predicting pre-operative glioma grade in a supervised manner. The ratio of each class in the diffusion tensor-based clustered images was calculated from the regions of interest manually traced on the diffusion tensor imaging space, and the common logarithmic ratio scales were calculated. We then applied support vector machine as a classifier for distinguishing between low- and high-grade gliomas. Consequently, the sensitivity, specificity, accuracy and area under the curve of receiver operating characteristic

  6. Cluster analysis based on dimensional information with applications to feature selection and classification

    Science.gov (United States)

    Eigen, D. J.; Fromm, F. R.; Northouse, R. A.

    1974-01-01

    A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

  7. Cluster Based Hybrid Niche Mimetic and Genetic Algorithm for Text Document Categorization

    Directory of Open Access Journals (Sweden)

    A. K. Santra

    2011-09-01

    An efficient cluster based hybrid niche mimetic and genetic algorithm for text document categorization is presented to improve the retrieval rate of relevant documents. The proposal minimizes the processing needed to structure the documents by achieving better feature selection with the hybrid algorithm. In addition, restructuring of feature words to associated documents is reduced, which in turn increases the document clustering rate. The performance of the proposed work is measured in terms of cluster object accuracy, term weight, term frequency and inverse document frequency. Experimental results demonstrate that it achieves very good performance on both feature selection and text document categorization compared to other classifier methods.

  8. The method of spot cluster recommendation in location-based social networks

    Institute of Scientific and Technical Information of China (English)

    李朔; 石宇良

    2016-01-01

    To address the data sparsity and cold-start problems encountered in spot recommendation for location-based social networks, an improved spot recommendation method is proposed. It combines a clustering algorithm with collaborative filtering and takes user preferences, friend relationships, location semantics and other factors into account, so that the strengths of the two algorithms complement each other during recommendation. The research focuses on similarity computation, including similarity of places of interest, friend intimacy, term frequency-inverse document frequency (TF-IDF) and cosine similarity. The proposed method is validated on the Foursquare dataset using precision, recall and mean average precision as measures. The experiments show that the method effectively improves recommendation quality.
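
    Two of the similarity ingredients named in this abstract, TF-IDF weighting and cosine similarity, can be sketched as follows. The tag lists, the particular TF-IDF variant (relative term frequency times log inverse document frequency) and the function names are assumptions for illustration; how the paper combines them with friend intimacy and location similarity is not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each token list to a sparse {term: tf * idf} dictionary."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / doc_freq[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical semantic tags attached to three spots.
spots = [
    ["coffee", "wifi", "quiet"],
    ["coffee", "wifi", "music"],
    ["museum", "art", "history"],
]
vec_a, vec_b, vec_c = tfidf_vectors(spots)
print(cosine(vec_a, vec_b), cosine(vec_a, vec_c))
```

    Spots sharing tags receive a positive similarity, while spots with disjoint tags score zero.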

  9. Research and Application of Content Classification Method Based on Clustering

    Institute of Scientific and Technical Information of China (English)

    朱青; 牛志慧; 张晓凌

    2014-01-01

    As a tool for sharing and communicating content, content management has become an important application in the computer field. With the rapid growth of information, how to classify and organize vast amounts of information effectively has become the key problem of content management. Classifying and organizing content not only speeds up content queries but also facilitates categorized management of content. This paper studies a content classification method based on the K-means clustering algorithm, which achieves effective classified management of massive content, and on this basis presents an application analysis.
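
    The K-means step on which this classification method rests can be sketched with Lloyd's algorithm. The toy two-dimensional vectors stand in for content feature vectors, and the random initialization from data points is an assumption; the abstract does not say how content is vectorized or how centers are initialized.

```python
import numpy as np

def k_means(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: assign points to the nearest center, then recompute centers."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Toy feature vectors for six content items forming two obvious groups.
items = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centers, labels = k_means(items, k=2)
print(labels)
```

    Each resulting cluster would correspond to one content category.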

  10. Factorial PD-Clustering

    CERN Document Server

    Tortora, Cristina; Summa, Mireille Gettler

    2011-01-01

    Factorial clustering methods have been developed in recent years thanks to improvements in computational power. These methods perform a linear transformation of the data and a clustering of the transformed data, optimizing a common criterion. Factorial PD-clustering is based on Probabilistic Distance clustering (PD-clustering), an iterative, distribution-free, probabilistic clustering method. Factorial PD-clustering linearly transforms the original variables into a reduced number of orthogonal ones using a criterion shared with PD-clustering. It is demonstrated that the Tucker 3 decomposition yields this transformation. Factorial PD-clustering alternates between a Tucker 3 decomposition and a PD-clustering of the transformed data until convergence. This method could significantly improve the algorithm's performance and makes it possible to work with large datasets and to improve the stability and robustness of the method.

  11. A Comprehensive Comparison of Different Clustering Methods for Reliability Analysis of Microarray Data

    Science.gov (United States)

    Kafieh, Rahele; Mehridehnavi, Alireza

    2013-01-01

    In this study, we considered some competitive learning methods, including hard competitive learning and soft competitive learning with or without fixed network dimensionality, for reliability analysis in microarrays. In order to have a more extensive view, and keeping in mind that competitive learning methods aim at error minimization or entropy maximization (different kinds of function optimization), we decided to investigate the abilities of mixture decomposition schemes. Therefore, we assert that this study covers the algorithms based on function optimization, with particular emphasis on different competitive learning methods. The goal is to find the most powerful method according to a pre-specified criterion determined with numerical methods and matrix similarity measures. Furthermore, we should provide an indication of the intrinsic ability of the dataset to form clusters before we apply a clustering algorithm. Therefore, we proposed the Hopkins statistic as a method for assessing the intrinsic ability of a dataset to be clustered. The results show the remarkable ability of the Rayleigh mixture model in comparison with other methods in the reliability analysis task. PMID:24083134
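
    The Hopkins statistic mentioned above can be sketched as follows. This uses one common convention (values near 0.5 for uniformly scattered data, values near 1 for strongly clustered data, with uniform points drawn from the bounding box of the data); the sample size, the absence of a distance exponent and the synthetic data are assumptions, and the authors may have used a different variant.

```python
import numpy as np

def hopkins_statistic(X, n_samples=None, seed=0):
    """Compare nearest-neighbour distances of uniform points and of sampled data points."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = n_samples or max(1, n // 10)
    idx = rng.choice(n, size=m, replace=False)
    # u: distance from each uniform point in the bounding box to its nearest data point.
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = np.linalg.norm(uniform[:, None, :] - X[None, :, :], axis=2).min(axis=1)
    # w: distance from each sampled data point to its nearest *other* data point.
    dist = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    dist[np.arange(m), idx] = np.inf
    w = dist.min(axis=1)
    return u.sum() / (u.sum() + w.sum())

# Two tight synthetic blobs: a strongly clustered dataset.
data_rng = np.random.default_rng(1)
blobs = np.vstack([data_rng.normal(0.0, 0.05, size=(50, 2)),
                   data_rng.normal(5.0, 0.05, size=(50, 2))])
h_clustered = hopkins_statistic(blobs)
print(h_clustered)
```

    A value well above 0.5 suggests that running a clustering algorithm on the data is meaningful.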

  12. Onto-clust--a methodology for combining clustering analysis and ontological methods for identifying groups of comorbidities for developmental disorders.

    Science.gov (United States)

    Peleg, Mor; Asbeh, Nuaman; Kuflik, Tsvi; Schertz, Mitchell

    2009-02-01

    Children with developmental disorders usually exhibit multiple developmental problems (comorbidities). Hence, such diagnosis needs to revolve around developmental disorder groups. Our objective is to systematically identify developmental disorder groups and represent them in an ontology. We developed a methodology that combines two methods: (1) a literature-based ontology that we created, which represents developmental disorders and potential developmental disorder groups, and (2) clustering for detecting comorbid developmental disorders in patient data. The ontology is used to interpret and improve clustering results, and the clustering results are used to validate the ontology and suggest directions for its development. We evaluated our methodology by applying it to data of 1175 patients from a child development clinic. We demonstrated that the ontology improves clustering results, bringing them closer to an expert-generated gold standard. We have shown that our methodology successfully combines an ontology with a clustering method to support systematic identification and representation of developmental disorder groups.

  13. Sleeping Cluster based Medium Access Control Layer Routing Protocol for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    T. R. Rangaswamy

    2012-01-01

    Wireless sensor networks play a vital role in remote area applications, where human intervention is not possible. In a Wireless Sensor Network (WSN), each and every node is strictly constrained in both energy and bandwidth. Problem statement: In a standard WSN, most routing techniques move data from multiple sources to a single fixed base station. Because of the large number of computational tasks, existing routing protocols do not address the energy efficiency problem properly. To overcome the energy consumption caused by the large number of computational tasks, a new method is developed. Approach: The proposed algorithm divides the sensing field into three active clusters and one sleeping cluster. The cluster head selection is based on the distance between the base station and the normal nodes. The Time Division Multiple Access (TDMA) mechanism is used to keep a cluster in the active state or the sleeping state. In an active cluster, 50% of the nodes are made active and the remaining 50% are in the sleep state. A sleeping cluster is made active after a period of time and periodically changes its functionality. Results: Due to this periodic change of state, energy consumption is minimized. The performance of the Low Energy Adaptive Clustering Hierarchy (LEACH) algorithm is also analyzed, using the network simulator NS2, based on the number of Cluster Heads (CH), energy consumption, lifetime and the number of nodes alive. Conclusion: The simulation studies were carried out using the network simulation tool NS2 for the proposed method, and the results are compared with the performance of the existing protocol. The superiority of the proposed method is highlighted.

  14. Cluster Evolution in Undercooled Melt and Solidification of Undercooled Ge-based Alloy Melts Induced by Extrinsic Clusters

    Institute of Scientific and Technical Information of China (English)

    王煦; 景勤; 王文魁

    2003-01-01

    The structure or short-range order of clusters in undercooled metallic melts is influenced, to some extent, by the interfacial free energy between the cluster and the melt. Analyses of the effects of interfacial energy on the cluster structure based on the Gibbs equation show a possibility that atoms in the clusters tend to be packed more loosely with increasing cluster size (or undercooling). Following these analyses, nucleation may occur when clusters reach a definite size and atoms in the clusters relax to some extent to form the crystal structure. Indirect support for this viewpoint is provided by the present results of cluster-induced nucleation experiments on undercooled Ge73.7Ni26.3 alloy melts.

  15. Hybrid Parallel Bidirectional Sieve based on SMP Cluster

    CERN Document Server

    Liao, Gang; Liu, Lei

    2012-01-01

    In this article, a hybrid parallel bidirectional sieve method is implemented on an SMP cluster, in which the individual computational units, joined together by the communication network, are usually shared-memory systems with one or more multicore processors. For high-efficiency optimization, we propose dividing the data evenly among nodes and generating double-ended queues (deques) for the sieve method, so that dual cores can simultaneously start sifting out primes from the head and the tail. Each node also creates a FIFO queue as a dynamic data buffer to cache temporary data sent from other nodes. The approach obtains a large speedup and high efficiency on an SMP cluster.

  16. Applying clustering approach in predictive uncertainty estimation: a case study with the UNEEC method

    Science.gov (United States)

    Dogulu, Nilay; Solomatine, Dimitri; Lal Shrestha, Durga

    2014-05-01

    Within the context of flood forecasting, assessment of predictive uncertainty has become a necessity for most modelling studies in operational hydrology. There are several uncertainty analysis and/or prediction methods available in the literature; however, most of them rely on normality and homoscedasticity assumptions for the model residuals that arise in reproducing the observed data. This study focuses on a statistical method that analyzes model residuals without making any such assumptions and is based on a clustering approach: Uncertainty Estimation based on local Errors and Clustering (UNEEC). The aim of this work is to provide a comprehensive evaluation of the UNEEC method's performance in view of the clustering approach employed within its methodology. This is done by analyzing the normality of model residuals and comparing uncertainty analysis results (for 50% and 90% confidence levels) with those obtained from uniform interval and quantile regression methods. An important part of the basis by which the methods are compared is the analysis of data clusters representing different hydrometeorological conditions. The validation measures used are PICP, MPI, ARIL and NUE where necessary. A new validation measure linking the prediction interval to the (hydrological) model quality, the weighted mean prediction interval (WMPI), is also proposed for comparing the methods more effectively. The case study is the Brue catchment, located in the South West of England. A different parametrization of the method than in its previous application in Shrestha and Solomatine (2008) is used, i.e. past error values in addition to discharge and effective rainfall are considered. The results show that UNEEC's notable characteristic in its methodology, i.e. applying clustering to data of predictors upon which catchment behaviour information is encapsulated, contributes to increased accuracy of the method's results for varying flow conditions. Besides, classifying data so that extreme flow events are individually
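
    Two of the validation measures listed in this abstract can be written down in a few lines under their usual definitions: PICP as the fraction of observations that fall inside the prediction interval, and MPI as the mean interval width. The numbers below are made up for illustration, and ARIL, NUE and the proposed WMPI are not reproduced because the abstract does not define them.

```python
def picp(observed, lower, upper):
    """Prediction interval coverage probability: share of observations inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return inside / len(observed)

def mpi(lower, upper):
    """Mean prediction interval: average width of the intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)

# Hypothetical discharge observations and prediction bounds.
observed = [10, 12, 15, 30]
lower = [8, 11, 16, 20]
upper = [12, 14, 20, 35]

coverage = picp(observed, lower, upper)
width = mpi(lower, upper)
print(coverage, width)
```

    A good uncertainty estimate has coverage close to the nominal confidence level while keeping the intervals narrow.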

  17. A rough set based rational clustering framework for determining correlated genes.

    Science.gov (United States)

    Jeyaswamidoss, Jeba Emilyn; Thangaraj, Kesavan; Ramar, Kadarkarai; Chitra, Muthusamy

    2016-06-01

    Cluster analysis plays a foremost role in identifying groups of genes that show similar behavior under a set of experimental conditions. Several clustering algorithms have been proposed for identifying gene behaviors and to understand their significance. The principal aim of this work is to develop an intelligent rough clustering technique, which will efficiently remove the irrelevant dimensions in a high-dimensional space and obtain appropriate meaningful clusters. This paper proposes a novel biclustering technique that is based on rough set theory. The proposed algorithm uses correlation coefficient as a similarity measure to simultaneously cluster both the rows and columns of a gene expression data matrix and mean squared residue to generate the initial biclusters. Furthermore, the biclusters are refined to form the lower and upper boundaries by determining the membership of the genes in the clusters using mean squared residue. The algorithm is illustrated with yeast gene expression data and the experiment proves the effectiveness of the method. The main advantage is that it overcomes the problem of selection of initial clusters and also the restriction of one object belonging to only one cluster by allowing overlapping of biclusters.
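
    The mean squared residue used in this abstract as a bicluster quality score can be sketched with its usual (Cheng and Church) definition: the mean of the squared residues a_ij - (row mean) - (column mean) + (overall mean) over a submatrix. The toy matrices are assumptions for illustration, and the rough-set lower and upper boundary refinement of the paper is not shown.

```python
import numpy as np

def mean_squared_residue(A):
    """Mean squared residue of a (sub)matrix: 0 for a perfectly additive bicluster."""
    A = np.asarray(A, dtype=float)
    row_means = A.mean(axis=1, keepdims=True)
    col_means = A.mean(axis=0, keepdims=True)
    residue = A - row_means - col_means + A.mean()
    return float((residue ** 2).mean())

# Rows differ only by a constant shift, so the expression pattern is coherent.
coherent = [[1, 2, 3],
            [2, 3, 4],
            [4, 5, 6]]
# One perturbed entry breaks the additive pattern.
noisy = [[1, 2, 3],
         [2, 3, 4],
         [4, 5, 9]]

msr_coherent = mean_squared_residue(coherent)
msr_noisy = mean_squared_residue(noisy)
print(msr_coherent, msr_noisy)
```

    A lower score means that the selected genes behave more coherently across the selected conditions.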

  18. Methods for accurate analysis of galaxy clustering on non-linear scales

    Science.gov (United States)

    Vakili, Mohammadjavad

    2017-01-01

    Measurements of galaxy clustering with low-redshift galaxy surveys provide a sensitive probe of cosmology and the growth of structure. Parameter inference with galaxy clustering relies on the computation of likelihood functions, which requires estimation of the covariance matrix of the observables used in our analyses. Therefore, accurate estimation of the covariance matrices serves as one of the key ingredients in precise cosmological parameter inference. This requires generation of a large number of independent galaxy mock catalogs that accurately describe the statistical distribution of galaxies over a wide range of physical scales. We present a fast method based on low-resolution N-body simulations and an approximate galaxy biasing technique for generating mock catalogs. Using a reference catalog that was created using the high resolution Big-MultiDark N-body simulation, we show that our method is able to produce catalogs that describe galaxy clustering at a percentage-level accuracy down to highly non-linear scales in both real-space and redshift-space. In most large-scale structure analyses, modeling of galaxy bias on non-linear scales is performed assuming a halo model. Clustering of dark matter halos has been shown to depend on halo properties beyond mass such as halo concentration, a phenomenon referred to as assembly bias. Standard large-scale structure studies assume that halo mass alone is sufficient in characterizing the connection between galaxies and halos. However, modeling of galaxy bias can face systematic effects if the number of galaxies is correlated with other halo properties. Using the Small MultiDark-Planck high resolution N-body simulation and the clustering measurements of the Sloan Digital Sky Survey DR7 main galaxy sample, we investigate the extent to which the dependence of galaxy bias on halo concentration can improve our modeling of galaxy clustering.

  19. Hierarchical Compressed Sensing for Cluster Based Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Vishal Krishna Singh

    2016-02-01

    Data transmission consumes a significant amount of energy in large scale wireless sensor networks (WSNs). In such an environment, reducing the in-network communication and distributing the load evenly over the network can reduce the overall energy consumption and maximize the network lifetime significantly. In this work, the aforementioned problem of network lifetime and uneven energy consumption in large scale wireless sensor networks is addressed. This work proposes a hierarchical compressed sensing (HCS) scheme to reduce the in-network communication during the data gathering process. Correlated sensor readings are collected via a hierarchical clustering scheme. A compressed sensing (CS) based data processing scheme is devised to transmit the data from the source to the sink. The proposed HCS is able to identify the optimal position for the application of CS to achieve a reduced and similar number of transmissions on all the nodes in the network. An activity map is generated to validate the reduced and uniformly distributed communication load of the WSN. Based on the number of transmissions per data gathering round, the bit-hop metric model is used to analyse the overall energy consumption. Simulation results validate the efficiency of the proposed method over the existing CS based approaches.

  20. Optimization method of multi-distribution center location based on fuzzy clustering algorithm

    Institute of Scientific and Technical Information of China (English)

    毛海军; 王勇; 杭文; 于航; 何杰

    2012-01-01

    To optimize multi-distribution center location in a two-level facilities logistics network, the main factors influencing the location of distribution centers are extracted, and a comprehensive evaluation index system is set up. Firstly, the linguistic variables are represented by triangular fuzzy numbers to implement a comprehensive evaluation of candidate distribution centers. Secondly, the interval number priority degree function method is adopted to integrate the second-level criteria indices into the first-level criteria indices, and the integrated project evaluation index value is used as the input of a fuzzy clustering algorithm for the clustering operation. A clustering validity index is designed to analyze the rationality of the clustering results. Finally, the technique for order preference by similarity to ideal solution (TOPSIS) method is used to rank the candidate distribution centers within each clustering unit, and the locations and quantities of distribution centers are determined. The results of an application example show that when the membership function value is 0.7402, the clustering validity index reaches its smallest value, 2.43. The operation divides the candidate distribution centers into four clusters and selects the distribution center location in each cluster, making the location results reasonable and more advantageous than those of other methods. Therefore, the proposed method is more effective in addressing the multi-distribution center location problem.
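
    The TOPSIS ranking step named in this abstract can be sketched as follows. This is a minimal illustration of the generic TOPSIS procedure, not the authors' implementation: the function name `topsis`, the example criteria, and the weights are all hypothetical, and the fuzzy evaluation and clustering stages are not reproduced.

```python
import math

def topsis(matrix, weights, benefit):
    """Score alternatives (rows) by relative closeness to the ideal solution.

    matrix  : one row per alternative, one column per criterion
    weights : criterion weights (assumed here to sum to 1)
    benefit : True where larger is better, False where smaller is better
    Assumes no all-zero column and at least two distinct alternatives.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical candidates inside one cluster: (cost, capacity).
candidates = [[1, 9], [5, 5], [9, 1]]
scores = topsis(candidates, weights=[0.5, 0.5], benefit=[False, True])
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

    A higher score means the candidate is closer to the ideal and further from the anti-ideal solution, so the first index in `ranking` would be the preferred location in that cluster.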

  1. AN APPLICATION OF HYBRID CLUSTERING AND NEURAL BASED PREDICTION MODELLING FOR DELINEATION OF MANAGEMENT ZONES

    Directory of Open Access Journals (Sweden)

    Babankumar S. Bansod

    2011-02-01

    Starting from descriptive data on crop yield and various other properties, the aim of this study is to reveal trends in soil behaviour, such as crop yield. The study was carried out by developing a web application that uses a well-known technique, cluster analysis. The cluster analysis revealed linkages between soil classes for the same field as well as between different fields, which can be partly attributed to crop rotation and the determination of variable soil input rates. A hybrid clustering algorithm has been developed taking into account the traits of two clustering technologies: (i) hierarchical clustering and (ii) K-means clustering. This hybrid clustering algorithm is applied to sensor-gathered soil data, resulting in the formation of well-delineated management zones based on various soil properties, such as ECa, crop yield, etc. One of the purposes of the study was to identify the main factors affecting crop yield, and the results obtained were validated with existing techniques. To accomplish this purpose, geo-referenced soil information has been examined. Also, based on this data, a statistical method has been used to classify and characterize the soil behaviour. This is done using a prediction model, developed to predict the unknown behaviour of clusters based on the known behaviour of other clusters. In predictive modeling, data has been collected for the relevant predictors, a statistical model has been formulated, predictions were made and the model can be validated (or revised) as additional data becomes available. The model used in the web application has been formed taking into account a neural network based minimum Hamming distance criterion.

  2. Excited state dynamics of Kr_N clusters probed with time- and energy-resolved photoluminescence methods

    Science.gov (United States)

    Karnbach, R.; Castex, M. C.; Keto, J. W.; Joppien, M.; Wörmer, J.; Zimmerer, G.; Möller, T.

    1993-02-01

    Excitation and decay processes in Kr_N clusters (N = 2-10^4) were investigated via time- and energy-resolved fluorescence methods with synchrotron radiation excitation. In small clusters (N < 50), in addition to the well-known emission bands of condensed Kr, another broad continuous emission is observed. It is assigned to a radiative decay of Kr excimers desorbing from the cluster surface. There are indications that the cluster size where the desorption rate becomes slow is related to a change in sign of the electron affinity of the cluster. Changes of spectral distribution of the fluorescence light with cluster size are interpreted as variations of the vibrational energy flow.

  3. A fast SVM training algorithm based on the set segmentation and k-means clustering

    Institute of Scientific and Technical Information of China (English)

    YANG Xiaowei; LIN Daying; HAO Zhifeng; LIANG Yanchun; LIU Guirong; HAN Xu

    2003-01-01

    At present, training algorithms for support vector machines (SVM) are an important topic in the field of machine learning. It is a challenging task to improve the efficiency of the algorithm without reducing the generalization performance of the SVM. To face this challenge, a new SVM training algorithm based on set segmentation and k-means clustering is presented in this paper. The new idea is to divide the original training data into many subsets, cluster each subset using k-means, and finally train the SVM on the new data set formed by the cluster centroids. Considering that decomposition algorithms such as SVMlight are among the major methods for solving support vector machines, SVMlight is used in our experiments. Simulations on different types of problems show that the proposed method can efficiently solve not only large linear classification problems but also large nonlinear ones.
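
    The data-reduction idea in this abstract (segment the training set, cluster each subset with k-means, keep only the centroids) can be sketched as below. This is an illustrative sketch under stated assumptions rather than the paper's algorithm: the helper names `kmeans` and `reduce_training_set` are hypothetical, clustering is done per class so that each centroid inherits a label (the abstract does not say how labels are handled), and the final SVM training step is left to whichever solver is available.

```python
import random

def sqdist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sqdist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def reduce_training_set(samples, n_subsets, k):
    """samples: list of (features, label). Returns labelled centroids."""
    reduced = []
    for label in sorted({y for _, y in samples}):
        pts = [x for x, y in samples if y == label]
        # Segment this class into n_subsets interleaved subsets (an assumption).
        for s in range(n_subsets):
            subset = pts[s::n_subsets]
            if subset:
                for c in kmeans(subset, min(k, len(subset))):
                    reduced.append((c, label))
    return reduced
```

    With, say, 80 samples, 2 subsets per class and k = 3, the SVM would be trained on 12 labelled centroids instead of the full set, which is where the speed-up in such a scheme would come from.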

  4. A THREE-STEP SPATIAL-TEMPORAL-SEMANTIC CLUSTERING METHOD FOR HUMAN ACTIVITY PATTERN ANALYSIS

    Directory of Open Access Journals (Sweden)

    W. Huang

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only the location and time at which people stay are collected, but also what people “say” in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and some new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The

  5. A Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis

    Science.gov (United States)

    Huang, W.; Li, S.; Xu, S.

    2016-06-01

    How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two to three (space or space-time) to four dimensions (space, time and semantics). More specifically, not only the location and time at which people stay are collected, but also what people "say" in a location at a time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and some new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms are specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One year of Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the

  6. Evaluation of availability of cluster distributed disaster tolerant systems for control and information processing based on a cluster quorum

    Science.gov (United States)

    Tsarev, R. Yu; Gruzenkin, D. V.; Kovalev, I. V.; Prokopenko, A. V.; Knyazkov, A. N.

    2016-11-01

    Control and information processing systems, which often execute critical functions, must satisfy requirements not only of fault tolerance but also of disaster tolerance. It is reasonable to apply a cluster architecture to provide disaster tolerance in these systems. In this case, clusters are separate control and information processing centers united by means of communication channels. Thus, the clusters form a single hardware resource, interacting with each other to achieve system objectives. Remote cluster positioning ensures system availability and disaster tolerance even if some units fail or a whole cluster crashes. A technique for evaluating the availability of cluster distributed systems for control and information processing based on a cluster quorum is presented in the paper. This technique can be applied to different cluster distributed control and information processing systems that claim to be based on disaster tolerance principles. In the article we discuss a communications satellite system as an example of a cluster distributed disaster tolerant control and information processing system. An evaluation of the availability of the communications satellite system is provided. Possible failure scenarios of the cluster-based components of the communications satellite system were analyzed. The analysis made it possible to choose the best way to implement the cluster structure for a distributed control and information processing system.
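
    A quorum-based availability estimate of the kind this abstract describes can be illustrated with a simple binomial model. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: clusters fail independently, all share the same availability `p_up`, and the system counts as available while a simple majority of clusters is up. The function name `quorum_availability` is hypothetical.

```python
from math import comb

def quorum_availability(n_clusters, p_up, quorum=None):
    """Probability that at least `quorum` of `n_clusters` clusters are up.

    Assumes independent clusters with identical availability p_up;
    the default quorum is a simple majority.
    """
    if quorum is None:
        quorum = n_clusters // 2 + 1
    return sum(
        comb(n_clusters, k) * p_up ** k * (1 - p_up) ** (n_clusters - k)
        for k in range(quorum, n_clusters + 1)
    )

# Compare candidate cluster structures under the same per-cluster availability.
for n in (1, 3, 5):
    print(n, quorum_availability(n, 0.9))
```

    Under these assumptions, comparing the printed values for different cluster counts is one simple way to weigh alternative cluster structures against each other.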

  7. Richness-based masses of rich and famous galaxy clusters

    Science.gov (United States)

    Andreon, S.

    2016-03-01

    We present a catalog of galaxy cluster masses derived by exploiting the tight correlation between mass and richness, i.e., a properly computed number of bright cluster galaxies. The richness definition adopted in this work is properly calibrated, shows a small scatter with mass, and has a known evolution, which means that we can estimate accurate (0.16 dex) masses more precisely than by adopting any other richness estimates or X-ray or SZ-based proxies based on survey data. We measured a few hundred galaxy clusters at 0.05 web front-end is available at the URL http://www.brera.mi.astro.it/~andreon/famous.html

  8. AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number

    Directory of Open Access Journals (Sweden)

    Cooper James B

    2010-03-01

    Background: Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. Results: We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. Conclusions: By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.

  9. Cluster-Based Maximum Consensus Time Synchronization for Industrial Wireless Sensor Networks

    Science.gov (United States)

    Wang, Zhaowei; Zeng, Peng; Zhou, Mingtuo; Li, Dong; Wang, Jintao

    2017-01-01

    Time synchronization is one of the key technologies in Industrial Wireless Sensor Networks (IWSNs), and clustering is widely used in WSNs for data fusion and information collection to reduce redundant data and communication overhead. Considering IWSNs’ demand for low energy consumption, fast convergence, and robustness, this paper presents a novel Cluster-based Maximum consensus Time Synchronization (CMTS) method. It consists of two parts: intra-cluster time synchronization and inter-cluster time synchronization. Based on the theory of distributed consensus, the proposed method utilizes the maximum consensus approach to realize the intra-cluster time synchronization, and adjacent clusters exchange the time messages via overlapping nodes to synchronize with each other. A Revised-CMTS is further proposed to counteract the impact of bounded communication delays between two connected nodes, because the traditional stochastic models of the communication delays would distort in a dynamic environment. The simulation results show that our method reduces the communication overhead and improves the convergence rate in comparison to existing works, as well as adapting to the uncertain bounded communication delays. PMID:28098750
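
    The maximum consensus idea behind this method can be sketched in a few lines. This is a simplified illustration, not the CMTS protocol itself: it runs in synchronous rounds, ignores communication delays and clock drift, and only shows how a node that belongs to two clusters carries the maximum from one cluster to the other. The node names and clock values are made up.

```python
def max_consensus(clocks, neighbors, max_rounds=100):
    """Synchronous maximum consensus.

    In every round each node adopts the largest clock value among itself
    and its neighbors. Stops when a round changes nothing.

    clocks    : dict node -> initial clock reading
    neighbors : dict node -> list of adjacent nodes
    Returns (final clocks, number of rounds in which some value changed).
    """
    state = dict(clocks)
    for rounds in range(max_rounds):
        new = {n: max([state[n]] + [state[m] for m in neighbors[n]]) for n in state}
        if new == state:
            return state, rounds
        state = new
    return state, max_rounds

# Two clusters {a1, a2, g} and {b1, b2, g}; g is the overlapping node.
neighbors = {
    "a1": ["a2", "g"], "a2": ["a1", "g"],
    "b1": ["b2", "g"], "b2": ["b1", "g"],
    "g": ["a1", "a2", "b1", "b2"],
}
clocks = {"a1": 100.0, "a2": 103.5, "g": 101.0, "b1": 99.0, "b2": 107.2}
final, rounds = max_consensus(clocks, neighbors)
```

    In this toy topology every node ends at the largest initial reading, and the value from cluster B reaches cluster A only through the shared node `g`.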

  10. Cluster-Based Maximum Consensus Time Synchronization for Industrial Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Zhaowei Wang

    2017-01-01

    Time synchronization is one of the key technologies in Industrial Wireless Sensor Networks (IWSNs), and clustering is widely used in WSNs for data fusion and information collection to reduce redundant data and communication overhead. Considering IWSNs’ demand for low energy consumption, fast convergence, and robustness, this paper presents a novel Cluster-based Maximum consensus Time Synchronization (CMTS) method. It consists of two parts: intra-cluster time synchronization and inter-cluster time synchronization. Based on the theory of distributed consensus, the proposed method utilizes the maximum consensus approach to realize the intra-cluster time synchronization, and adjacent clusters exchange the time messages via overlapping nodes to synchronize with each other. A Revised-CMTS is further proposed to counteract the impact of bounded communication delays between two connected nodes, because the traditional stochastic models of the communication delays would distort in a dynamic environment. The simulation results show that our method reduces the communication overhead and improves the convergence rate in comparison to existing works, as well as adapting to the uncertain bounded communication delays.

  11. Cluster-Based Maximum Consensus Time Synchronization for Industrial Wireless Sensor Networks.

    Science.gov (United States)

    Wang, Zhaowei; Zeng, Peng; Zhou, Mingtuo; Li, Dong; Wang, Jintao

    2017-01-13

    Time synchronization is one of the key technologies in Industrial Wireless Sensor Networks (IWSNs), and clustering is widely used in WSNs for data fusion and information collection to reduce redundant data and communication overhead. Considering IWSNs' demand for low energy consumption, fast convergence, and robustness, this paper presents a novel Cluster-based Maximum consensus Time Synchronization (CMTS) method. It consists of two parts: intra-cluster time synchronization and inter-cluster time synchronization. Based on the theory of distributed consensus, the proposed method utilizes the maximum consensus approach to realize the intra-cluster time synchronization, and adjacent clusters exchange the time messages via overlapping nodes to synchronize with each other. A Revised-CMTS is further proposed to counteract the impact of bounded communication delays between two connected nodes, because the traditional stochastic models of the communication delays would distort in a dynamic environment. The simulation results show that our method reduces the communication overhead and improves the convergence rate in comparison to existing works, as well as adapting to the uncertain bounded communication delays.

  12. Investigation of open clusters based on IPHAS and APASS survey data

    Science.gov (United States)

    Dambis, A. K.; Glushkova, E. V.; Berdnikov, L. N.; Joshi, Y. C.; Pandey, A. K.

    2017-02-01

    We adapt the classical Q-method based on a reddening-free parameter constructed from three passband magnitudes to the filter set of Isaac Newton Telescope Photometric Hα Survey and combine it with the maximum-likelihood-based cluster parameter estimator by Naylor & Jeffries (2006) to determine the extinction, heliocentric distances, and ages of young open clusters using Hαri data. The method is also adapted for the case of significant variations of extinction across the cluster field. Our technique is validated by comparing the colour excesses, distances, and ages determined in this study with the most bona fide values reported for the 18 well-studied young open clusters in the past and a fairly good agreement is found between our extinction and distance estimates and earlier published results, although our age estimates are not very consistent with those published by other authors. We also show that individual extinction values can be determined rather accurately for stars with (r - i) > 0.1. Our results open up a prospect for determining a uniform set of parameters for northern clusters based on homogeneous photometric data, and for searching for new, hitherto undiscovered open clusters.

  13. Efficient Clustering for Irregular Geometries Based on Identification of Concavities

    Directory of Open Access Journals (Sweden)

    Velázquez-Villegas Fernando

    2014-04-01

    The two-dimensional clustering problem is highly relevant to applications concerned with the efficient use of raw material, such as cutting stock, packing, etc. This is a very complex problem in which multiple bodies are arranged so that they occupy as little space as possible. The complexity of the problem increases with the complexity of the bodies, and the number of possible arrangements between bodies is huge. The No Fit Polygon (NFP) makes it possible to determine all relative positions between two patterns (regular or irregular) that are in contact without overlapping, so the best position can be selected. However, NFP generation requires a lot of calculations; besides, selecting the best cluster is not a simple task because, between two irregular patterns in contact, hollows (unusable areas) and external concavities (usable areas) can be produced. This work presents a quick and simple method to reduce the calculations associated with NFP generation and to minimize unusable areas in a cluster. The method consists of generating a partial NFP, only on concave regions of the patterns, and selecting the best cluster using a total weighted efficiency, i.e. a weighted value of enclosure efficiency (ratio of occupied area to convex hull area) and hollow efficiency (ratio of occupied area to cluster area). The proposed method produces results similar to those obtained by other methods; however, the shape of the clusters obtained allows more parts to be accommodated in similar spaces, which is a desirable result when optimizing the use of material. We present two examples to show the performance of the proposal.
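
    The two efficiency ratios defined in this abstract can be computed as in the sketch below. It is only an illustration under assumptions: the linear combination `w * enclosure + (1 - w) * hollow` and the weight `w` are guesses at what "weighted value" means, the parts are assumed not to overlap, the cluster outline is supplied by the caller, and all function names are hypothetical. NFP generation itself is not shown.

```python
def polygon_area(poly):
    """Shoelace formula; poly is a list of (x, y) vertices in order."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        out = []
        for p in seq:
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
        return out[:-1]

    pts = sorted(set(points))
    return half(pts) + half(reversed(pts))

def total_weighted_efficiency(parts, outline, w=0.5):
    """parts: list of polygons; outline: outer boundary of the cluster."""
    occupied = sum(polygon_area(p) for p in parts)
    hull = convex_hull([v for p in parts for v in p])
    enclosure = occupied / polygon_area(hull)   # occupied / convex hull area
    hollow = occupied / polygon_area(outline)   # occupied / cluster area
    return w * enclosure + (1 - w) * hollow

# Two rectangles forming an L-shaped cluster with no internal hollow.
part_a = [(0, 0), (2, 0), (2, 1), (0, 1)]
part_b = [(0, 1), (1, 1), (1, 2), (0, 2)]
l_outline = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
score = total_weighted_efficiency([part_a, part_b], l_outline)
```

    Candidate relative positions of two patterns could then be compared by this score, with higher values indicating less wasted area.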

  14. Risk Probability Estimating Based on Clustering

    DEFF Research Database (Denmark)

    Chen, Yong; Jensen, Christian D.; Gray, Elizabeth;

    2003-01-01

    from the insurance industry do not directly apply to ubiquitous computing environments. Instead, we propose a dynamic mechanism for risk assessment, which is based on pattern matching, classification and prediction procedures. This mechanism uses an estimator of risk probability, which is based... ...Ubiquitous computing environments are highly dynamic, with new unforeseen circumstances and constantly changing environments, which introduces new risks that cannot be assessed through traditional means of risk analysis. Mobile entities in a ubiquitous computing environment require the ability... ...to perform an autonomous assessment of the risk incurred by a specific interaction with another entity in a given context. This assessment will allow a mobile entity to decide whether sufficient evidence exists to mitigate the risk and allow the interaction to proceed. Such evidence might include records...

  15. Hybrid Weighted-based Clustering Routing Protocol for Railway Communications

    Directory of Open Access Journals (Sweden)

    Jianli Xie

    2013-12-01

    In this paper, a hybrid clustering routing strategy is proposed for railway emergency ad hoc networks, for use when GSM-R base stations are destroyed or some terminals (or nodes) are far from signal coverage. In this case, the cluster-head (CH) election procedure is invoked on demand; it takes into consideration the degree difference from the ideal degree, relative clustering stability, the sum of distances between the node and its one-hop neighbors, consumed power, node type and node mobility. For cluster formation, the weights of the CH election parameters are allocated rationally by rough set theory. The hybrid weighted-based clustering routing (HWBCR) strategy is designed for railway emergency communication scenarios and aims for a good trade-off between computation cost and performance. A simulation platform is constructed to evaluate the performance of our strategy in terms of average end-to-end delay, packet loss ratio, routing overhead and average throughput. The results, compared with the railway communication QoS index, reveal that our strategy is suitable for transmitting dispatching voice and data between train and ground when the train speed is less than 220 km/h.

  16. Spatial temporal clustering for hotspot using kulldorff scan statistic method (KSS): A case in Riau Province

    Science.gov (United States)

    Hudjimartsu, S. A.; Djatna, T.; Ambarwari, A.; Apriliantono

    2017-01-01

    Forest fires occur frequently in Indonesia during the dry season, and almost all of them are caused by human activity. Their impacts include loss of biodiversity, pollution hazards, and harm to the economy of surrounding communities. Preventing fires requires suitable methods, one of which is spatial-temporal clustering. Spatial-temporal clustering groups the data so that the resulting groups can serve as initial information for fire prevention. To analyze the fires, hotspot data are used as an early indicator of fire spots. Hotspot data, which have spatial and temporal dimensions, can be processed using spatial-temporal clustering with the Kulldorff Scan Statistic (KSS). This research demonstrates the effectiveness of the KSS method for clustering hotspots in Riau Province and produces two types of clusters, the most likely cluster and secondary clusters. These clusters can be used as early fire warning information.
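
    The ranking of candidate zones in a Kulldorff-style scan can be sketched with the standard Poisson log-likelihood ratio. This is a minimal sketch, not the study's pipeline: the zones and counts are invented, real space-time scans enumerate cylindrical windows over locations and time intervals, and significance testing by Monte Carlo replication is omitted.

```python
import math

def poisson_llr(c, e, total):
    """Log-likelihood ratio of one candidate zone (Poisson model).

    c     : observed hotspot count inside the zone
    e     : expected count inside the zone under the null hypothesis
    total : total observed count over the whole study region
    Only zones with more hotspots than expected score above zero.
    """
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside

# Hypothetical zones: name -> (observed, expected), with 100 hotspots in total.
zones = {"A": (30, 10.0), "B": (12, 10.0), "C": (5, 10.0)}
llr = {name: poisson_llr(c, e, 100) for name, (c, e) in zones.items()}
ranked = sorted(llr, key=llr.get, reverse=True)
most_likely, secondary = ranked[0], ranked[1:]
```

    The zone with the largest ratio plays the role of the most likely cluster, and the remaining zones with a positive ratio are candidates for secondary clusters.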

  17. Efficiency of a Multi-Reference Coupled Cluster method

    CERN Document Server

    Giner, Emmanuel; Scemama, Anthony; Malrieu, Jean Paul

    2015-01-01

    The multi-reference Coupled Cluster method first proposed by Meller et al. (J. Chem. Phys. 1996) has been implemented and tested. Guess values of the amplitudes of the single and double excitations (the $\hat{T}$ operator) on top of the references are extracted from the knowledge of the coefficients of the Multi Reference Singles and Doubles Configuration Interaction (MRSDCI) matrix. The multiple parentage problem is solved by scaling these amplitudes on the interaction between the references and the Singles and Doubles. Then one proceeds to a dressing of the MRSDCI matrix under the effect of the Triples and Quadruples, the coefficients of which are estimated from the action of $\hat{T}^2$. This dressing follows the logic of the intermediate effective Hamiltonian formalism. The dressed MRSDCI matrix is diagonalized and the process is iterated to convergence. The method is tested on a series of benchmark systems from Complete Active Spaces (CAS) involving 2 or 4 active electrons up to bond breakings. The...

  18. Improving Cluster Analysis with Automatic Variable Selection Based on Trees

    Science.gov (United States)

    2014-12-01

    Master’s thesis by Anton D. Orr, December 2014. Thesis Advisor: Samuel E. Buttrey. Second Reader... ...2006 based on classification and regression trees to address problems with determining dissimilarity. Current algorithms do not simultaneously address

  19. 3D Partition-Based Clustering for Supply Chain Data Management

    Science.gov (United States)

    Suhaibah, A.; Uznir, U.; Anton, F.; Mioc, D.; Rahman, A. A.

    2015-10-01

    Supply Chain Management (SCM) is the management of the flow of products and goods from the point of origin to the point of consumption. During the SCM process, the information and datasets gathered are massive and complex. This is due to its several processes, such as procurement, product development and commercialization, physical distribution, outsourcing and partnerships. For practical applications, SCM datasets need to be managed and maintained to provide better service to the three main categories of users: distributors, customers and suppliers. To manage these datasets, a data constellation structure is used to accommodate the data in a spatial database. However, geospatial databases create some problems; for example, the performance of the database deteriorates, especially during query operations. We strongly believe that a more practical hierarchical tree structure is required for efficient SCM processing. Besides that, a three-dimensional approach is required for the management of SCM datasets, since they involve multi-level locations such as shop lots and residential apartments. The 3D R-Tree has been increasingly used for 3D geospatial database management due to its simplicity and extendibility. However, it suffers from serious overlaps between nodes. In this paper, we propose a partition-based clustering for the construction of a hierarchical tree structure. Several datasets are tested using the proposed method, and the percentage of overlapping nodes and the volume coverage are computed and compared with the original 3D R-Tree and other practical approaches. The experiments demonstrated in this paper show that the hierarchical structure of the proposed partition-based clustering is capable of preserving minimal overlap and coverage. The query performance was tested using 300,000 points of an SCM dataset, and the results are presented in this paper. This paper also discusses the outlook of the structure for future reference.

  20. Multilevel Analysis Methods for Partially Nested Cluster Randomized Trials

    Science.gov (United States)

    Sanders, Elizabeth A.

    2011-01-01

    This paper explores multilevel modeling approaches for 2-group randomized experiments in which a treatment condition involving clusters of individuals is compared to a control condition involving only ungrouped individuals, otherwise known as partially nested cluster randomized designs (PNCRTs). Strategies for comparing groups from a PNCRT in the…

  1. Structural variation from heterometallic cluster-based 1D chain to heterometallic tetranuclear cluster: Syntheses, structures and magnetic properties

    Science.gov (United States)

    Zhang, Shu-Hua; Zhao, Ru-Xia; Li, He-Ping; Ge, Cheng-Min; Li, Gui; Huang, Qiu-Ping; Zou, Hua-Hong

    2014-08-01

    Using the solvothermal method, we present the comparative preparation of {[Co3Na(dmaep)3(ehbd)(N3)3]·DMF}n (1) and [Co2Na2(hmbd)4(N3)2(DMF)2] (2), where Hehbd is 3-ethoxy-2-hydroxy-benzaldehyde, Hhmbd is 3-methoxy-2-hydroxy-benzaldehyde, and Hdmaep is 2-dimethylaminomethyl-6-ethoxy-phenol, which was synthesized by an in-situ reaction. Complexes 1 and 2 were characterized by elemental analysis, IR spectroscopy, and X-ray single-crystal diffraction. Complex 1 is a novel heterometallic cluster-based 1-D chain and 2 is a heterometallic tetranuclear cluster. The {Co3IINa} and {Co2IINa2} cores display dominant ferromagnetic interaction from the nature of the binding modes through μ1,1,1-N3- (end-on, EO).

  2. Mining Representative Subset Based on Fuzzy Clustering

    Institute of Scientific and Technical Information of China (English)

    ZHOU Hongfang; FENG Boqin; L(U) Lintao

    2007-01-01

    Two new concepts, fuzzy mutuality and average fuzzy entropy, are presented. Based on these concepts, a new algorithm, RSMA (representative subset mining algorithm), is proposed, which can extract a representative subset from massive data. To speed up the production of the representative subset, an improved algorithm, ARSMA (accelerated representative subset mining algorithm), is put forward, which combines forward and backward strategies. In this way, the performance of the algorithm is improved. Finally, we run experiments on real datasets and evaluate the representative subset. The experiments show that the ARSMA algorithm outperforms the RandomPick algorithm in both effectiveness and efficiency.

  3. Threshold selection for classification of MR brain images by clustering method

    Energy Technology Data Exchange (ETDEWEB)

    Moldovanu, Simona [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania); Dumitru Moţoc High School, 15 Milcov St., 800509, Galaţi (Romania); Obreja, Cristian; Moraru, Luminita, E-mail: luminita.moraru@ugal.ro [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania)

    2015-12-07

    Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool for separating objects from the background and, further, in classification applications. This paper gives a detailed investigation of the selection of thresholds. Our method does not use the well-known method for binarization. Instead, we perform a simple threshold optimization which, in turn, allows the best classification of the analyzed images into healthy and multiple sclerosis classes. The dissimilarity (or the distance between classes) has been established using a clustering method based on dendrograms. We tested our method using two classes of images: 20 T2-weighted and 20 proton density (PD)-weighted scans from two healthy subjects and from two patients with multiple sclerosis. For each image and each threshold, the number of white pixels (or the area of white objects in the binary image) was determined. These pixel counts represent the objects in the clustering operation. The following optimum threshold values were obtained: T = 80 for PD images and T = 30 for T2w images. Each of these thresholds clearly separates the clusters belonging to the studied groups, healthy subjects and patients with multiple sclerosis.
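
    The per-threshold white-pixel count described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the toy image, the helper names, and the choice of a strict "greater than" comparison are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the tiny "image" and all helper names are hypothetical.

def binarize(image, threshold):
    """Return a binary image: 1 (white) where the grey level exceeds the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def white_pixel_count(image, threshold):
    """Count the white pixels (the area of white objects) after binarization."""
    return sum(sum(row) for row in binarize(image, threshold))


def white_area_profile(image, thresholds):
    """Map each candidate threshold to the white-pixel count it produces."""
    return {t: white_pixel_count(image, t) for t in thresholds}


# A made-up 3x4 grey-level image (values 0-255), not data from the paper.
toy_image = [
    [10, 40, 90, 200],
    [25, 35, 85, 120],
    [5, 31, 79, 81],
]

profile = white_area_profile(toy_image, [30, 80])
print(profile)
```

    In a study of this kind, one such count per image and per threshold would form the feature on which the dendrogram-based clustering operates.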

  4. Health state evaluation of shield tunnel SHM using fuzzy cluster method

    Science.gov (United States)

    Zhou, Fa; Zhang, Wei; Sun, Ke; Shi, Bin

    2015-04-01

    Shield tunnel SHM is currently developing rapidly, while the processing of massive monitoring data and quantitative health grading remain a real challenge, since multiple sensors of different types are employed in an SHM system. This paper addresses the fuzzy cluster method based on the fuzzy equivalence relationship for the health evaluation of shield tunnel SHM. The method was optimized by exporting the FSV map to automatically generate the threshold value. A new holistic health score (HHS) was proposed and its effectiveness was validated by conducting a pilot test. A case study on the Nanjing Yangtze River Tunnel was presented to apply this method. Three types of indicators, namely soil pressure, pore pressure and steel strain, were used to develop the evaluation set U. The clustering results were verified by analyzing the engineering geological conditions; the applicability and validity of the proposed method were also demonstrated. In addition, the advantage of multi-factor evaluation over the single-factor model was discussed using the proposed HHS. This investigation indicates that the fuzzy cluster method and HHS are capable of characterizing the fuzziness of tunnel health, and that they help to clarify the uncertainties of tunnel health evaluation.

  5. Extensive regularization of the coupled cluster methods based on the generating functional formalism: Application to gas-phase benchmarks and to the SN2 reaction of CHCl3 and OH- in water

    Science.gov (United States)

    Kowalski, Karol; Valiev, Marat

    2009-12-01

    The recently introduced energy expansion based on the use of generating functional (GF) [K. Kowalski and P. D. Fan, J. Chem. Phys. 130, 084112 (2009)] provides a way of constructing size-consistent noniterative coupled cluster (CC) corrections in terms of moments of the CC equations. To take advantage of this expansion in a strongly interacting regime, the regularization of the cluster amplitudes is required in order to counteract the effect of excessive growth of the norm of the CC wave function. Although proven to be efficient, the previously discussed form of the regularization does not lead to rigorously size-consistent corrections. In this paper we address the issue of size-consistent regularization of the GF expansion by redefining the equations for the cluster amplitudes. The performance and basic features of proposed methodology are illustrated on several gas-phase benchmark systems. Moreover, the regularized GF approaches are combined with quantum mechanical molecular mechanics module and applied to describe the SN2 reaction of CHCl3 and OH- in aqueous solution.

  6. Clustering Based Feature Learning on Variable Stars

    CERN Document Server

    Mackenzie, Cristóbal; Protopapas, Pavlos

    2016-01-01

    The success of automatic classification of variable stars strongly depends on the lightcurve representation. Usually, lightcurves are represented as a vector of many statistical descriptors designed by astronomers called features. These descriptors commonly demand significant computational power to calculate, require substantial research effort to develop and do not guarantee good performance on the final classification task. Today, lightcurve representation is not entirely automatic; algorithms that extract lightcurve features are designed by humans and must be manually tuned up for every survey. The vast amounts of data that will be generated in future surveys like LSST mean astronomers must develop analysis pipelines that are both scalable and automated. Recently, substantial efforts have been made in the machine learning community to develop methods that prescind from expert-designed and manually tuned features for features that are automatically learned from data. In this work we present what is, to our ...

  7. Cluster based parallel database management system for data intensive computing

    Institute of Scientific and Technical Information of China (English)

    Jianzhong LI; Wei ZHANG

    2009-01-01

    This paper describes a computer-cluster based parallel database management system (DBMS), InfiniteDB, developed by the authors. InfiniteDB aims to efficiently support data intensive computing in response to the rapid growth in database size and the need for high performance analysis of massive databases. It can be executed efficiently on computing systems composed of thousands of computers, such as cloud computing systems. It supports intra-query, inter-query, intra-operation, inter-operation and pipelining parallelism. It provides effective strategies for managing massive databases, including multiple data declustering methods, declustering-aware algorithms for relational operations and other database operations, and an adaptive query optimization method. It also provides parallel data warehousing and data mining functions, a coordinator-wrapper mechanism to support the integration of heterogeneous information resources on the Internet, and fault tolerant and resilient infrastructures. It has been used in many applications and has proved quite effective for data intensive computing.

  8. Utility-guided Clustering-based Transaction Data Anonymization

    Directory of Open Access Journals (Sweden)

    Aris Gkoulalas-Divanis

    2012-04-01

    Transaction data about individuals are increasingly collected to support a plethora of applications, spanning from marketing to biomedical studies. Publishing these data is required by many organizations, but may result in privacy breaches if an attacker exploits potentially identifying information to link individuals to their records in the published data. Algorithms that prevent this threat by transforming transaction data prior to their release have been proposed recently, but they may incur significant utility loss due to their inability to: (i) accommodate a range of different privacy requirements that data owners often have, and (ii) guarantee that the produced data will satisfy data owners’ utility requirements. To address this issue, we propose a novel clustering-based framework for anonymizing transaction data, which provides the basis for designing algorithms that better preserve data utility. Based on this framework, we develop two anonymization algorithms which explore a larger solution space than existing methods and can satisfy a wide range of privacy requirements. Additionally, the second algorithm allows the specification and enforcement of utility requirements, thereby ensuring that the anonymized data remain useful in intended tasks. Experiments with both benchmark and real medical datasets verify that our algorithms significantly outperform the current state-of-the-art algorithms in terms of data utility, while being comparable in terms of efficiency.

  9. Topic Modeling Based Image Clustering by Events in Social Media

    Directory of Open Access Journals (Sweden)

    Bin Xu

    2016-01-01

    Social event detection in large photo collections is very challenging, and multimodal clustering is an effective methodology for dealing with the problem. Geographic information is important in event detection. This paper proposes a topic model based approach to estimate the missing geographic information for photos. The approach utilizes a supervised multimodal topic model to estimate the joint distribution of time, geographic, content, and attached textual information. We then annotate the photos that lack geographic information with a predicted geographic coordinate. Experimental results indicate that clustering performance is improved by the annotated geographic information.

  10. Search Profiles Based on User to Cluster Similarity

    Directory of Open Access Journals (Sweden)

    Saša Bošnjak

    2009-06-01

    Privacy of web users' query search logs has, since the AOL dataset release a few years ago, been treated as one of the central issues concerning privacy on the Internet. Therefore, the question of privacy preservation has also attracted a lot of attention in different communities surrounding the search engines. This paper examines the use of clustering methods to provide low-level contextual search while retaining a high privacy-utility tradeoff. By using only the user's cluster membership, the search query terms need no longer be retained, which reduces privacy concerns for both users and companies. The paper presents a lightweight framework for combining query words, user similarities and clustering in order to provide a meaningful way of mining user searches while protecting their privacy. This differs from previous privacy-preserving approaches in that it attempts to anonymize the queries instead of the users.

  11. SEARCH PROFILES BASED ON USER TO CLUSTER SIMILARITY

    Directory of Open Access Journals (Sweden)

    Ilija Subasic

    2007-12-01

    Privacy of web users' query search logs has, since last year's AOL dataset release, been treated as one of the central issues concerning privacy on the Internet. Therefore, the question of privacy preservation has also attracted a lot of attention in different communities surrounding the search engines. This paper examines the use of clustering methods to provide low-level contextual search while retaining a high privacy/utility tradeoff. By using only the user's cluster membership, the search query terms need no longer be retained, which reduces privacy concerns for both users and companies. The paper presents a lightweight framework for combining query words, user similarities and clustering in order to provide a meaningful way of mining user searches while protecting their privacy. This differs from previous privacy-preserving approaches in that it attempts to anonymize the queries instead of the users.

  12. Script identification from images using cluster-based templates

    Science.gov (United States)

    Hochberg, J.G.; Kelly, P.M.; Thomas, T.R.

    1998-12-01

    A computer-implemented method identifies a script used to create a document. A set of training documents for each script to be identified is scanned into the computer to store a series of exemplary images representing each script. Pixels forming the exemplary images are electronically processed to define a set of textual symbols corresponding to the exemplary images. Each textual symbol is assigned to a cluster of textual symbols that most closely represents the textual symbol. The cluster of textual symbols is processed to form a representative electronic template for each cluster. A document having a script to be identified is scanned into the computer to form one or more document images representing the script to be identified. Pixels forming the document images are electronically processed to define a set of document textual symbols corresponding to the document images. The set of document textual symbols is compared to the electronic templates to identify the script. 17 figs.

  13. Density-based cluster algorithms for the identification of core sets

    Science.gov (United States)

    Lemke, Oliver; Keller, Bettina G.

    2016-10-01

    The core-set approach is a discretization method for Markov state models of complex molecular dynamics. Core sets are disjoint metastable regions in the conformational space, which need to be known prior to the construction of the core-set model. We propose to use density-based cluster algorithms to identify the cores. We compare three different density-based cluster algorithms: the CNN, the DBSCAN, and the Jarvis-Patrick algorithm. While the core-set models based on the CNN and DBSCAN clustering are well-converged, constructing core-set models based on the Jarvis-Patrick clustering cannot be recommended. In a well-converged core-set model, the number of core sets is up to an order of magnitude smaller than the number of states in a conventional Markov state model with comparable approximation error. Moreover, using the density-based clustering one can extend the core-set method to systems which are not strongly metastable. This is important for the practical application of the core-set method because most biologically interesting systems are only marginally metastable. The key point is to perform a hierarchical density-based clustering while monitoring the structure of the metric matrix which appears in the core-set method. We test this approach on a molecular-dynamics simulation of a highly flexible 14-residue peptide. The resulting core-set models have a high spatial resolution and can distinguish between conformationally similar yet chemically different structures, such as register-shifted hairpin structures.
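
    To make the idea of density-based core identification concrete, here is a minimal DBSCAN sketch in plain Python. It is an assumption-laden illustration, not the authors' implementation: the 2-D toy points, `eps`, and `min_pts` are made up, and a real molecular-dynamics application would use a conformational distance metric rather than Euclidean distance in the plane.

```python
import math

NOISE = -1  # label for points that belong to no dense region


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE if it lies in a sparse region."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned or already marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a dense point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is dense too, so keep expanding
        cluster += 1
    return labels


# Hypothetical toy data: two dense regions and one isolated point between them.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = dbscan(toy_points, eps=1.5, min_pts=3)
print(labels)
```

    In this reading, each labelled cluster plays the role of a candidate core set, and the points left as noise correspond to the sparsely populated regions between cores.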

  14. Research on retailer data clustering algorithm based on Spark

    Science.gov (United States)

    Huang, Qiuman; Zhou, Feng

    2017-03-01

    Big data analysis is currently a hot topic in the IT field. Spark is a high-reliability, high-performance distributed parallel computing framework for big data sets. The k-means algorithm is one of the classical partitioning methods among clustering algorithms. In this paper, we study the k-means clustering algorithm on Spark. First, the principle of the algorithm is analyzed, and then a clustering analysis of supermarket customers is carried out experimentally to identify different shopping patterns. The paper also proposes a parallelization of the k-means algorithm on the distributed computing framework of Spark, and gives a concrete design and implementation scheme. Two years of sales data from a supermarket are used to validate the proposed clustering algorithm and to segment customers; the clustering results are then analyzed to help enterprises adopt different marketing strategies for different customer groups and so improve sales performance.
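
    The parallelization mentioned in this abstract typically rests on the fact that one k-means iteration splits into an independent per-point assignment (a map) and a per-cluster aggregation (a reduce). The sketch below shows that structure in plain, single-process Python; it does not use Spark, and the customer features, function names, and starting centroids are hypothetical.

```python
import math


def assign(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))
    return nearest, (point, 1)


def combine(a, b):
    """Reduce step: add two (coordinate sum, count) pairs for the same cluster."""
    (sum_a, n_a), (sum_b, n_b) = a, b
    return tuple(x + y for x, y in zip(sum_a, sum_b)), n_a + n_b


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations from the given starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        partial = {}
        for key, value in (assign(p, centroids) for p in points):
            partial[key] = combine(partial[key], value) if key in partial else value
        # A cluster that received no points keeps its previous centroid.
        for key, (coord_sum, count) in partial.items():
            centroids[key] = tuple(s / count for s in coord_sum)
    return centroids


# Made-up customers described by two hypothetical features (spend, visits).
customers = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers = kmeans(customers, [(0.0, 0.0), (10.0, 10.0)])
print(centers)
```

    Because `combine` is associative and commutative, the same two functions could in principle be handed to a distributed map and reduce-by-key pipeline, which is the property a cluster framework exploits.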

  15. Cluster Ensemble Based on Spectral Clustering

    Institute of Scientific and Technical Information of China (English)

    周林; 平西建; 徐森; 张涛

    2012-01-01

    Spectral clustering has become increasingly popular in recent years. It can handle datasets of arbitrary distribution; however, it is sensitive to the scaling parameter. A cluster ensemble based on spectral clustering is proposed, which exploits the robustness and generalization ability of cluster ensembles. Diverse clustering components are generated by exploiting the properties of spectral clustering, and the connected-triple algorithm, which expands the similarity information among data points, is used to compute the affinity matrix; the affinity matrix is then used by the spectral clustering algorithm to produce the ensemble result. To make the algorithm extensible to large-scale applications, the Nyström sampling method is adopted so that only the similarities among the randomly sampled data points, and between the randomly sampled points and the remaining points, are computed, which effectively reduces the computational complexity. The proposed algorithm makes full use of the excellent performance of spectral clustering while avoiding the need to select an accurate parameter for spectral clustering. Experiments show that, compared with other common cluster ensemble techniques, the proposed algorithm is more effective and efficient, and that it provides a good way to solve data clustering and image segmentation problems.

  16. Problem decomposition by mutual information and force-based clustering

    Science.gov (United States)

    Otero, Richard Edward

    The scale of engineering problems has sharply increased over the last twenty years. Larger coupled systems, increasing complexity, and limited resources create a need for methods that automatically decompose problems into manageable sub-problems by discovering and leveraging problem structure. The ability to learn the coupling (inter-dependence) structure and reorganize the original problem could lead to large reductions in the time to analyze complex problems. Such decomposition methods could also provide engineering insight on the fundamental physics driving problem solution. This work forwards the current state of the art in engineering decomposition through the application of techniques originally developed within computer science and information theory. The work describes the current state of automatic problem decomposition in engineering and utilizes several promising ideas to advance the state of the practice. Mutual information is a novel metric for data dependence and works on both continuous and discrete data. Mutual information can measure both the linear and non-linear dependence between variables without the limitations of linear dependence measured through covariance. Mutual information is also able to handle data that does not have derivative information, unlike other metrics that require it. The value of mutual information to engineering design work is demonstrated on a planetary entry problem. This study utilizes a novel tool developed in this work for planetary entry system synthesis. A graphical method, force-based clustering, is used to discover related sub-graph structure as a function of problem structure and links ranked by their mutual information. This method does not require the stochastic use of neural networks and could be used with any link ranking method currently utilized in the field. Application of this method is demonstrated on a large, coupled low-thrust trajectory problem. Mutual information also serves as the basis for an

  17. The Cluster Variation Method: A Primer for Neuroscientists

    Directory of Open Access Journals (Sweden)

    Alianna J. Maren

    2016-09-01

    Effective Brain–Computer Interfaces (BCIs) require that the time-varying activation patterns of 2-D neural ensembles be modelled. The cluster variation method (CVM) offers a means for the characterization of 2-D local pattern distributions. This paper provides neuroscientists and BCI researchers with a CVM tutorial that will help them to understand how the CVM statistical thermodynamics formulation can model 2-D pattern distributions expressing structural and functional dynamics in the brain. The premise is that local-in-time free energy minimization works alongside neural connectivity adaptation, supporting the development and stabilization of consistent stimulus-specific responsive activation patterns. The equilibrium distribution of local patterns, or configuration variables, is defined in terms of a single interaction enthalpy parameter (h) for the case of an equiprobable distribution of bistate (neural/neural ensemble) units. Thus, either one enthalpy parameter (or two, for the case of non-equiprobable distribution) yields equilibrium configuration variable values. Modeling 2-D neural activation distribution patterns with the representational layer of a computational engine, we can thus correlate variational free energy minimization with specific configuration variable distributions. The CVM triplet configuration variables also map well to the notion of an M = 3 functional motif. This paper addresses the special case of an equiprobable unit distribution, for which an analytic solution can be found.

  18. The Evaluation of Lane-Changing Behavior in Urban Traffic Stream with Fuzzy Clustering Method

    Directory of Open Access Journals (Sweden)

    Ali Abdi

    2012-11-01

    We present a method for evaluating lane-changing behavior in urban traffic streams using fuzzy clustering. Because of its marked effects on traffic, drivers' lane-changing tendency is regarded as a major variable in traffic engineering. Most existing lane-changing models have been developed from lane information and vehicle movement data, obtained mainly through image processing, and give little attention to the characteristics of the driver. Lane changes fall into two types. The first is the compulsory lane change, such as changing lanes to turn left or right. The second is the optional lane change, made to improve driving conditions; a low-speed car ahead is a good example. In this study, driver information is obtained through the focus group discussion method so that drivers' personality traits are taken into consideration. Drivers are then divided into four groups by means of clustering algorithms. The four algorithms suggest that the phase typed cluster is a more suitable method for classifying drivers based on lane changing. Different types of lane-change scenarios in Iran yielded the following results: the percentage of drivers in each group is 17/5, 35, 20 and 27/ %, respectively.

  19. Management of Energy Consumption on Cluster Based Routing Protocol for MANET

    Science.gov (United States)

    Hosseini-Seno, Seyed-Amin; Wan, Tat-Chee; Budiarto, Rahmat; Yamada, Masashi

    The usage of light-weight mobile devices is increasing rapidly, leading to demand for more telecommunication services. Consequently, mobile ad hoc networks and their applications have become feasible with the proliferation of light-weight mobile devices. Many protocols have been developed to handle service discovery and routing in ad hoc networks. However, the majority of them do not consider one critical aspect of this type of network, namely the limited energy available in each node. Cluster Based Routing Protocol (CBRP) is a robust and scalable routing protocol for Mobile Ad hoc Networks (MANETs) and is superior to existing protocols such as Ad hoc On-demand Distance Vector (AODV) in terms of throughput and overhead. Building on this strength, methods to increase the efficiency of energy usage are incorporated into CBRP in this work. In order to increase the stability (in terms of lifetime) of the network and to decrease the energy consumption of inter-cluster gateway nodes, an Enhanced Gateway Cluster Based Routing Protocol (EGCBRP) is proposed. EGCBRP introduces three enhancements to CBRP: improving the election of cluster heads (CHs) by basing it on the maximum available energy level, implementing load balancing for inter-cluster traffic using multiple gateways, and implementing a sleep state for gateway nodes to save further energy. Furthermore, we propose an Energy Efficient Cluster Based Routing Protocol (EECBRP), which extends the EGCBRP sleep state concept to all idle member nodes, excluding the active nodes in all clusters. The experimental results show that EGCBRP decreases the overall energy consumption of the gateway nodes by up to 10% and EECBRP reduces the energy consumption of the member nodes by up to 60%, both of which in turn contribute to stabilizing the network.

  20. Public Opinion Assessment about Urban Public Security Based on Entropy Weight-deviation and Clustering Method

    Institute of Scientific and Technical Information of China (English)

    张庆民; 王海燕; 吴春梅; 吴士亮

    2012-01-01

    Taking the Tianya community forum as an example, public opinion about urban public security was evaluated to promote the construction of safe cities and safe communities. Based on the safe community index system, security opinion data for five large cities (Beijing, Shanghai, Tianjin, Chongqing and Guangzhou) from January 2010 to May 2012 were mined using LocoySpider. The weights of the security opinion indicators were determined by the entropy weight method and the maximizing deviations method, and a multi-attribute decision model was built. Urban public security opinion was then classified and evaluated using Ward hierarchical clustering. The results show that the ranking of the five cities by urban public security opinion index is Shanghai, Chongqing, Beijing, Guangzhou and Tianjin. Obvious differences exist among the urban public security opinion indexes of the five cities.
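
    As a rough illustration of the entropy weight step named in this abstract, the sketch below uses the textbook formulation: each indicator column is normalized to proportions, its Shannon entropy is scaled by ln(n), and the weight is proportional to one minus that entropy. The maximizing deviations step is not shown, the numbers are invented, and the function assumes non-negative values, at least two rows, positive column sums, and at least one non-constant column.

```python
import math


def entropy_weights(matrix):
    """Return one weight per column; rows are alternatives, columns are indicators."""
    n = len(matrix)
    m = len(matrix[0])
    diversities = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for x in column:
            p = x / total
            if p > 0:  # treat 0 * log(0) as 0
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # scale so the entropy lies in [0, 1]
        diversities.append(1.0 - entropy)
    total_diversity = sum(diversities)
    return [d / total_diversity for d in diversities]


# Invented scores for three alternatives on two indicators (not data from the paper).
toy_scores = [
    [1.0, 5.0],
    [1.0, 1.0],
    [1.0, 3.0],
]

weights = entropy_weights(toy_scores)
print(weights)
```

    The first indicator is identical for every row, so it carries no discriminating information and receives a weight of essentially zero; the second indicator receives essentially all of the weight.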

  1. Communication style and exercise compliance in physiotherapy (CONNECT). A cluster randomized controlled trial to test a theory-based intervention to increase chronic low back pain patients’ adherence to physiotherapists’ recommendations: study rationale, design, and methods

    Directory of Open Access Journals (Sweden)

    Lonsdale Chris

    2012-06-01

    Background: Physical activity and exercise therapy are among the accepted clinical rehabilitation guidelines and are recommended self-management strategies for chronic low back pain. However, many back pain sufferers do not adhere to their physiotherapist’s recommendations. Poor patient adherence may decrease the effectiveness of advice and home-based rehabilitation exercises. According to self-determination theory, support from health care practitioners can promote patients’ autonomous motivation and greater long-term behavioral persistence (e.g., adherence to physiotherapists’ recommendations). The aim of this trial is to assess the effect of an intervention designed to increase physiotherapists’ autonomy-supportive communication on low back pain patients’ adherence to physical activity and exercise therapy recommendations. Methods/Design: This study will be a single-blinded cluster randomized controlled trial. Outpatient physiotherapy centers (N = 12) in Dublin, Ireland (population = 1.25 million) will be randomly assigned using a computer-generated algorithm to either the experimental or control arm. Physiotherapists in the experimental arm (two hospitals and four primary care clinics) will attend eight hours of communication skills training. Training will include handouts, workbooks, video examples, role-play, and discussion designed to teach physiotherapists how to communicate in a manner that promotes autonomous patient motivation. Physiotherapists in the waitlist control arm (two hospitals and four primary care clinics) will not receive this training. Participants (N = 292) with chronic low back pain will complete assessments at baseline, as well as 1 week, 4 weeks, 12 weeks, and 24 weeks after their first physiotherapy appointment. Primary outcomes will include adherence to physiotherapy recommendations, as well as low back pain, function, and well-being. Participants will be blinded to treatment allocation, as

  2. Performance Comparison of Cluster based and Threshold based Algorithms for Detection and Prevention of Cooperative Black Hole Attack in MANETs

    Directory of Open Access Journals (Sweden)

    P. S. Hiremath

    2014-11-01

    In mobile ad hoc networks (MANETs), the movement of nodes may quickly change the network topology, increasing the message overhead of topology maintenance. The nodes communicate with each other by exchanging hello packets and constructing a neighbor list at each node. A MANET is vulnerable to attacks such as the black hole attack, gray hole attack, wormhole attack and Sybil attack. A black hole attack has a serious impact on routing, packet delivery ratio, throughput, and end-to-end delay of packets. In this paper, the performance of clustering based and threshold based algorithms for the detection and prevention of cooperative black hole attacks in MANETs is compared. In this study, every node is monitored by its own cluster head (CH), while a server (SV) monitors the entire network by the channel overhearing method. The server computes the trust value based on the sent and received packet counts of the receiver node. The approach is implemented using the AODV routing protocol in NS2 simulations. The results are obtained by comparing the performance of the clustering based and threshold based methods while varying the concentration of black hole nodes, and are analyzed in terms of throughput and packet delivery ratio. The results demonstrate that the threshold based method outperforms the clustering based method in terms of throughput, packet delivery ratio and end-to-end delay.

  3. Comparisons of Graph-structure Clustering Methods for Gene Expression Data

    Institute of Scientific and Technical Information of China (English)

    Zhuo FANG; Lei LIU; Jiong YANG; Qing-Ming LUO; Yi-Xue LI

    2006-01-01

    Although many numerical clustering algorithms have been applied to gene expression data analysis, the essential step is still biological interpretation by manual inspection. The correlation between genetic co-regulation and affiliation to a common biological process is what biologists expect. Here, we introduce some clustering algorithms that are based on a graph structure constituted by biological knowledge. Using a widely used dataset, we compared the resulting clusters of two of these algorithms in terms of cluster homogeneity, coherence of annotation and matching ratio. The results show that the clusters of knowledge-guided analysis are the kernel parts of the clusters of the Gene Ontology (GO)-Cluster software, and contain the genes that are most strongly correlated in expression and most consistent with biological functions. Moreover, knowledge-guided analysis seems much more applicable than GO-Cluster to larger datasets.

  4. Selections of data preprocessing methods and similarity metrics for gene cluster analysis

    Institute of Scientific and Technical Information of China (English)

    YANG Chunmei; WAN Baikun; GAO Xiaofeng

    2006-01-01

    Clustering is one of the major exploratory techniques for gene expression data analysis. Only with suitable similarity metrics and when datasets are properly preprocessed, can results of high quality be obtained in cluster analysis. In this study, gene expression datasets with external evaluation criteria were preprocessed as normalization by line, normalization by column or logarithm transformation by base-2, and were subsequently clustered by hierarchical clustering, k-means clustering and self-organizing maps (SOMs) with Pearson correlation coefficient or Euclidean distance as similarity metric. Finally, the quality of clusters was evaluated by adjusted Rand index. The results illustrate that k-means clustering and SOMs have distinct advantages over hierarchical clustering in gene clustering, and SOMs are a bit better than k-means when randomly initialized. It also shows that hierarchical clustering prefers Pearson correlation coefficient as similarity metric and dataset normalized by line. Meanwhile, k-means clustering and SOMs can produce better clusters with Euclidean distance and logarithm transformed datasets. These results will afford valuable reference to the implementation of gene expression cluster analysis.
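
    The abstract above scores clusterings against external criteria with the adjusted Rand index. As a rough illustration only, the index can be computed from pair counts as sketched below; the function name and the handling of the degenerate case are assumptions of this sketch, not details taken from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    # Contingency counts: how many items carry each (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Numbers of item pairs placed together under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total = comb(n, 2)

    expected = sum_true * sum_pred / total if total else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. one cluster in both partitions (assumed to score 1).
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)
```

    The index does not depend on how clusters are named, so a clustering that matches the reference up to relabeling scores 1, while agreement at chance level scores about 0.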

  5. Reweighted mass center based object-oriented sparse subspace clustering for hyperspectral images

    Science.gov (United States)

    Zhai, Han; Zhang, Hongyan; Zhang, Liangpei; Li, Pingxiang

    2016-10-01

    Considering the inevitable obstacles faced by the pixel-based clustering methods, such as salt-and-pepper noise, high computational complexity, and the lack of spatial information, a reweighted mass center based object-oriented sparse subspace clustering (RMC-OOSSC) algorithm for hyperspectral images (HSIs) is proposed. First, the mean-shift segmentation method is utilized to oversegment the HSI to obtain meaningful objects. Second, a distance reweighted mass center learning model is presented to extract the representative and discriminative features for each object. Third, assuming that all the objects are sampled from a union of subspaces, it is natural to apply the SSC algorithm to the HSI. Faced with the high correlation among the hyperspectral objects, a weighting scheme is adopted to ensure that the highly correlated objects are preferred in the procedure of sparse representation, to reduce the representation errors. Two widely used hyperspectral datasets were utilized to test the performance of the proposed RMC-OOSSC algorithm, obtaining high clustering accuracies (overall accuracy) of 71.98% and 89.57%, respectively. The experimental results show that the proposed method clearly improves the clustering performance with respect to the other state-of-the-art clustering methods, and it significantly reduces the computational time.

  6. Cluster-based reduced-order modelling of a mixing layer

    CERN Document Server

    Kaiser, Eurika; Cordier, Laurent; Spohn, Andreas; Segond, Marc; Abel, Markus; Daviller, Guillaume; Niven, Robert K

    2013-01-01

    We propose a novel cluster-based reduced-order modelling (CROM) strategy of unsteady flows. CROM builds on the pioneering works of Gunzburger's group in cluster analysis (Burkardt et al. 2006) and Eckhardt's group in transition matrix models (Schneider et al. 2007) and constitutes a potential alternative to POD models. This strategy processes a time-resolved sequence of flow snapshots in two steps. First, the snapshot data is clustered into a small number of representative states, called centroids, in the state space. These centroids partition the state space into complementary non-overlapping regions (centroidal Voronoi cells). Departing from the standard algorithm, the probabilities of the clusters are determined, and the states are sorted by transition matrix consideration. Second, the transitions between the states are dynamically modelled via a Markov process. Physical mechanisms are then distilled by a refined analysis of the Markov process, e.g. with the finite-time Lyapunov exponent and entropic methods...
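
    The second step described above, modelling transitions between clustered states as a Markov process, can be illustrated with a small sketch. It assumes the snapshots have already been assigned integer cluster labels in time order, and it uses a row-stochastic convention (row = current state), which is a choice made here and may differ from the paper's convention. Function names are hypothetical.

```python
def cluster_probabilities(labels, n_clusters):
    """Fraction of snapshots that fall into each cluster."""
    return [labels.count(k) / len(labels) for k in range(n_clusters)]


def transition_matrix(labels, n_clusters):
    """Estimate P[i][j] = Pr(next cluster is j | current cluster is i)."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    # Count consecutive (current, next) pairs along the time-ordered sequence.
    for current, nxt in zip(labels, labels[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A cluster that is never left (no outgoing pair) gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Example: a short, made-up label sequence over three clusters.
labels = [0, 0, 1, 2, 0, 1]
P = transition_matrix(labels, 3)
q = cluster_probabilities(labels, 3)
```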

  7. Recognition of Spontaneous Combustion in Coal Mines Based on Genetic Clustering

    Institute of Scientific and Technical Information of China (English)

    2006-01-01

    Spontaneous combustion is one of the greatest disasters in coal mines. Early recognition is important because it may be a potential inducement for other coalmine accidents. However, early recognition is difficult because of the complexity of different coal mines. Fuzzy clustering has been proposed to incorporate the uncertainty of spontaneous combustion in coal mines and it can give a clear degree of classification of combustion. Because FCM clustering tends to become trapped in local minima, a new approach of fuzzy c-means clustering based on a genetic algorithm is therefore proposed. Genetic algorithm is capable of locating optimal or near optimal solutions to difficult problems. It can be applied in many fields without first obtaining detailed knowledge about correlation. It is helpful in improving the effectiveness of fuzzy clustering in detecting spontaneous combustion. The effectiveness of the method is demonstrated by means of an experiment.
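
    For context on why FCM can become trapped in local minima: the standard algorithm alternates two local updates, sketched below. This is a generic fuzzy c-means step, not the genetic variant proposed in the record, whose details the abstract does not give; the fuzzifier m = 2 and the small eps guard are assumptions of the sketch.

```python
import math


def fcm_step(points, centers, m=2.0, eps=1e-12):
    """One fuzzy c-means iteration: update memberships, then cluster centers."""
    memberships = []
    for x in points:
        # Distances to every center; eps avoids division by zero on exact hits.
        dists = [max(math.dist(x, c), eps) for c in centers]
        row = []
        for d_k in dists:
            denom = sum((d_k / d_j) ** (2.0 / (m - 1.0)) for d_j in dists)
            row.append(1.0 / denom)
        memberships.append(row)

    # Each new center is the mean of all points weighted by membership ** m.
    dim = len(points[0])
    new_centers = []
    for k in range(len(centers)):
        weights = [u[k] ** m for u in memberships]
        total = sum(weights)
        new_centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ))
    return memberships, new_centers


# Example: four 1-D points and two initial centers (illustrative values).
pts = [(0.0,), (1.0,), (9.0,), (10.0,)]
u, c = fcm_step(pts, [(0.0,), (10.0,)])
```

    Because each step only improves on the current centers, the final result depends on where the centers start, which is the weakness a global search such as a genetic algorithm is meant to offset.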

  8. Approximate K-Nearest Neighbour Based Spatial Clustering Using K-D Tree

    Directory of Open Access Journals (Sweden)

    Mohammed Otair

    2013-03-01

    Full Text Available Different spatial objects that vary in their characteristics, such as molecular biology and geography, are presented in spatial areas. Methods to organize, manage, and maintain those objects in a structured manner are required. Data mining raised different techniques to overcome these requirements. There are many major tasks of data mining, but the mostly used task is clustering. Data set within the same cluster share common features that give each cluster its characteristics. In this paper, an implementation of Approximate kNN-based spatial clustering algorithm using the K-d tree is proposed. The major contribution achieved by this research is the use of the k-d tree data structure for spatial clustering, and comparing its performance to the brute-force approach. The results of the work performed in this paper revealed better performance using the k-d tree, compared to the traditional brute-force approach.
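
    To make the k-d tree versus brute-force comparison concrete, here is a minimal sketch of both searches. It performs an exact single-nearest-neighbour query rather than the approximate kNN clustering of the paper, and the tuple-based tree layout and function names are assumptions made for illustration.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively build a k-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinates
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                # median point splits the space
    return (
        ordered[mid],
        build_kdtree(ordered[:mid], depth + 1),
        build_kdtree(ordered[mid + 1:], depth + 1),
    )


def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target, pruning distant subtrees."""
    if node is None:
        return best
    point, left, right = node
    if best is None or math.dist(point, target) < math.dist(best, target):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best


def nearest_brute_force(points, target):
    """Baseline: compare the target against every point."""
    return min(points, key=lambda p: math.dist(p, target))


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pts)
```

    The brute-force search examines every point for each query, whereas the tree search can skip whole subtrees, which is the usual source of the speed-up on low-dimensional spatial data.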

  9. MCBT: Multi-Hop Cluster Based Stable Backbone Trees for Data Collection and Dissemination in WSNs.

    Science.gov (United States)

    Shin, Inyoung; Kim, Moonseong; Mutka, Matt W; Choo, Hyunseung; Lee, Tae-Jin

    2009-01-01

    We propose a stable backbone tree construction algorithm using multi-hop clusters for wireless sensor networks (WSNs). The hierarchical cluster structure has advantages in data fusion and aggregation. Energy consumption can be decreased by managing nodes with cluster heads. Backbone nodes, which are responsible for performing and managing multi-hop communication, can reduce the communication overhead such as control traffic and minimize the number of active nodes. Previous backbone construction algorithms, such as Hierarchical Cluster-based Data Dissemination (HCDD) and Multicluster, Mobile, Multimedia radio network (MMM), consume energy quickly. They are designed without regard to appropriate factors such as residual energy and degree (the number of connections or edges to other nodes) of a node for WSNs. Thus, the network is quickly disconnected or has to reconstruct a backbone. We propose a distributed algorithm to create a stable backbone by selecting the nodes with higher energy or degree as the cluster heads. This increases the overall network lifetime. Moreover, the proposed method balances energy consumption by distributing the traffic load among nodes around the cluster head. In the simulation, the proposed scheme outperforms previous clustering schemes in terms of the average and the standard deviation of residual energy or degree of backbone nodes, the average residual energy of backbone nodes after disseminating the sensed data, and the network lifetime.

  10. Clustering as an EDA Method: The Case of Pedestrian Directional Flow Behavior

    Directory of Open Access Journals (Sweden)

    Ma. Regina E. Estuar

    2010-01-01

    Full Text Available Given the data of pedestrian trajectories in NTXY format, three clustering methods, K-Means, Expectation Maximization (EM) and Affinity Propagation, were utilized as Exploratory Data Analysis (EDA) to find the pattern of pedestrian directional flow behavior. The analysis begins without a prior notion regarding the structure of the pattern and consequently infers the structure of the directional flow pattern. Significant similarities in patterns for both individual and instantaneous walking angles based on the EDA method are reported and explained in case studies.

  11. A Virtual Router Cluster System Based on the Separation of the Control Plane and the Data Plane

    Institute of Scientific and Technical Information of China (English)

    2012-01-01

    This paper proposes a virtual router cluster system based on the separation of the control plane and the data plane from multiple perspectives, such as architecture, key technologies, scenarios and standardization. To some extent, cluster simplifies network topology and management, achieves automatic configuration and saves the IP address of low-cost expansion method of aggregation equipment port density

  12. Risk Assessment for Bridges Safety Management during Operation Based on Fuzzy Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    Xia Hanyu

    2016-01-01

    Full Text Available In recent years, as large-span and large sea-crossing bridges have been built, bridge accidents caused by improper operational management have occurred frequently. In order to explore better methods for risk assessment of bridge operation departments, a method based on a fuzzy clustering algorithm is selected. The implementation steps of the fuzzy clustering algorithm are described, the risk evaluation system is built, and, with Taizhou Bridge selected as an example, the quantification of risk factors is described. After that, the clustering algorithm based on fuzzy equivalence is computed in MATLAB 2010a. Finally, the Taizhou Bridge operation management departments are classified and sorted according to their degree of risk, and the safety situation of the operation departments is analyzed.

  13. Clustering-based interference management in densely deployed femtocell networks

    Directory of Open Access Journals (Sweden)

    Jingyi Dai

    2016-11-01

    Full Text Available Deploying femtocells underlaying macrocells is a promising way to improve the capacity and enhance the coverage of a cellular system. However, densely deployed femtocells in urban areas also give rise to intra-tier interference and cross-tier issues that should be addressed properly in order to acquire the expected performance gain. In this paper, we propose an interference management scheme based on joint clustering and resource allocation for two-tier Orthogonal Frequency Division Multiplexing (OFDM)-based femtocell networks. We formulate an optimization task with the objective of maximizing the sum throughput of the femtocell users (FUs) under the consideration of intra-tier interference mitigation, while keeping the interference to the macrocell user (MU) under its bearable threshold. The formulated problem is addressed by a two-stage procedure: femtocell clustering and resource allocation. First, disjoint femtocell clusters with dynamic sizes and numbers are generated to minimize intra-tier interference. Then each cluster is taken as a resource allocation unit to share all subchannels, followed by a fast algorithm to distribute power among these subchannels. Simulation results show that our proposed schemes can improve the throughput of the FUs with acceptable complexity.

  14. Fault diagnosis method of transformer based on crisscross optimization algorithm and fuzzy clustering

    Institute of Scientific and Technical Information of China (English)

    孟安波; 卢海明; 郭壮志

    2016-01-01

    FCM (fuzzy C-means) clustering optimized by the crisscross optimization (CSO) algorithm, denoted CSO-FCM, is introduced to diagnose transformer faults in order to overcome the shortcomings of FCM clustering. Combining dissolved gas analysis with FCM clustering is effective in improving the accuracy of power transformer fault diagnosis, but the result of FCM clustering is unstable and easily gets stuck in a local optimum. The CSO algorithm is a population-based stochastic search algorithm that includes a horizontal crossover as well as a vertical crossover; their combination enhances the global convergence ability, while the introduction of a competitive mechanism drives the potential solutions toward the global optimum in an accelerating fashion without sacrificing convergence speed. This novel method effectively compensates for the demerits of a single intelligent algorithm: it has both the ability of fuzzy theory to handle uncertain information and the global convergence advantage of CSO. Simulation and case analysis indicate that, compared with traditional FCM clustering, CSO-FCM clustering can obtain high-performance cluster centers and effectively raise the accuracy and speed of power transformer fault diagnosis.

  15. Communities recognition in the Chesapeake Bay ecosystem by dynamical clustering algorithms based on different oscillators systems

    CERN Document Server

    Pluchino, Alessandro; Latora, Vito

    2008-01-01

    We have recently introduced an efficient method for the detection and identification of modules in complex networks, based on the de-synchronization properties (dynamical clustering) of phase oscillators. In this paper we apply the dynamical clustering technique to the identification of communities of marine organisms living in the Chesapeake Bay food web. We show that our algorithm is able to perform a very reliable classification of the real communities existing in this ecosystem by using different kinds of dynamical oscillators. We also compare our results with those of other methods for the detection of community structures in complex networks.

  16. A comparison of clustering methods for writer identification and verification

    NARCIS (Netherlands)

    Bulacu, M.L.; Schomaker, L.R.B.

    2005-01-01

    An effective method for writer identification and verification is based on assuming that each writer acts as a stochastic generator of ink-trace fragments, or graphemes. The probability distribution of these simple shapes in a given handwriting sample is characteristic for the writer and is computed

  17. MHCcluster, a method for functional clustering of MHC molecules

    DEFF Research Database (Denmark)

    Thomsen, Martin Christen Frølund; Lundegaard, Claus; Buus, Søren;

    2013-01-01

    binding specificity. The method has a flexible web interface that allows the user to include any MHC of interest in the analysis. The output consists of a static heat map and graphical tree-based visualizations of the functional relationship between MHC variants and a dynamic TreeViewer interface where...

  18. Pivot method for global optimization: A study of structures and phase changes in water clusters

    Science.gov (United States)

    Nigra, Pablo Fernando

    In this thesis, we have carried out a study of water clusters. The research work has been developed in two stages. In the first stage, we have investigated the properties of water clusters at zero temperature by means of global optimization. The clusters were modeled by using two well known pairwise potentials having distinct characteristics. One is the Matsuoka-Clementi-Yoshimine potential (MCY) that is an ab initio fitted function based on a rigid-molecule model, the other is the Stillinger-Rahman potential (SR) which is an empirical function based on a flexible-molecule model. The algorithm used for the global optimization of the clusters was the pivot method, which was developed in our group. The results have shown that, under certain conditions, the pivot method may yield optimized structures which are related to one another in such a way that they seem to form structural families. The structures in a family can be thought of as formed from the aggregation of single units. The particular types of structures we have found are quasi-one dimensional tubes built from stacking cyclic units such as tetramers, pentamers, and hexamers. The binding energies of these tubes form sequences that span smooth curves with clear asymptotic behavior; therefore, we have also studied the sequences applying the Bulirsch-Stoer (BST) algorithm to accelerate convergence. In the second stage of the research work, we have studied the thermodynamic properties of a typical water cluster at finite temperatures. The selected cluster was the water octamer which exhibits a definite solid-liquid phase change. The water octamer also has several low lying energy cubic structures with large energetic barriers that cause ergodicity breaking in regular Monte Carlo simulations. For that reason we have simulated the octamer using parallel tempering Monte Carlo combined with the multihistogram method. This has permitted us to calculate the heat capacity from very low temperatures up to T = 230 K. We

  19. Nationwide registry-based analysis of cancer clustering detects strong familial occurrence of Kaposi sarcoma.

    Science.gov (United States)

    Kaasinen, Eevi; Aavikko, Mervi; Vahteristo, Pia; Patama, Toni; Li, Yilong; Saarinen, Silva; Kilpivaara, Outi; Pitkänen, Esa; Knekt, Paul; Laaksonen, Maarit; Artama, Miia; Lehtonen, Rainer; Aaltonen, Lauri A; Pukkala, Eero

    2013-01-01

    Many cancer predisposition syndromes are rare or have incomplete penetrance, and traditional epidemiological tools are not well suited for their detection. Here we have used an approach that employs the entire population based data in the Finnish Cancer Registry (FCR) for analyzing familial aggregation of all types of cancer, in order to find evidence for previously unrecognized cancer susceptibility conditions. We performed a systematic clustering of 878,593 patients in FCR based on family name at birth, municipality of birth, and tumor type, diagnosed between years 1952 and 2011. We also estimated the familial occurrence of the tumor types using cluster score that reflects the proportion of patients belonging to the most significant clusters compared to all patients in Finland. The clustering effort identified 25,910 birth name-municipality based clusters representing 183 different tumor types characterized by topography and morphology. We produced information about familial occurrence of hundreds of tumor types, and many of the tumor types with high cluster score represented known cancer syndromes. Unexpectedly, Kaposi sarcoma (KS) also produced a very high score (cluster score 1.91, p-value <0.0001). We verified from population records that many of the KS patients forming the clusters were indeed close relatives, and identified one family with five affected individuals in two generations and several families with two first degree relatives. Our approach is unique in enabling systematic examination of a national epidemiological database to derive evidence of aberrant familial aggregation of all tumor types, both common and rare. It allowed effortless identification of families displaying features of both known as well as potentially novel cancer predisposition conditions, including striking familial aggregation of KS. Further work with high-throughput methods should elucidate the molecular basis of the potentially novel predisposition conditions found in this

  20. Nationwide registry-based analysis of cancer clustering detects strong familial occurrence of Kaposi sarcoma.

    Directory of Open Access Journals (Sweden)

    Eevi Kaasinen

    Full Text Available Many cancer predisposition syndromes are rare or have incomplete penetrance, and traditional epidemiological tools are not well suited for their detection. Here we have used an approach that employs the entire population based data in the Finnish Cancer Registry (FCR) for analyzing familial aggregation of all types of cancer, in order to find evidence for previously unrecognized cancer susceptibility conditions. We performed a systematic clustering of 878,593 patients in FCR based on family name at birth, municipality of birth, and tumor type, diagnosed between years 1952 and 2011. We also estimated the familial occurrence of the tumor types using cluster score that reflects the proportion of patients belonging to the most significant clusters compared to all patients in Finland. The clustering effort identified 25,910 birth name-municipality based clusters representing 183 different tumor types characterized by topography and morphology. We produced information about familial occurrence of hundreds of tumor types, and many of the tumor types with high cluster score represented known cancer syndromes. Unexpectedly, Kaposi sarcoma (KS) also produced a very high score (cluster score 1.91, p-value <0.0001). We verified from population records that many of the KS patients forming the clusters were indeed close relatives, and identified one family with five affected individuals in two generations and several families with two first degree relatives. Our approach is unique in enabling systematic examination of a national epidemiological database to derive evidence of aberrant familial aggregation of all tumor types, both common and rare. It allowed effortless identification of families displaying features of both known as well as potentially novel cancer predisposition conditions, including striking familial aggregation of KS. Further work with high-throughput methods should elucidate the molecular basis of the potentially novel predisposition conditions

  1. Earthquakes clustering based on the magnitude and the depths in Molluca Province

    Energy Technology Data Exchange (ETDEWEB)

    Wattimanela, H. J., E-mail: hwattimaela@yahoo.com [Pattimura University, Ambon (Indonesia); Institute of Technology Bandung, Bandung (Indonesia); Pasaribu, U. S.; Indratno, S. W.; Puspito, A. N. T. [Institute of Technology Bandung, Bandung (Indonesia)

    2015-12-22

    In this paper, we present a model to classify the earthquakes that occurred in Molluca Province. We use the K-Means clustering method to classify the earthquakes based on their magnitude and depth. The result can be used for disaster mitigation and for designing evacuation routes in Molluca Province.

  2. 3D BUILDING MODELS SEGMENTATION BASED ON K-MEANS++ CLUSTER ANALYSIS

    Directory of Open Access Journals (Sweden)

    C. Zhang

    2016-10-01

    Full Text Available 3D mesh model segmentation has been drawing increasing attention from the digital geometry processing field in recent years. The original 3D mesh model needs to be divided into separate meaningful parts or surface patches based on certain standards to support reconstruction, compression, texture mapping, model retrieval, etc. Segmentation is therefore a key problem in 3D mesh model processing. In this paper, we propose a method to segment Collada (a type of mesh model) 3D building models into meaningful parts using cluster analysis. Common clustering methods segment 3D mesh models by K-means, whose performance heavily depends on randomized initial seed points (i.e., centroids), and different randomized centroids can give quite different results. Therefore, we improved the existing method and used the K-means++ clustering algorithm to solve this problem. Our experiments show that K-means++ improves both the speed and the accuracy of K-means, and achieves good and meaningful results.
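
    The difference between K-means and K-means++ lies in how the initial centroids are chosen. The sketch below shows the usual K-means++ seeding rule, in which each new seed is drawn with probability proportional to its squared distance from the nearest seed already chosen. It is a generic illustration on plain coordinate tuples, not the mesh-specific procedure of the paper, and the function name is hypothetical.

```python
import math
import random


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centers with the K-means++ seeding rule."""
    if rng is None:
        rng = random.Random()
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center already; fall back to uniform.
            centers.append(rng.choice(points))
            continue
        # Far-away points are proportionally more likely to become seeds.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Example: two coincident points and one distant point (illustrative values).
seeds = kmeans_pp_seeds([(0, 0), (0, 0), (10, 10)], 2, random.Random(0))
```

    Points that coincide with an existing seed have zero weight, so the seeds tend to be spread across the data instead of landing in the same region, which is what makes the result less dependent on the random start.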

  3. Community detection in complex networks using density-based clustering algorithm and manifold learning

    Science.gov (United States)

    You, Tao; Cheng, Hui-Min; Ning, Yi-Zi; Shia, Ben-Chang; Zhang, Zhong-Yuan

    2016-12-01

    Like clustering analysis, community detection aims at assigning nodes in a network into different communities. Fdp is a recently proposed density-based clustering algorithm which does not need the number of clusters as prior input and the result is insensitive to its parameter. However, Fdp cannot be directly applied to community detection due to its inability to recognize the community centers in the network. To solve the problem, a new community detection method (named IsoFdp) is proposed in this paper. First, we use the IsoMap technique to map the network data into a low dimensional manifold which can reveal diverse pairwise similarity. Then Fdp is applied to detect the communities in the network. An improved partition density function is proposed to select the proper number of communities automatically. We test our method on both synthetic and real-world networks, and the results demonstrate the effectiveness of our algorithm over the state-of-the-art methods.

  4. Clustering of User Behaviour based on Web Log data using Improved K-Means Clustering Algorithm

    Directory of Open Access Journals (Sweden)

    S.Padmaja

    2016-02-01

    Full Text Available The proposed work presents an improved K-means clustering algorithm for identifying internet user behaviour. Web data analysis includes the transformation and interpretation of web log data to find information and patterns and to support knowledge discovery. The efficiency of the algorithm is analyzed by considering certain parameters. The parameters are date, time, S_id, CS_method, C_IP, User_agent and time taken. The research was done using more than 2 years of real data collected from the web servers of two different groups of institutions. This dataset provides a better analysis of log data to identify internet user behaviour.

  5. Analysis of cost data in a cluster-randomized, controlled trial: comparison of methods

    DEFF Research Database (Denmark)

    Sokolowski, Ineta; Ørnbøl, Eva; Rosendal, Marianne;

    in clusters of general practices.   There have been suggestions to apply different methods, e.g., the non-parametric bootstrap, to highly skewed data from pragmatic randomized trials without clusters, but there is very little information about how to analyse skewed data from cluster-randomized trials. Many...... studies have used non-valid analysis of skewed data. We propose two different methods to compare mean cost in two groups. Firstly, we use a non-parametric bootstrap method where the re-sampling takes place on two levels in order to take into account the cluster effect. Secondly, we proceed with a log...

  6. A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays.

    Science.gov (United States)

    Naeni, Leila M; Craig, Hugh; Berretta, Regina; Moscato, Pablo

    2016-01-01

    In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graph by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays.
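
    The Jensen-Shannon distance named above compares two frequency profiles, such as the word counts of two plays. A minimal sketch follows, assuming base-2 logarithms (so the distance lies between 0 and 1) and raw non-negative counts as input; how the authors normalise their word frequencies is not stated in the abstract.

```python
import math


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence of two count vectors."""
    def normalise(v):
        total = sum(v)
        return [x / total for x in v]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a zero numerator contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    p, q = normalise(p), normalise(q)
    # The mixture m is positive wherever p or q is, so kl(., m) is always finite.
    m = [(x + y) / 2 for x, y in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))
```

    Identical profiles give a distance of 0 and profiles with no words in common give 1; pairwise distances of this kind are the sort of input from which a proximity graph can be built.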

  7. A Novel Clustering Methodology Based on Modularity Optimisation for Detecting Authorship Affinities in Shakespearean Era Plays

    Science.gov (United States)

    Craig, Hugh; Berretta, Regina; Moscato, Pablo

    2016-01-01

    In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graph by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays. PMID:27571416

  8. Model-based Clustering of Categorical Time Series with Multinomial Logit Classification

    Science.gov (United States)

    Frühwirth-Schnatter, Sylvia; Pamminger, Christoph; Winter-Ebmer, Rudolf; Weber, Andrea

    2010-09-01

    A common problem in many areas of applied statistics is to identify groups of similar time series in a panel of time series. However, distance-based clustering methods cannot easily be extended to time series data, where an appropriate distance-measure is rather difficult to define, particularly for discrete-valued time series. Markov chain clustering, proposed by Pamminger and Frühwirth-Schnatter [6], is an approach for clustering discrete-valued time series obtained by observing a categorical variable with several states. This model-based clustering method is based on finite mixtures of first-order time-homogeneous Markov chain models. In order to further explain group membership we present an extension to the approach of Pamminger and Frühwirth-Schnatter [6] by formulating a probabilistic model for the latent group indicators within the Bayesian classification rule by using a multinomial logit model. The parameters are estimated for a fixed number of clusters within a Bayesian framework using a Markov chain Monte Carlo (MCMC) sampling scheme representing a (full) Gibbs-type sampler which involves only draws from standard distributions. Finally, an application to a panel of Austrian wage mobility data is presented which leads to an interesting segmentation of the Austrian labour market.

  9. Priority Based Congestion Control Dynamic Clustering Protocol in Mobile Wireless Sensor Networks.

    Science.gov (United States)

    Jayakumari, R Beulah; Senthilkumar, V Jawahar

    2015-01-01

    Wireless sensor network is widely used to monitor natural phenomena because natural disaster has globally increased which causes significant loss of life, economic setback, and social development. Saving energy in a wireless sensor network (WSN) is a critical factor to be considered. The sensor nodes are deployed to sense, compute, and communicate alerts in a WSN which are used to prevent natural hazards. Generally communication consumes more energy than sensing and computing; hence cluster based protocol is preferred. Even with clustering, multiclass traffic creates congested hotspots in the cluster, thereby causing packet loss and delay. In order to conserve energy and to avoid congestion during multiclass traffic a novel Priority Based Congestion Control Dynamic Clustering (PCCDC) protocol is developed. PCCDC is designed with mobile nodes which are organized dynamically into clusters to provide complete coverage and connectivity. PCCDC computes congestion at intra- and intercluster level using linear and binary feedback method. Each mobile node within the cluster has an appropriate queue model for scheduling prioritized packet during congestion without drop or delay. Simulation results have proven that packet drop, control overhead, and end-to-end delay are much lower in PCCDC which in turn significantly increases packet delivery ratio, network lifetime, and residual energy when compared with PASCC protocol.

  10. Priority Based Congestion Control Dynamic Clustering Protocol in Mobile Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    R. Beulah Jayakumari

    2015-01-01

    Wireless sensor networks are widely used to monitor natural phenomena because natural disasters have increased globally, causing significant loss of life, economic setbacks, and setbacks to social development. Saving energy in a wireless sensor network (WSN) is a critical factor to be considered. The sensor nodes are deployed to sense, compute, and communicate alerts in a WSN, which are used to prevent natural hazards. Generally, communication consumes more energy than sensing and computing; hence a cluster-based protocol is preferred. Even with clustering, multiclass traffic creates congested hotspots in the cluster, thereby causing packet loss and delay. In order to conserve energy and to avoid congestion during multiclass traffic, a novel Priority Based Congestion Control Dynamic Clustering (PCCDC) protocol is developed. PCCDC is designed with mobile nodes which are organized dynamically into clusters to provide complete coverage and connectivity. PCCDC computes congestion at the intra- and intercluster levels using linear and binary feedback methods. Each mobile node within the cluster has an appropriate queue model for scheduling prioritized packets during congestion without drop or delay. Simulation results show that packet drop, control overhead, and end-to-end delay are much lower in PCCDC, which in turn significantly increases packet delivery ratio, network lifetime, and residual energy when compared with the PASCC protocol.

  11. A Dirichlet Process Mixture Based Name Origin Clustering and Alignment Model for Transliteration

    Directory of Open Access Journals (Sweden)

    Chunyue Zhang

    2015-01-01

    In machine transliteration, it is common that the transliterated names in the target language come from multiple language origins. A conventional maximum-likelihood-based single model cannot deal with this issue very well and often suffers from overfitting. In this paper, we exploit a coupled Dirichlet process mixture model (cDPMM) to address the overfitting and multi-origin name clustering issues simultaneously in the transliteration sequence alignment step over the name pairs. After the alignment step, the cDPMM automatically clusters name pairs into many groups according to their origin information. In the decoding step, in order to make full use of the learned origin information, we use a cluster combination method (CCM) to build cluster-specific transliteration models by combining small clusters into large ones based on the perplexities of the name language and transliteration models, which ensures that each origin cluster has enough data for training a transliteration model. On three different Western-Chinese multi-origin name corpora, the cDPMM outperforms two state-of-the-art baseline models in terms of both top-1 accuracy and mean F-score, and furthermore the CCM significantly improves the cDPMM.

  12. Application of the Clustering Method in Molecular Dynamics Simulation of the Diffusion Coefficient

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    Using molecular dynamics (MD) simulation, the diffusion of oxygen, methane, ammonia and carbon dioxide in water was simulated in the canonical NVT ensemble, and the diffusion coefficient was analyzed by the clustering method. By comparison with the conventional method (using the Einstein model) and the differentiation-interval variation method, we found that the results obtained by the clustering method used in this study are closer to the experimental values. This method proved to be more reasonable than the other two methods.
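
    For context, the conventional Einstein approach named in this abstract estimates the diffusion coefficient from the long-time slope of the mean squared displacement (MSD), using MSD(t) ≈ 2·d·D·t in d dimensions. The sketch below fits that slope by ordinary least squares. It illustrates only the conventional baseline, not the clustering method, which the abstract does not describe; the function name and the synthetic data are assumptions.

```python
def diffusion_coefficient(times, msd, dimensions=3):
    """Estimate D from the least-squares slope of MSD versus time.

    Uses the Einstein relation MSD(t) = 2 * dimensions * D * t.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_m = sum(msd) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (m - mean_m) for t, m in zip(times, msd))
    slope = sxy / sxx
    return slope / (2 * dimensions)

# Synthetic, noise-free MSD generated with D = 2.0 (arbitrary units).
times = [1.0, 2.0, 3.0, 4.0]
msd = [6 * 2.0 * t for t in times]
d_est = diffusion_coefficient(times, msd)
print(d_est)  # approximately 2.0
```

    With real trajectory data the early, non-diffusive part of the MSD curve would normally be excluded before fitting; that choice is left to the reader here.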

  13. Survey of Clustering based Financial Fraud Detection Research

    Directory of Open Access Journals (Sweden)

    Andrei Sorin SABAU

    2012-01-01

    Given the current global economic context, increasing efforts are being made to both prevent and detect fraud. This is a natural response to the upward trend in fraud activities recorded in the last couple of years, with a 13% increase in 2011 alone. Due to the ever-increasing volumes of data that need to be analyzed, data mining methods and techniques are being used more and more often. One domain in which data mining can excel, suspicious transaction monitoring, emerged for the first time as the most effective fraud detection method in 2011. Of the available data mining techniques, clustering has proven itself a consistently applied solution for detecting fraud. This paper surveys clustering techniques used in fraud detection over the last ten years, briefly reviewing each one.

  14. Classification of excessive domestic water consumption using Fuzzy Clustering Method

    Science.gov (United States)

    Zairi Zaidi, A.; Rasmani, Khairul A.

    2016-08-01

    Demand for clean and treated water is increasing all over the world. Therefore it is crucial to conserve water for better use and to avoid unnecessary, excessive consumption or wastage of this natural resource. Classification of excessive domestic water consumption is a difficult task due to the complexity in determining the amount of water usage per activity, especially as the data is known to vary between individuals. In this study, classification of excessive domestic water consumption is carried out using a well-known Fuzzy C-Means (FCM) clustering algorithm. Consumer data containing information on daily, weekly and monthly domestic water usage was employed for the purpose of classification. Using the same dataset, the result produced by the FCM clustering algorithm is compared with the result obtained from a statistical control chart. The finding of this study demonstrates the potential use of the FCM clustering algorithm for the classification of domestic consumer water consumption data.
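
    To make the Fuzzy C-Means step concrete, here is a minimal one-dimensional sketch of the standard FCM updates (membership update followed by centre update). It is not the study's implementation: the function name, the fuzzifier m = 2, the initial centres, and the daily usage figures are all hypothetical values chosen for illustration.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, iterations=50):
    """Run standard FCM updates on scalar data from given initial centres."""
    centers = list(centers)
    memberships = []
    for _ in range(iterations):
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # The point sits exactly on a centre: full membership there.
                hit = dists.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            memberships.append(row)
        # Centre update: mean of the data weighted by membership ** m.
        centers = [
            sum((u[i] ** m) * x for u, x in zip(memberships, values))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships

# Hypothetical daily household usage in litres: three typical, three high.
usage = [100.0, 110.0, 120.0, 400.0, 420.0, 450.0]
centers, memberships = fuzzy_c_means_1d(usage, centers=[min(usage), max(usage)])
print(centers)
```

    Each household receives a degree of membership in the "typical" and "excessive" clusters rather than a hard label, which is what distinguishes FCM from crisp clustering such as k-means.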

  15. Energy Aware Cluster Based Routing Scheme For Wireless Sensor Network

    Directory of Open Access Journals (Sweden)

    Roy Sohini

    2015-09-01

    The Wireless Sensor Network (WSN) has emerged as an important supplement to modern wireless communication systems due to its wide range of applications. Recent research is addressing the various challenges of sensor networks more gracefully. However, energy efficiency still remains a matter of concern for researchers. Meeting the countless security needs, timely data delivery and quick action, efficient route selection, multi-path routing, etc. can only be achieved at the cost of energy. Hierarchical routing is more useful in this regard. The proposed algorithm, Energy Aware Cluster Based Routing Scheme (EACBRS), aims at conserving energy with the help of hierarchical routing by calculating the optimum number of cluster heads for the network, selecting an energy-efficient route to the sink, and offering congestion control. Simulation results show that EACBRS performs better than existing hierarchical routing algorithms such as the Distributed Energy-Efficient Clustering (DEEC) algorithm for heterogeneous wireless sensor networks and the Energy Efficient Heterogeneous Clustered scheme for Wireless Sensor Network (EEHC).

  16. Enhancing Text Clustering Using Concept-based Mining Model

    Directory of Open Access Journals (Sweden)

    Lincy Liptha R.

    2012-03-01

    Text mining techniques are mostly based on statistical analysis of a word or phrase. The statistical analysis of term frequency captures the importance of a term within a document only. However, two terms can have the same frequency in the same document, yet the meaning that one term contributes might be more appropriate than the meaning contributed by the other. Hence, the terms that capture the semantics of the text should be given more importance. Here, a new concept-based mining model is introduced. It analyses terms at the sentence, document and corpus levels. The model consists of sentence-based concept analysis, which calculates the conceptual term frequency (ctf); document-based concept analysis, which finds the term frequency (tf); corpus-based concept analysis, which determines the document frequency (df); and a concept-based similarity measure. The ctf, tf and df measures in a corpus are calculated by the proposed algorithm, called the Concept-Based Analysis Algorithm. In this way we cluster web documents efficiently, and the quality of the clusters achieved by this model significantly surpasses that of traditional single-term-based approaches.
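
    The two purely statistical measures named in this abstract, term frequency (tf) and document frequency (df), can be sketched as below. The conceptual term frequency (ctf) is deliberately left out, because the abstract does not describe the sentence-level concept analysis in enough detail to reproduce it. The function names and the tiny tokenised corpus are assumptions for illustration.

```python
from collections import Counter

def term_frequencies(document_tokens):
    """tf: how often each term occurs within one document."""
    return Counter(document_tokens)

def document_frequencies(corpus_tokens):
    """df: in how many documents of the corpus each term occurs."""
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))  # count each term once per document
    return df

# Hypothetical corpus of two already-tokenised documents.
corpus = [["cluster", "text", "cluster"], ["text", "mining"]]
tf_first = term_frequencies(corpus[0])
df_all = document_frequencies(corpus)
print(tf_first["cluster"], df_all["text"], df_all["cluster"])  # 2 2 1
```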

  17. Research on Commercial Enterprise's Retailers Segmentation based on AHP and Clustering Methods

    Institute of Scientific and Technical Information of China (English)

    石冠峰; 韩宏稳; 肖静

    2014-01-01

    This paper builds a retail customer value evaluation index system based on the characteristics of commercial enterprises and retailers, applies the analytic hierarchy process (AHP) to determine the weight of each index, and analyzes retailer value in terms of current value and potential value. A selected clustering method is then used to cluster data collected from 700 retailers, and the retailers are classified according to the results, with the aim of proposing marketing strategies suited to the characteristics of each customer group.
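
    As an illustration of how AHP turns pairwise judgements into index weights, the sketch below uses the row geometric mean, a common approximation to the principal-eigenvector weights that coincides with them when the comparison matrix is perfectly consistent. The abstract does not say which variant the authors used, and the function name and the 3x3 comparison matrix are hypothetical.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights by the row geometric mean."""
    geo_means = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical, perfectly consistent judgements for three indices:
# index 1 is twice as important as index 2 and four times index 3.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
print(weights)  # approximately [0.571, 0.286, 0.143]
```

    In practice a consistency check on the comparison matrix would usually precede the use of such weights; it is omitted here to keep the sketch short.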

  18. Yeast homologous recombination-based promoter engineering for the activation of silent natural product biosynthetic gene clusters.

    Science.gov (United States)

    Montiel, Daniel; Kang, Hahk-Soo; Chang, Fang-Yuan; Charlop-Powers, Zachary; Brady, Sean F

    2015-07-21

    Large-scale sequencing of prokaryotic (meta)genomic DNA suggests that most bacterial natural product gene clusters are not expressed under common laboratory culture conditions. Silent gene clusters represent a promising resource for natural product discovery and the development of a new generation of therapeutics. Unfortunately, the characterization of molecules encoded by these clusters is hampered owing to our inability to express these gene clusters in the laboratory. To address this bottleneck, we have developed a promoter-engineering platform to transcriptionally activate silent gene clusters in a model heterologous host. Our approach uses yeast homologous recombination, an auxotrophy complementation-based yeast selection system and sequence orthogonal promoter cassettes to exchange all native promoters in silent gene clusters with constitutively active promoters. As part of this platform, we constructed and validated a set of bidirectional promoter cassettes consisting of orthogonal promoter sequences, Streptomyces ribosome binding sites, and yeast selectable marker genes. Using these tools we demonstrate the ability to simultaneously insert multiple promoter cassettes into a gene cluster, thereby expediting the reengineering process. We apply this method to model active and silent gene clusters (rebeccamycin and tetarimycin) and to the silent, cryptic pseudogene-containing, environmental DNA-derived Lzr gene cluster. Complete promoter refactoring and targeted gene exchange in this "dead" cluster led to the discovery of potent indolotryptoline antiproliferative agents, lazarimides A and B. This potentially scalable and cost-effective promoter reengineering platform should streamline the discovery of natural products from silent natural product biosynthetic gene clusters.

  19. Topic Modeling Based Image Clustering by Events in Social Media

    OpenAIRE