Kernel method-based fuzzy clustering algorithm
Institute of Scientific and Technical Information of China (English)
Wu Zhongdong; Gao Xinbo; Xie Weixin; Yu Jianping
2005-01-01
The fuzzy C-means clustering algorithm (FCM) is extended to the fuzzy kernel C-means clustering algorithm (FKCM) to effectively perform cluster analysis on diverse data structures, such as non-hyperspherical data, data with noise, data with mixtures of heterogeneous cluster prototypes, asymmetric data, etc. Based on the Mercer kernel, the FKCM clustering algorithm is derived by uniting the FCM algorithm with the kernel method. Experiments with synthetic and real data show that, in contrast to FCM, the FKCM clustering algorithm is universal and can effectively perform unsupervised analysis of datasets with varied structures. Kernel-based clustering can therefore be expected to become an important research direction in fuzzy clustering analysis.
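The kernel extension described in this abstract can be sketched compactly. The snippet below is an illustrative implementation, not the authors' code: it assumes a Gaussian kernel, keeps prototypes in the input space (a common KFCM formulation in which the feature-space distance reduces to 2(1 - K(x, v))), and uses a simple deterministic initialization; the parameters `m` and `sigma` follow the standard textbook form.

```python
import numpy as np

def kernel_fcm(X, c, m=2.0, sigma=1.0, n_iter=100):
    """Kernelized fuzzy c-means sketch with a Gaussian kernel.

    Prototypes stay in the input space; for a Gaussian kernel the
    feature-space distance is d^2(x, v) = 2 * (1 - K(x, v)).
    """
    V = X[:c].astype(float).copy()          # simple deterministic init: first c samples
    for _ in range(n_iter):
        sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)   # (n, c)
        K = np.exp(-sq / (2 * sigma ** 2))                        # Gaussian kernel
        d2 = np.maximum(2.0 * (1.0 - K), 1e-12)                   # feature-space distances
        # membership update: standard FCM form applied to kernel distances
        U = d2 ** (-1.0 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)
        # prototype update weighted by fuzzy memberships and kernel values
        W = (U ** m) * K
        V = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, V
```

With two well-separated blobs, the memberships harden into the expected two-group partition.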
A Clustering Method Based on the Maximum Entropy Principle
Directory of Open Access Journals (Sweden)
Edwin Aldana-Bobadilla
2015-01-01
Clustering is an unsupervised process to determine which unlabeled objects in a set share interesting properties. The objects are grouped into k subsets (clusters) whose elements optimize a proximity measure. Methods based on information theory have proven to be feasible alternatives. They rest on the assumption that a cluster is a subset with the minimal possible degree of “disorder” and attempt to minimize the entropy of each cluster. We propose a clustering method based on the maximum entropy principle. The method explores the space of all possible probability distributions of the data to find one that maximizes the entropy subject to extra conditions based on prior information about the clusters. The prior information rests on the assumption that the elements of a cluster are “similar” to each other in accordance with some statistical measure. As a consequence of this principle, those distributions of high entropy that satisfy the conditions are favored over others. Searching the space to find the optimal distribution of objects in the clusters is a hard combinatorial problem, which disallows the use of traditional optimization techniques; genetic algorithms are a good alternative for solving it. We benchmark our method against the best theoretical performance, given by the Bayes classifier when data are normally distributed, and against a multilayer perceptron network, which offers the best practical performance when data are not normal. In general, a supervised classification method will outperform an unsupervised one, since, in the supervised case, the elements of the classes are known a priori. In what follows, we show that our method’s effectiveness is comparable to a supervised one. This clearly exhibits the superiority of our method.
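The entropy objective that the genetic search maximizes can be illustrated with a small helper. This is a generic sketch of the Shannon entropy of a partition's cluster-size distribution, not the paper's exact fitness function, which also incorporates similarity constraints:

```python
import numpy as np

def partition_entropy(labels, k):
    """Shannon entropy (bits) of the cluster-size distribution of a
    labeling; a maximum-entropy search favors balanced partitions,
    subject to extra similarity conditions in the paper's method."""
    counts = np.bincount(labels, minlength=k)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

A perfectly balanced two-cluster partition of four objects has entropy 1 bit; putting everything in one cluster gives 0.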
Agent-based method for distributed clustering of textual information
Potok, Thomas E. [Oak Ridge, TN; Reed, Joel W. [Knoxville, TN; Elmore, Mark T. [Oak Ridge, TN; Treadwell, Jim N. [Louisville, TN
2010-09-28
A computer method and system for storing, retrieving and displaying information has a multiplexing agent (20) that calculates a new document vector (25) for a new document (21) to be added to the system and transmits the new document vector (25) to master cluster agents (22) and cluster agents (23) for evaluation. These agents (22, 23) perform the evaluation and return values upstream to the multiplexing agent (20) based on the similarity of the document to documents stored under their control. The multiplexing agent (20) then sends the document (21) and the document vector (25) to the master cluster agent (22), which then forwards it to a cluster agent (23) or creates a new cluster agent (23) to manage the document (21). The system also searches for stored documents according to a search query having at least one term and identifying the documents found in the search, and displays the documents in a clustering display (80) of similarity so as to indicate similarity of the documents to each other.
Super pixel density based clustering automatic image classification method
Xu, Mingxing; Zhang, Chuan; Zhang, Tianxu
2015-12-01
Image classification is an important means of image segmentation and data mining, and achieving rapid automated image classification has long been a research focus. This paper proposes a superpixel density-based cluster-center algorithm for automatic image classification and outlier identification. Pixel location coordinates and gray values are used to compute density and distance, enabling automatic classification and outlier extraction. Because a large number of pixels dramatically increases the computational complexity, the image is preprocessed into a small number of superpixel sub-blocks before the density and distance calculations. A normalized density-distance discrimination rule is then designed to select cluster centers automatically, whereby the image is classified and outliers are identified. Extensive experiments show that the method requires no human intervention, runs faster than the density-based clustering algorithm applied at the pixel level, and effectively performs automated classification and outlier extraction.
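The density-and-distance selection of cluster centers described here resembles the density-peaks scheme; the sketch below is a generic, image-free illustration of that idea (the cutoff `dc` and the rho-times-delta ranking are assumptions in the spirit of the abstract, not the paper's exact normalized rule):

```python
import numpy as np

def density_peaks(X, dc, n_clusters):
    """Density-peaks sketch: rho is the number of neighbours within dc,
    delta the distance to the nearest denser point; points with large
    rho * delta become centres, and each remaining point inherits the
    label of its nearest denser neighbour."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    rho = (D < dc).sum(axis=1) - 1                 # exclude the point itself
    order = np.argsort(-rho, kind="stable")        # indices by decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                      # all points at least as dense
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        nearest_higher[i] = j
    delta[order[0]] = D[order[0]].max()            # convention for the densest point
    centers = np.argsort(-(rho * delta), kind="stable")[:n_clusters]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:                                # assign in density order
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```

Assigning in density order guarantees that a point's nearest denser neighbour is already labeled when the point is reached.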
Density-based clustering method in the moving object database
Institute of Scientific and Technical Information of China (English)
ZHOU Xing; XIANG Shu; GE Jun-wei; LIU Zhao-hong; BAE Hae-young
2004-01-01
With the rapid advance of wireless communication, tracking the positions of moving objects is becoming increasingly feasible and necessary. Because a large number of people use mobile phones, we must handle a large moving-object database and the problems that come with it: how can we provide customers with high-quality service, that is, how can we answer so many queries in as little time as possible? Given the large volume of data, the gap between CPU speed and main-memory size has been growing considerably. One way to reduce query-handling time is to reduce the number of I/O operations between the buffer and secondary storage, and an effective clustering of the objects can minimize this I/O cost. In this paper, according to the characteristics of the moving-object database, we analyze the objects in the buffer according to their mappings in two-dimensional coordinates, and develop a density-based clustering method to effectively reorganize the clusters. The new mechanism lowers the cost of I/O operations and yields more efficient responses to queries.
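A density-based grouping of buffered object positions in 2-D coordinates, as described above, is typically DBSCAN-like. The sketch below is a minimal generic DBSCAN, not the paper's method; `eps` and `min_pts` are the usual illustrative parameters:

```python
import numpy as np
from collections import deque

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN sketch: core points have at least min_pts
    neighbours (themselves included) within eps; clusters grow by
    expanding density-reachable points. Label -1 marks noise."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    neighbours = [np.where(D[i] <= eps)[0] for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbours[i]) < min_pts:
            continue
        labels[i] = cluster                 # start a new cluster at core point i
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbours[j]) >= min_pts:   # j is core: keep expanding
                    queue.extend(neighbours[j])
        cluster += 1
    return labels
```

Objects that are never density-reachable from a core point keep the label -1, which suits the buffer-reorganization use case: such stragglers need not be co-located on disk with any cluster.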
Urban Fire Risk Clustering Method Based on Fire Statistics
Institute of Scientific and Technical Information of China (English)
WU Lizhi; REN Aizhu
2008-01-01
Fire statistics and fire analysis have become important ways to understand the laws governing fire, prevent fires, and improve the ability to control them. Based on existing fire statistics, a weighted fire-risk calculation method characterized by the number of fire occurrences, direct economic losses, and fire casualties is put forward. On the basis of this method, and with an improved K-means clustering algorithm, this paper establishes a fire-risk K-means clustering model, which better resolves the automatic classification of fire risk. Fire-risk clusters are formed using the absolute distance to the target instead of the relative distance used in the traditional clustering algorithm. Finally, to apply the established model, fire-risk clustering was carried out on fire statistics for Shenyang, China, from January 2000 to December 2004. This research provides technical support for urban fire management.
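The two-stage pipeline, a weighted risk score per district followed by one-dimensional K-means on the scores, can be sketched as follows. The normalization, the weights, and the initialization are illustrative assumptions, not the paper's calibrated values:

```python
import numpy as np

def risk_scores(stats, weights):
    """Weighted fire-risk score per district from columns
    (fire count, direct economic loss, casualties), each column
    min-max normalised first. The weights are illustrative."""
    S = np.asarray(stats, dtype=float)
    S = (S - S.min(axis=0)) / (S.max(axis=0) - S.min(axis=0))
    return S @ np.asarray(weights)

def kmeans_1d(x, k, n_iter=50):
    """Plain k-means on scalar scores using absolute distance."""
    centers = np.linspace(x.min(), x.max(), k)   # spread initial centres
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels, centers
```

Given districts with clearly low and clearly high statistics, the scores split into low-risk and high-risk groups.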
Color Image Segmentation Method Based on Improved Spectral Clustering Algorithm
Directory of Open Access Journals (Sweden)
Dong Qin
2014-08-01
Addressing the high sparsity of image data and the problem of determining the number of clusters, we put forward a color image segmentation algorithm that combines semi-supervised machine learning with spectral graph theory. Building on related theories and methods of spectral clustering, we introduce the concept of information entropy to design a method that automatically optimizes the scale parameter, avoiding the instability in clustering results caused by manually chosen scale parameters. In addition, we exploit the prior information available in large amounts of non-generic data and apply a semi-supervised algorithm to improve clustering performance for rare classes. Tagged data are used to compute the similarity matrix, and clustering is performed with the FKCM algorithm. Experiments on standard datasets and on image segmentation demonstrate that the algorithm overcomes the defects of traditional spectral clustering methods, which are sensitive to outliers, prone to falling into local optima, and slow to converge.
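The spectral backbone that the paper improves on can be sketched briefly. This is a generic normalized-cut pipeline, not the paper's algorithm: where the paper tunes the scale parameter via information entropy, the sketch substitutes the common median-distance heuristic, and it uses plain k-means on the spectral embedding rather than FKCM:

```python
import numpy as np

def spectral_cluster(X, k, n_iter=20):
    """Spectral clustering sketch: Gaussian affinity with a median
    scale, symmetric normalised Laplacian, k smallest eigenvectors,
    farthest-point-seeded k-means on the row-normalised embedding."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    sigma = np.median(D[D > 0])          # stand-in for the entropy-tuned scale
    W = np.exp(-(D ** 2) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(X)) - Dinv @ W @ Dinv
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, :k]
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    # farthest-point seeding, then a few k-means rounds on the embedding
    C = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(U - c, axis=1) for c in C], axis=0)
        C.append(U[np.argmax(dist)])
    C = np.array(C)
    for _ in range(n_iter):
        labels = ((U[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = U[labels == j].mean(axis=0)
    return labels
```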
Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures
Raymond, J.W.; Blankley, C.J.; Willett, P.
2003-01-01
This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings from each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters possessing a non-trivial substructural commonality. The methods which employ adjustable parameters were tested to determine the stability of each parameter for datasets of varying size...
Directory of Open Access Journals (Sweden)
Ichiro IWASAKI
2010-06-01
Michael Porter’s concept of competitive advantage emphasizes the importance of regional cooperation among various actors in order to gain competitiveness on globalized markets. Foreign investors may play an important role in forming such cooperation networks. Their local suppliers tend to concentrate regionally, and together with local institutions of education, research, financial and other services, and development agencies, they can form the nucleus of cooperative clusters. This paper deals with the relationship between supplier networks and clusters. Two main issues are discussed in detail: the interest of multinational companies in entering regional clusters and the spillover effects that may stem from their participation. After discussing the theoretical background, the paper introduces a relatively new analytical method, “cluster mapping”, which can spot regional hot spots of specific economic activities with cluster-building potential. Experience with the method has been gathered in the US and in the European Union. After discussing the existing empirical evidence, the authors present their own cluster mapping results, obtained using a refined version of the original methodology.
Image Clustering Method Based on Density Maps Derived from Self-Organizing Mapping: SOM
Directory of Open Access Journals (Sweden)
Kohei Arai
2012-07-01
A new method for image clustering with density maps derived from self-organizing maps (SOM) is proposed, together with a clarification of the learning processes during cluster construction. The proposed SOM-based image clustering method yields much better clustering results for both simulated and real satellite imagery data, and the separability among clusters of the proposed method is 16% greater than that of existing k-means clustering. In the experiments with a Landsat-5 TM image, more than 20,000 iterations are required for the SOM learning process to converge.
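The SOM training that underlies the density maps can be sketched in a few lines. This is a generic one-dimensional SOM, not the authors' implementation; the learning-rate and neighbourhood schedules are simple linear decays chosen for illustration:

```python
import numpy as np

def train_som(X, n_units, n_iter=200, lr0=0.5, sigma0=None, seed=0):
    """Minimal 1-D self-organising map: each step picks a sample, finds
    the best-matching unit (BMU), and pulls the BMU and its grid
    neighbours toward the sample under a shrinking Gaussian
    neighbourhood. Clusters then correspond to units, or to groups of
    units in the density map of the paper."""
    rng = np.random.default_rng(seed)
    sigma0 = sigma0 or n_units / 2.0
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    grid = np.arange(n_units)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))
        frac = t / n_iter
        lr = lr0 * (1 - frac)                       # linearly decaying rate
        sigma = sigma0 * (1 - frac) + 1e-3          # shrinking neighbourhood
        h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)
    return W

def som_labels(X, W):
    """Assign each sample to its best-matching unit."""
    return np.argmin(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2), axis=1)
```

Because every update moves a unit toward a data sample, trained weights stay inside the bounding box of the data.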
Clustering Based Classification in Data Mining Method Recommendation
Czech Academy of Sciences Publication Activity Database
Kazík, O.; Pešková, K.; Šmíd, J.; Neruda, Roman
Vol. 2. Los Alamitos: IEEE Computer Society, 2013 - (Wani, M.; Tecuci, G.; Boicu, M.; Kubát, M.; Khoshgoftaar, T.; Seliya, N.), s. 356-361 ISBN 978-0-7695-5144-9. [ICMLA 2013. International Conference on Machine Learning and Applications /12./. Miami (US), 04.12.2013-07.12.2013] R&D Projects: GA ČR GAP202/11/1368; GA MŠk(CZ) LD13002 Grant ostatní: GA UK(CZ) 29612; SVV(CZ) 265314 Institutional support: RVO:67985807 Keywords : metalearning * clustering * data mining * method recommendation Subject RIV: IN - Informatics, Computer Science
AN ADAPTIVE GRID-BASED METHOD FOR CLUSTERING MULTIDIMENSIONAL ONLINE DATA STREAMS
Directory of Open Access Journals (Sweden)
Toktam Dehghani
2012-10-01
Clustering is an important task in mining evolving data streams, many of which are high dimensional in nature. Clustering in a high-dimensional data space is a complex problem, and inherently more so for data streams. Most data-stream clustering methods cannot deal with high-dimensional streams and therefore sacrifice the accuracy of clusters. To solve this problem, we propose an adaptive grid-based clustering method. Our focus is on providing up-to-date, arbitrarily shaped clusters while improving processing time and bounding memory usage. In our method (B+C tree), a structure called the “B+cell tree” keeps the recent information of a data stream. To reduce the complexity of clustering, a structure called the “cluster tree” is proposed to maintain multidimensional clusters. A cluster tree yields high-quality clusters by keeping cluster boundaries in a semi-optimal way; it captures the dynamic changes of data streams and adjusts the clusters accordingly. Our performance study on a number of real and synthetic data streams demonstrates the scalability of the algorithm in the number of dimensions and the amount of data without sacrificing the accuracy of the identified clusters.
Farthest-Point Heuristic based Initialization Methods for K-Modes Clustering
He, Zengyou
2006-01-01
The k-modes algorithm has become a popular technique for solving categorical data clustering problems in different application domains. However, the algorithm requires random selection of initial points for the clusters, and different initial points often lead to considerably distinct clustering results. In this paper we present an experimental study on applying a farthest-point heuristic based initialization method to k-modes clustering to improve its performance. Experiments show that new initia...
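The farthest-point heuristic is simple to state for categorical records under Hamming distance. The sketch below is a generic version: it seeds from the first record for determinism (the paper's choice of starting record may differ) and greedily adds the record farthest from the modes chosen so far:

```python
import numpy as np

def farthest_point_init(X, k):
    """Pick k initial modes for k-modes clustering: start from the
    first record (a simple deterministic choice, assumed here) and
    repeatedly add the record whose minimum Hamming distance to the
    already-chosen modes is largest."""
    X = np.asarray(X)
    chosen = [0]
    while len(chosen) < k:
        # each record's distance to its nearest chosen mode
        d = np.array([min(np.sum(x != X[c]) for c in chosen) for x in X])
        d[chosen] = -1                      # never re-pick a chosen record
        chosen.append(int(np.argmax(d)))
    return X[chosen]
```

On four records forming two categorical groups, the heuristic picks one seed from each group.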
Šubelj, Lovro; Waltman, Ludo
2015-01-01
Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between di...
Smooth Splicing: A Robust SNN-Based Method for Clustering High-Dimensional Data
Directory of Open Access Journals (Sweden)
JingDong Tan
2013-01-01
Shared nearest neighbor (SNN) is a metric of similarity that overcomes two difficulties: low similarities between samples and differing densities of classes. At present there are two popular SNN-similarity-based clustering methods: JP clustering and SNN density-based clustering. Their clustering results rely heavily on the weighting value of a single edge, and thus they are very vulnerable. Motivated by the idea of smooth splicing in computational geometry, the authors design a novel SNN-similarity-based clustering algorithm within the framework of graph theory. Since it inherits a complementary intensity-smoothness principle, its generalization ability surpasses that of the two methods mentioned above. Experiments on text datasets show its effectiveness.
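The SNN similarity underlying all three methods is easy to compute. This sketch uses the standard definition, counting neighbours common to two points' k-nearest-neighbour lists; it illustrates the metric only, not the paper's splicing algorithm:

```python
import numpy as np

def snn_similarity(X, k):
    """Shared-nearest-neighbour similarity: the SNN weight of a pair
    (i, j) is the number of points appearing in both i's and j's
    k-nearest-neighbour lists, with i and j themselves excluded."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(D, np.inf)             # a point is not its own neighbour
    knn = [set(np.argsort(D[i])[:k]) for i in range(len(X))]
    n = len(X)
    S = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = len((knn[i] & knn[j]) - {i, j})
    return S
```

Points inside a tight group share neighbours and get positive SNN weight; points in different groups share none, regardless of the absolute distances involved, which is why SNN copes with varying densities.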
Clustering with Spectral Methods
Gaertler, Marco
2002-01-01
Grouping and sorting are problems with a great tradition in the history of mankind. Clustering and cluster analysis are a small aspect of this wide spectrum, but they have applications in most scientific disciplines. Graph clustering is in turn a small fragment of the clustering area; nevertheless, it has the potential for new, pioneering, and innovative methods. One such method is the Markov Clustering presented by van Dongen in 'Graph Clustering by Flow Simulation'. We investigated the qu...
Šubelj, Lovro; van Eck, Nees Jan; Waltman, Ludo
2016-01-01
Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are for instance used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we also carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community. PMID:27124610
A clustering based method to evaluate soil corrosivity for pipeline external integrity management
International Nuclear Information System (INIS)
One important category of transportation infrastructure is underground pipelines. Corrosion of these buried pipeline systems may cause pipeline failures with the attendant hazards of property loss and fatalities. Therefore, developing the capability to estimate the soil corrosivity is important for designing and preserving materials and for risk assessment. The deterioration rate of metal is highly influenced by the physicochemical characteristics of a material and the environment of its surroundings. In this study, the field data obtained from the southeast region of Mexico was examined using various data mining techniques to determine the usefulness of these techniques for clustering soil corrosivity level. Specifically, the soil was classified into different corrosivity level clusters by k-means and Gaussian mixture model (GMM). In terms of physical space, GMM shows better separability; therefore, the distributions of the material loss of the buried petroleum pipeline walls were estimated via the empirical density within GMM clusters. The soil corrosivity levels of the clusters were determined based on the medians of metal loss. The proposed clustering method was demonstrated to be capable of classifying the soil into different levels of corrosivity severity. - Highlights: • The clustering approach is applied to the data extracted from a real-life pipeline system. • Soil properties in the right-of-way are analyzed via clustering techniques to assess corrosivity. • GMM is selected as the preferred method for detecting the hidden pattern of in-situ data. • K–W test is performed for significant difference of corrosivity level between clusters
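The GMM clustering step at the heart of this study can be sketched with a small EM routine. The one-dimensional version below is illustrative (metal-loss measurements are treated as scalars; the initialization and iteration count are assumptions, not the study's settings); clusters can then be ranked, as in the paper, by the median of the observations assigned to them:

```python
import numpy as np

def gmm_em_1d(x, k, n_iter=200):
    """Fit a 1-D Gaussian mixture by expectation-maximisation and
    return (responsibilities, component means)."""
    x = np.asarray(x, float)
    mu = np.linspace(x.min(), x.max(), k)   # spread initial means over the range
    var = np.full(k, x.var() / k)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation
        p = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: update weights, means, variances
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
        pi = nk / len(x)
    return r, mu
```

On measurements drawn from two well-separated loss regimes, the fitted means land near the regime centers and the responsibilities recover the grouping.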
A method for context-based adaptive QRS clustering in real-time
Castro, Daniel; Presedo, Jesús
2014-01-01
Continuous follow-up of heart condition through long-term electrocardiogram monitoring is an invaluable tool for diagnosing some cardiac arrhythmias. In such context, providing tools for fast locating alterations of normal conduction patterns is mandatory and still remains an open issue. This work presents a real-time method for adaptive clustering QRS complexes from multilead ECG signals that provides the set of QRS morphologies that appear during an ECG recording. The method processes the QRS complexes sequentially, grouping them into a dynamic set of clusters based on the information content of the temporal context. The clusters are represented by templates which evolve over time and adapt to the QRS morphology changes. Rules to create, merge and remove clusters are defined along with techniques for noise detection in order to avoid their proliferation. To cope with beat misalignment, Derivative Dynamic Time Warping is used. The proposed method has been validated against the MIT-BIH Arrhythmia Database and...
A scale-independent clustering method with automatic variable selection based on trees
Lynch, Sarah K.
2014-01-01
Approved for public release; distribution is unlimited. Clustering is the process of putting observations into groups based on their distance, or dissimilarity, from one another. Measuring distance for continuous variables often requires scaling or monotonic transformation. Determining dissimilarity when observations have both continuous and categorical measurements can be difficult because each type of measurement must be approached differently. We introduce a new clustering method that u...
An effective trust-based recommendation method using a novel graph clustering algorithm
Moradi, Parham; Ahmadian, Sajad; Akhlaghian, Fardin
2015-10-01
Recommender systems are programs that aim to provide personalized recommendations to users for specific items (e.g. music, books) in online sharing communities or on e-commerce sites. Collaborative filtering methods are important and widely accepted types of recommender systems that generate recommendations based on the ratings of like-minded users. On the other hand, these systems confront several inherent issues, such as the data sparsity and cold start problems caused by having too few ratings against the unknowns that need to be predicted. Incorporating trust information into collaborative filtering systems is an attractive approach to resolving these problems. In this paper, we present a model-based collaborative filtering method that applies a novel graph clustering algorithm and also considers trust statements. In the proposed method, the problem space is first represented as a graph, and a sparsest-subgraph-finding algorithm is applied to the graph to find the initial cluster centers. Then, the proposed graph clustering algorithm is performed to obtain the appropriate user/item clusters. Finally, the identified clusters are used as a set of neighbors to recommend unseen items to the current active user. Experimental results based on three real-world datasets demonstrate that the proposed method outperforms several state-of-the-art recommender system methods.
A semantics-based method for clustering of Chinese web search results
Zhang, Hui; Wang, Deqing; Wang, Li; Bi, Zhuming; Chen, Yong
2014-01-01
Information explosion is a critical challenge to the development of modern information systems. In particular, when an information system operates over the Internet, the amount of information on the web has been increasing exponentially and rapidly. Search engines, such as Google and Baidu, are essential tools for people to find information on the Internet. Valuable information, however, is still likely to be submerged in the ocean of search results those tools return. By automatically clustering the results into groups by subject, a search engine with a clustering feature allows users to select the most relevant results quickly. In this paper, we propose an online semantics-based method to cluster Chinese web search results. First, we employ the generalised suffix tree to extract the longest common substrings (LCSs) from search snippets. Second, we use HowNet to calculate the similarities of the words derived from the LCSs, and extract the most representative features by constructing the vocabulary chain. Third, we construct a vector of text features and calculate the snippets' semantic similarities. Finally, we improve the Chameleon algorithm to cluster snippets. Extensive experimental results show that the proposed algorithm outperforms the suffix tree clustering method and other traditional clustering methods.
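The first step, extracting longest common substrings from snippets, can be illustrated on a pair of strings. The paper uses a generalised suffix tree to do this efficiently across many snippets; the quadratic dynamic-programming version below is a stand-in that shows what is being computed:

```python
def longest_common_substring(a, b):
    """Longest common substring of two strings by dynamic programming:
    cur[j] is the length of the common suffix of a[:i] and b[:j]."""
    best, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best, best_end = cur[j], i
        prev = cur
    return a[best_end - best:best_end]
```

A generalised suffix tree finds the same substrings in linear time over all snippets at once, which is what makes the approach viable online.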
A polymerization-based method to construct a plasmid containing clustered DNA damage and a mismatch.
Takahashi, Momoko; Akamatsu, Ken; Shikazono, Naoya
2016-10-01
Exposure of biological materials to ionizing radiation often induces clustered DNA damage. The mutagenicity of clustered DNA damage can be analyzed with plasmids carrying a clustered DNA damage site, in which the strand bias of a replicating plasmid (i.e., the degree to which each of the two strands of the plasmid are used as the template for replication of the plasmid) can help to clarify how clustered DNA damage enhances the mutagenic potential of comprising lesions. Placement of a mismatch near a clustered DNA damage site can help to determine the strand bias, but present plasmid-based methods do not allow insertion of a mismatch at a given site in the plasmid. Here, we describe a polymerization-based method for constructing a plasmid containing clustered DNA lesions and a mismatch. The presence of a DNA lesion and a mismatch in the plasmid was verified by enzymatic treatment and by determining the relative abundance of the progeny plasmids derived from each of the two strands of the plasmid. PMID:27449134
Cluster Evaluation of Density Based Subspace Clustering
Sembiring, Rahmat Widia
2010-01-01
Clustering real-world data often faces the curse of dimensionality, as real-world data often consist of many dimensions. Multidimensional data clustering can be evaluated through a density-based approach. Density approaches follow the paradigm introduced by the DBSCAN clustering algorithm: the density of each object's neighbourhood, given MinPoints, is calculated, and clusters change in accordance with changes in the density of each object's neighbourhood. The neighbours of each object are typically determined using a distance function, for example the Euclidean distance. In this paper the SUBCLU, FIRES and INSCY methods are applied to clustering 6x1595-dimension synthetic datasets. IO entropy, F1 measure, coverage, accuracy and time consumption are used as performance evaluation parameters. The evaluation shows that SUBCLU requires considerable time for subspace clustering but achieves better coverage, while INSCY is better in accuracy than the two other methods, altho...
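Of the evaluation measures listed, the F1 measure is the easiest to make concrete. The sketch below computes a pair-counting F1 against a ground-truth labeling, one common definition of F1 for clusterings (the paper may use a per-cluster variant):

```python
from itertools import combinations

def pairwise_f1(true, pred):
    """Pair-counting F1 for a clustering: a pair of objects is a true
    positive when it shares a cluster in both the ground truth and the
    predicted clustering; label names themselves do not matter."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true)), 2):
        same_t = true[i] == true[j]
        same_p = pred[i] == pred[j]
        tp += same_t and same_p
        fp += (not same_t) and same_p
        fn += same_t and not same_p
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the measure only compares co-membership of pairs, a clustering identical up to a relabeling of clusters scores a perfect 1.0.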
Directory of Open Access Journals (Sweden)
Issam SAHMOUDI
2013-12-01
Document clustering is a branch of a larger area of scientific study known as data mining; it is an unsupervised classification used to find structure in a collection of unlabeled data. The useful information in documents can be accompanied by a large amount of noise words when a full-text representation is used, which negatively affects the result of the clustering process. It is therefore necessary to eliminate the noise words and keep just the useful information in order to enhance the quality of the clustering results. This problem occurs to different degrees in any language, such as English, European languages, Hindi, Chinese, and Arabic. To overcome this problem, in this paper we propose a new and efficient keyphrase extraction method based on the suffix tree data structure (KpST); the extracted keyphrases are then used in the clustering process instead of the full-text representation. The proposed keyphrase extraction method is language independent and may therefore be applied to any language. In this investigation we deal with the Arabic language, one of the most complex languages. To evaluate our method, we conduct an experimental study on Arabic documents using the most popular hierarchical clustering approach, the agglomerative algorithm, with seven linkage techniques and a variety of distance functions and similarity measures to perform the Arabic document clustering task. The obtained results show that our keyphrase extraction method increases the quality of the clustering results. We also study the effect of stemming the test dataset before clustering it with the same document clustering techniques and similarity/distance measures.
New Clustering Method in High-Dimensional Space Based on Hypergraph-Models
Institute of Scientific and Technical Information of China (English)
CHEN Jian-bin; WANG Shu-jing; SONG Han-tao
2006-01-01
To overcome the limitation of traditional clustering algorithms, which fail to produce meaningful clusters in high-dimensional, sparse, and binary-valued data sets, a new method based on a hypergraph model is proposed. The hypergraph model maps the relationships present in the original high-dimensional data into a hypergraph, where a hyperedge represents the similarity of the attribute-value distribution between two points. A hypergraph partitioning algorithm is used to find a partitioning of the vertices such that the corresponding data items in each partition are highly related and the weight of the hyperedges cut by the partitioning is minimized. The quality of the clustering result can be evaluated with the intra-cluster singularity value. Analysis and experimental results demonstrate that this approach is applicable and effective across a wide range of schemes.
A NOVEL METHOD FOR MULTISTAGE SCENARIO GENERATION BASED ON CLUSTER ANALYSIS
XIAODONG JI; XIUJUAN ZHAO; XIULI CHAO
2006-01-01
Based on cluster analysis, a novel method is introduced in this paper to generate multistage scenarios. A linear programming model is proposed to exclude the arbitrage opportunity by appending a scenario to the generated scenario set. By means of a cited stochastic linear goal programming portfolio model, a case is given to exhibit the virtues of this scenario generation approach.
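The clustering half of the scenario-generation idea can be sketched simply: Monte Carlo samples are reduced to a small scenario set by taking cluster centroids as scenario values and cluster shares as probabilities. The sketch omits the paper's linear-programming step for excluding arbitrage, and the k-means details are generic assumptions:

```python
import numpy as np

def scenarios_from_samples(samples, k, n_iter=50):
    """Reduce Monte Carlo samples (e.g. of asset returns) to k
    scenarios: k-means centroids become the scenario values and the
    cluster shares become the scenario probabilities. (The cited paper
    additionally post-processes the scenario set with a linear program
    to exclude arbitrage opportunities.)"""
    x = np.asarray(samples, float)
    centers = np.linspace(x.min(axis=0), x.max(axis=0), k)   # spread initial centres
    for _ in range(n_iter):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    probs = np.bincount(labels, minlength=k) / len(x)
    return centers, probs
```

The resulting probabilities always sum to one, so the scenario set is a valid discrete distribution for the downstream stochastic program.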
DIMK-means “Distance-based Initialization Method for K-means Clustering Algorithm”
Raed T. Aldahdooh; Wesam Ashour
2013-01-01
Partition-based clustering is one of several clustering techniques that attempt to directly decompose the dataset into a set of disjoint clusters. The k-means algorithm, which relies on this partition-based technique, is popular, widely used, and applied to a variety of domains. K-means clustering results are extremely sensitive to the initial centroids; this is one of the major drawbacks of the k-means algorithm. Due to this sensitivity, several different initialization approaches were ...
A novel PPGA-based clustering analysis method for business cycle indicator selection
Institute of Scientific and Technical Information of China (English)
Dabin ZHANG; Lean YU; Shouyang WANG; Yingwen SONG
2009-01-01
A new clustering analysis method based on the pseudo parallel genetic algorithm (PPGA) is proposed for business cycle indicator selection. In the proposed method, the category of each indicator is coded by real numbers, and some illegal chromosomes are repaired by the identification and restoration of empty classes. Two mutation operators, namely the discrete random mutation operator and the optimal direction mutation operator, are designed to balance the local convergence speed and the global convergence performance, and are then combined with a migration strategy and an insertion strategy. For the purpose of verification and illustration, the proposed method is compared with the K-means clustering algorithm and standard genetic algorithms via a numerical simulation experiment. The experimental result shows the feasibility and effectiveness of the new PPGA-based clustering analysis algorithm. Meanwhile, the proposed clustering analysis algorithm is also applied to select business cycle indicators in order to examine the status of the macro economy. Empirical results demonstrate that the proposed method can effectively and correctly select leading indicators, coincident indicators, and lagging indicators to reflect the business cycle, which is highly operational for macro-economy administrators and business decision-makers.
Unconventional methods for clustering
Kotyrba, Martin
2016-06-01
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is the main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. The topic of this paper is one of the modern methods of clustering, namely the SOM (Self Organising Map). The paper describes the theory needed to understand the principle of clustering and describes the algorithms used for clustering in our experiments.
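The core of SOM training can be sketched in one update step: find the best-matching unit (BMU) for an input, then pull every unit toward the input with a strength that decays with grid distance from the BMU. The 1-D grid, learning rate, and Gaussian neighbourhood below are illustrative choices, not taken from the paper.

```python
# One Self-Organising Map (SOM) training step on a 1-D grid of units.
import math

def som_step(weights, x, lr=0.5, sigma=1.0):
    # 1) find the best-matching unit (BMU) by squared Euclidean distance
    bmu = min(range(len(weights)),
              key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))
    # 2) pull every unit toward x, weighted by a Gaussian of grid distance to the BMU
    for i, w in enumerate(weights):
        h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
        weights[i] = [wj + lr * h * (xj - wj) for wj, xj in zip(w, x)]
    return bmu

units = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
bmu = som_step(units, [2.0, 2.0])
print(bmu)  # → 2 (the unit that already matches the input exactly)
```

Repeating this step over many inputs, while shrinking `lr` and `sigma`, is what makes neighbouring units end up representing similar inputs.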
Centroid Based Text Clustering
Directory of Open Access Journals (Sweden)
Priti Maheshwari
2010-09-01
Full Text Available Web mining is a burgeoning new field that attempts to glean meaningful information from natural language text. Web mining refers generally to the process of extracting interesting information and knowledge from unstructured text. Text clustering is one of the important Web mining functionalities: the task in which texts are classified into groups of similar objects based on their contents. Current research in the area of Web mining tackles problems of text data representation, classification, clustering, information extraction, or the search for and modeling of hidden patterns. In this paper we propose that, for mining large document collections, it is necessary to pre-process the web documents and store the information in a data structure that is more appropriate for further processing than a plain web file. We developed a PHP/MySQL-based utility to convert unstructured web documents into a structured tabular representation by preprocessing and indexing. We apply a centroid-based web clustering method to the preprocessed data, using three methods for clustering. Finally, we propose a method that can increase accuracy based on the clustering of documents.
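The centroid-based assignment step described above can be sketched as follows: each document becomes a bag-of-words vector and is assigned to the centroid with the highest cosine similarity. The toy corpus and the two fixed centroids are illustrative assumptions, not the paper's data.

```python
# Centroid-based text clustering: nearest-centroid assignment by cosine similarity.
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign(doc, centroids):
    vec = Counter(doc.lower().split())          # bag-of-words representation
    return max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))

centroids = [Counter("web mining text cluster".split()),
             Counter("gene protein cell biology".split())]
first = assign("text mining of web documents", centroids)
second = assign("protein function in the cell", centroids)
print(first, second)  # → 0 1
```

In a full system the centroids would themselves be recomputed as the mean of the vectors assigned to them, iterating to convergence.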
Galaxy Cluster Mass Reconstruction Project: I. Methods and first results on galaxy-based techniques
Old, L; Pearce, F R; Croton, D; Muldrew, S I; Muñoz-Cuartas, J C; Gifford, D; Gray, M E; von der Linden, A; Mamon, G A; Merrifield, M R; Müller, V; Pearson, R J; Ponman, T J; Saro, A; Sepp, T; Sifón, C; Tempel, E; Tundo, E; Wang, Y O; Wojtak, R
2014-01-01
This paper is the first in a series in which we perform an extensive comparison of various galaxy-based cluster mass estimation techniques that utilise the positions, velocities and colours of galaxies. Our primary aim is to test the performance of these cluster mass estimation techniques on a diverse set of models that will increase in complexity. We begin by providing participating methods with data from a simple model that delivers idealised clusters, enabling us to quantify the underlying scatter intrinsic to these mass estimation techniques. The mock catalogue is based on a Halo Occupation Distribution (HOD) model that assumes spherical Navarro, Frenk and White (NFW) haloes truncated at R_200, with no substructure nor colour segregation, and with isotropic, isothermal Maxwellian velocities. We find that, above 10^14 M_solar, recovered cluster masses are correlated with the true underlying cluster mass with an intrinsic scatter of typically a factor of two. Below 10^14 M_solar, the scatter rises as the nu...
Jai-Houng Leu; Chih-Yao Lo; Chi-Hau Liu
2009-01-01
New analytical methods and tools, called FAKDT (Fixed Average K-means based Decision Trees), have been developed to analyze human performance, allowing the enterprise to be viewed from different aspects in this study. The Decision Tree Clustering Method is one of the data mining methods that has been applied widely in different fields to analyze large amounts of data in recent years. Generally speaking, in the human resource incubation of an enterprise, if employees of high learning poten...
Watts, Michael J.; Worner, Susan P.
2011-01-01
Existing cluster-based methods for investigating insect species assemblages or profiles of a region to indicate the risk of new insect pest invasion have a major limitation in that they assign the same species risk factors to each region in a cluster. Clearly regions assigned to the same cluster have different degrees of similarity with respect to their species profile or assemblage. This study addresses this concern by applying weighting factors to the cluster elements used to calculate regi...
Are fragment-based quantum chemistry methods applicable to medium-sized water clusters?
Yuan, Dandan; Shen, Xiaoling; Li, Wei; Li, Shuhua
2016-06-28
Fragment-based quantum chemistry methods are either based on the many-body expansion or the inclusion-exclusion principle. To compare the applicability of these two categories of methods, we have systematically evaluated the performance of the generalized energy based fragmentation (GEBF) method (J. Phys. Chem. A, 2007, 111, 2193) and the electrostatically embedded many-body (EE-MB) method (J. Chem. Theory Comput., 2007, 3, 46) for medium-sized water clusters (H2O)n (n = 10, 20, 30). Our calculations demonstrate that the GEBF method provides uniformly accurate ground-state energies for 10 low-energy isomers of three water clusters under study at a series of theory levels, while the EE-MB method (with one water molecule as a fragment and without using the cutoff distance) shows a poor convergence for (H2O)20 and (H2O)30 when the basis set contains diffuse functions. Our analysis shows that the neglect of the basis set superposition error for each subsystem has little effect on the accuracy of the GEBF method, but leads to much less accurate results for the EE-MB method. The accuracy of the EE-MB method can be dramatically improved by using an appropriate cutoff distance and using two water molecules as a fragment. For (H2O)30, the average deviation of the EE-MB method truncated up to the three-body level calculated using this strategy (relative to the conventional energies) is about 0.003 hartree at the M06-2X/6-311++G** level, while the deviation of the GEBF method with a similar computational cost is less than 0.001 hartree. The GEBF method is demonstrated to be applicable for electronic structure calculations of water clusters at any basis set. PMID:27263629
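The many-body expansion that EE-MB-type methods truncate can be illustrated numerically: the total energy is approximated by the sum of monomer (one-body) energies plus pairwise (two-body) corrections. The toy `energy` function below merely stands in for an electronic-structure calculation and is purely an assumption for demonstration; in this toy model interactions are exactly pairwise, so the two-body truncation is exact.

```python
# Truncated many-body expansion: E ≈ Σ E_i + Σ_{i<j} (E_ij − E_i − E_j).
from itertools import combinations

def energy(fragment):
    # Stand-in for a quantum-chemistry calculation on a set of monomers;
    # here each monomer contributes -1.0 and each pair inside a fragment -0.1.
    n = len(fragment)
    return -1.0 * n - 0.1 * (n * (n - 1) // 2)

monomers = [0, 1, 2]                      # e.g. three water molecules
e1 = sum(energy([i]) for i in monomers)   # one-body terms
e2 = sum(energy([i, j]) - energy([i]) - energy([j])
         for i, j in combinations(monomers, 2))   # two-body corrections
print(e1 + e2, energy(monomers))  # → -3.3 -3.3
```

For real clusters the expansion is not exact at the two-body level, which is why convergence behaviour (as discussed above for (H2O)20 and (H2O)30) matters.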
Directory of Open Access Journals (Sweden)
Deepa Devasenapathy
2015-01-01
Full Text Available The traffic in the road network is progressively increasing to a great extent. Good knowledge of network traffic can minimize congestion using information pertaining to the road network obtained with the aid of communal callers, pavement detectors, and so on. Using these methods, low-featured information is generated with respect to the user in the road network. Although the existing schemes obtain urban traffic information, they fail to calculate the energy drain rate of nodes and to strike a balance between the overhead and the quality of the routing protocol, which poses a great challenge. Thus, an energy-efficient cluster-based vehicle detection in road networks using the intention numeration method (CVDRN-IN) is developed. Initially, sensor nodes that detect a vehicle are grouped into separate clusters. Further, we approximate the strength of the node drain rate for a cluster using a polynomial regression function. In addition, the total node energy is estimated by taking the integral over the area. Finally, enhanced data aggregation is performed to reduce the amount of data transmission using a digital signature tree. The experimental performance is evaluated with the Dodgers loop sensor data set from the UCI repository, and the performance evaluation outperforms existing work on energy consumption, clustering efficiency, and node drain rate.
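Estimating a node's energy drain rate from periodic energy readings can be sketched with a regression fit. The paper uses a polynomial regression function; the degree-1 (straight-line) case is shown here for brevity, and the sample readings are made up.

```python
# Least-squares line fit to estimate a sensor node's energy drain rate.
def fit_line(ts, ys):
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    slope = (sum((t - mt) * (y - my) for t, y in zip(ts, ys))
             / sum((t - mt) ** 2 for t in ts))
    return slope, my - slope * mt          # slope and intercept

times = [0, 1, 2, 3, 4]                    # sampling instants
readings = [100, 98, 96, 94, 92]           # remaining energy of one cluster node
drain_rate, start = fit_line(times, readings)
print(drain_rate)  # → -2.0 energy units per interval
```

A higher-degree polynomial would be fitted the same way (via normal equations) when the drain is not linear in time.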
A novel intrusion detection method based on OCSVM and K-means recursive clustering
Directory of Open Access Journals (Sweden)
Leandros A. Maglaras
2015-01-01
Full Text Available In this paper we present an intrusion detection module capable of detecting malicious network traffic in a SCADA (Supervisory Control and Data Acquisition) system, based on the combination of a One-Class Support Vector Machine (OCSVM) with RBF kernel and recursive k-means clustering. Important parameters of the OCSVM, such as the Gaussian width σ and the parameter ν, affect the performance of the classifier. Tuning these parameters is of great importance in order to avoid false positives and overfitting. The combination of OCSVM with recursive k-means clustering enables the proposed intrusion detection module to distinguish real alarms from possible attacks regardless of the values of σ and ν, making it ideal for real-time intrusion detection mechanisms for SCADA systems. Extensive simulations have been conducted with datasets extracted from small and medium-sized HTB SCADA testbeds, in order to compare the accuracy, false alarm rate, and execution time against the baseline OCSVM method.
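The recursive k-means step can be sketched in one dimension: split the alarm scores into two clusters and keep recursing into any cluster whose spread is still large. The one-dimensional scores, the spread threshold, and the min/max seeding are illustrative assumptions, not the authors' exact procedure.

```python
# Recursive two-means splitting of (e.g.) OCSVM decision values.
def two_means(xs, iters=20):
    c0, c1 = min(xs), max(xs)                      # simple deterministic seeds
    for _ in range(iters):
        a = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        b = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(a) / len(a), sum(b) / len(b)  # update the two centroids
    return a, b

def recursive_kmeans(xs, spread=1.0):
    # stop splitting once a cluster is tight (or too small to split)
    if len(xs) < 2 or max(xs) - min(xs) <= spread:
        return [xs]
    a, b = two_means(xs)
    return recursive_kmeans(a, spread) + recursive_kmeans(b, spread)

scores = [0.1, 0.2, 0.15, 5.0, 5.2, 9.9]   # toy anomaly scores
clusters = recursive_kmeans(scores)
print(clusters)  # → [[0.1, 0.2, 0.15], [5.0, 5.2], [9.9]]
```

Separating the tight low-score group from the outlying ones is what lets the module discriminate real alarms without carefully tuned σ and ν.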
Targets Separation and Imaging Method in Sparse Scene Based on Cluster Result of Range Profile Peaks
Directory of Open Access Journals (Sweden)
YANG Qiu
2015-08-01
Full Text Available This paper focuses on the synthetic aperture radar (SAR) imaging of space-sparse targets such as ships on the sea, and proposes a method of target separation and imaging of sparse scenes based on the cluster result of range profile peaks. Firstly, a wavelet de-noising algorithm is used to preprocess the original echo, and then the range profile at different viewing positions can be obtained by range compression and range migration correction. Peaks of the range profiles can be detected by a fast peak detection algorithm based on the second-order difference operator. Targets with sparse energy intervals can be imaged through azimuth compression after clustering of peaks in the range dimension. Furthermore, targets without coupling in the range energy interval and direction synthetic aperture time can be imaged through azimuth compression after clustering of peaks in both the range and direction dimensions. Lastly, the effectiveness of the proposed method is validated by simulations. Experimental results demonstrate that space-sparse targets such as ships can be imaged separately and completely with little computation in azimuth compression, and the images are more beneficial for target recognition.
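The peak detection step can be sketched with discrete differences: a sample is a peak candidate where the first difference changes sign from positive to negative (equivalently, the second-order difference is negative across the sample). The profile values below are illustrative, not SAR data.

```python
# Peak detection on a range profile via a sign change of the first difference.
def peaks_second_diff(profile):
    idx = []
    for i in range(1, len(profile) - 1):
        d1 = profile[i] - profile[i - 1]      # backward difference
        d2 = profile[i + 1] - profile[i]      # forward difference
        if d1 > 0 and d2 < 0:                 # rise then fall => local maximum
            idx.append(i)
    return idx

range_profile = [0, 1, 5, 2, 1, 4, 9, 3, 0]
print(peaks_second_diff(range_profile))  # → [2, 6]
```

In the paper's setting these peak indices, collected over viewing positions, are what get clustered in the range (and direction) dimension.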
Comparison of three methods of clustering: k-means, spectral clustering and hierarchical clustering
Kowsari, Kamran
2013-01-01
Three kinds of clustering are compared, and cost and loss functions are defined and calculated for each. The error rate of a clustering method, and how to calculate the error percentage, is always one of the important factors for evaluating clustering methods, so this paper introduces one way to calculate the error rate of clustering methods. Clustering algorithms can be divided into several categories, including partitioning clustering algorithms, hierarchical algorithms and density based algor...
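One standard way to compute a clustering error rate against ground-truth classes is to try every mapping of cluster labels onto class labels and keep the one with the fewest mismatches; brute force over permutations is fine for small k. The label vectors below are illustrative, and this is a generic definition rather than necessarily the paper's exact formula.

```python
# Clustering error rate: best label permutation, then fraction of mismatches.
from itertools import permutations

def clustering_error(pred, truth):
    k = max(max(pred), max(truth)) + 1
    best = len(pred)
    for perm in permutations(range(k)):           # relabel pred via perm
        errors = sum(1 for p, t in zip(pred, truth) if perm[p] != t)
        best = min(best, errors)
    return best / len(pred)

truth = [0, 0, 0, 1, 1, 2, 2, 2]
pred  = [1, 1, 1, 2, 2, 0, 0, 1]   # labels permuted, one point misplaced
err = clustering_error(pred, truth)
print(err)  # → 0.125
```

For large k the permutation search is replaced by the Hungarian assignment algorithm, which solves the same matching in polynomial time.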
Bishop, R. F.; Li, P. H. Y.
2011-04-01
An approximation hierarchy, called the lattice-path-based subsystem (LPSUBm) approximation scheme, is described for the coupled-cluster method (CCM). It is applicable to systems defined on a regular spatial lattice. We then apply it to two well-studied prototypical (spin-1/2 Heisenberg antiferromagnetic) spin-lattice models, namely, the XXZ and the XY models on the square lattice in two dimensions. Results are obtained in each case for the ground-state energy, the ground-state sublattice magnetization, and the quantum critical point. They are all in good agreement with those from such alternative methods as spin-wave theory, series expansions, quantum Monte Carlo methods, and the CCM using the alternative lattice-animal-based subsystem (LSUBm) and the distance-based subsystem (DSUBm) schemes. Each of the three CCM schemes (LSUBm, DSUBm, and LPSUBm) for use with systems defined on a regular spatial lattice is shown to have its own advantages in particular applications.
Santos, Miriam Seoane; Abreu, Pedro Henriques; García-Laencina, Pedro J; Simão, Adélia; Carvalho, Armando
2015-12-01
Liver cancer is the sixth most frequently diagnosed cancer and, particularly, Hepatocellular Carcinoma (HCC) represents more than 90% of primary liver cancers. Clinicians assess each patient's treatment on the basis of evidence-based medicine, which may not always apply to a specific patient, given the biological variability among individuals. Over the years, and for the particular case of Hepatocellular Carcinoma, some research studies have been developing strategies for assisting clinicians in decision making, using computational methods (e.g. machine learning techniques) to extract knowledge from the clinical data. However, these studies have some limitations that have not yet been addressed: some do not focus entirely on Hepatocellular Carcinoma patients, others have strict application boundaries, and none considers the heterogeneity between patients nor the presence of missing data, a common drawback in healthcare contexts. In this work, a real complex Hepatocellular Carcinoma database composed of heterogeneous clinical features is studied. We propose a new cluster-based oversampling approach robust to small and imbalanced datasets, which accounts for the heterogeneity of patients with Hepatocellular Carcinoma. The preprocessing procedures of this work are based on data imputation considering appropriate distance metrics for both heterogeneous and missing data (HEOM) and clustering studies to assess the underlying patient groups in the studied dataset (K-means). The final approach is applied in order to diminish the impact of underlying patient profiles with reduced sizes on survival prediction. It is based on K-means clustering and the SMOTE algorithm to build a representative dataset and use it as training example for different machine learning procedures (logistic regression and neural networks). The results are evaluated in terms of survival prediction and compared across baseline approaches that do not consider clustering and/or oversampling using the
A Method of Clustering Components into Modules Based on Products' Functional and Structural Analysis
Institute of Scientific and Technical Information of China (English)
MENG Xiang-hui; JIANG Zu-hua; ZHENG Ying-fei
2006-01-01
Modularity is the key to improving the cost-variety trade-off in product development. To achieve the functional independence and structural independence of modules, a method of clustering components to identify modules based on functional and structural analysis is presented. The method includes two stages. In the first stage, the product's function is analyzed to determine the primary level of modules. The objective function for module identification is then formulated to achieve the functional independence of modules, and a genetic algorithm is used to solve the combinatorial optimization problem in module identification, forming the primary modules of products. In the second stage, the cohesion degree of modules and the coupling degree between modules are analyzed. Based on this structural analysis, the modular scheme is refined according to the principle of structural independence. A case study on a gear reducer is conducted to illustrate the validity of the presented method.
Image reconstruction of muon tomographic data using a density-based clustering method
Perry, Kimberly B.
Muons are subatomic particles capable of reaching the Earth's surface before decaying. When these particles collide with an object that has a high atomic number (Z), their path of travel changes substantially. Tracking muon movement through shielded containers can indicate what types of materials lie inside. This thesis proposes using a density-based clustering algorithm called OPTICS to perform image reconstructions using muon tomographic data. The results show that this method is capable of detecting high-Z materials quickly, and can also produce detailed reconstructions with large amounts of data.
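A key ingredient of OPTICS-style density clustering is each point's core distance: the distance to its MinPts-th nearest neighbour, which is small inside dense regions and large for isolated points. The 2-D toy points and MinPts value below are illustrative assumptions, not the thesis's muon data.

```python
# Core distance, the density measure OPTICS orders points by.
import math

def core_distance(points, p, min_pts):
    dists = sorted(math.dist(p, q) for q in points if q != p)
    return dists[min_pts - 1]          # distance to the min_pts-th neighbour

pts = [(0, 0), (0, 1), (1, 0), (10, 10)]
cd = core_distance(pts, (0, 0), min_pts=3)
print(cd)  # distance to the 3rd-nearest neighbour of (0, 0)
```

OPTICS then walks the data in order of reachability (a max of core distance and actual distance), so dense scattering regions, e.g. around high-Z material, emerge as valleys in the reachability plot.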
Targets Separation and Imaging Method in Sparse Scene Based on Cluster Result of Range Profile Peaks
Yang, Qiu; Qun ZHANG; Wang, Min; Sun, Li
2015-01-01
This paper focuses on the synthetic aperture radar (SAR) imaging of space-sparse targets such as ships on the sea, and proposes a method of targets separation and imaging of sparse scene based on cluster result of range profile peaks. Firstly, wavelet de-noising algorithm is used to preprocess the original echo, and then the range profile at different viewing positions can be obtained by range compression and range migration correction. Peaks of the range profiles can be detected by the fast ...
A Novel Method to Predict Genomic Islands Based on Mean Shift Clustering Algorithm.
de Brito, Daniel M; Maracaja-Coutinho, Vinicius; de Farias, Savio T; Batista, Leonardo V; do Rêgo, Thaís G
2016-01-01
Genomic Islands (GIs) are regions of bacterial genomes that are acquired from other organisms by the phenomenon of horizontal transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. Nevertheless, these adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism. Identification of such regions is of medical and industrial interest. For this reason, different approaches for genomic islands prediction have been proposed. However, none of them are capable of predicting precisely the complete repertory of GIs in a genome. The difficulties arise due to the changes in performance of different algorithms in the face of the variety of nucleotide distribution in different species. In this paper, we present a novel method to predict GIs that is built upon mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is automatically calculated based on a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP--Mean Shift Genomic Island Predictor. Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The application of this tool revealed the same GIs predicted by other methods and also different novel unpredicted islands. A detailed investigation of the different features related to typical GI elements inserted in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions for this new methodology are available at http://msgip.integrativebioinformatics.me. PMID:26731657
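Mean shift's appeal for this task is that it needs no cluster count: every point climbs to a density mode, and points sharing a mode form one cluster. The 1-D sketch below uses a flat kernel; the data and fixed bandwidth are illustrative, unlike MSGIP's heuristically computed bandwidth.

```python
# 1-D mean shift with a flat kernel: move each point to the mean of its window.
def shift_to_mode(x, data, bandwidth, steps=50):
    for _ in range(steps):
        window = [d for d in data if abs(d - x) <= bandwidth]
        x = sum(window) / len(window)          # shift toward local mean
    return round(x, 6)

data = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
modes = sorted({shift_to_mode(x, data, bandwidth=2.0) for x in data})
print(modes)  # → [1.0, 8.0]  (two modes => two clusters, no k required)
```

The number of distinct modes, and hence clusters, falls out of the data and the bandwidth alone.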
Stability of maximum-likelihood-based clustering methods: exploring the backbone of classifications
International Nuclear Information System (INIS)
Components of complex systems are often classified according to the way they interact with each other. In graph theory such groups are known as clusters or communities. Many different techniques have been recently proposed to detect them, some of which involve inference methods using either Bayesian or maximum likelihood approaches. In this paper, we study a statistical model designed for detecting clusters based on connection similarity. The basic assumption of the model is that the graph was generated by a certain grouping of the nodes and an expectation maximization algorithm is employed to infer that grouping. We show that the method admits further development to yield a stability analysis of the groupings that quantifies the extent to which each node influences its neighbors' group membership. Our approach naturally allows for the identification of the key elements responsible for the grouping and their resilience to changes in the network. Given the generality of the assumptions underlying the statistical model, such nodes are likely to play special roles in the original system. We illustrate this point by analyzing several empirical networks for which further information about the properties of the nodes is available. The search and identification of stabilizing nodes constitutes thus a novel technique to characterize the relevance of nodes in complex networks
Directory of Open Access Journals (Sweden)
Jai-Houng Leu
2009-01-01
Full Text Available New analytical methods and tools, called FAKDT (Fixed Average K-means based Decision Trees), have been developed to analyze human performance, allowing the enterprise to be viewed from different aspects in this study. The Decision Tree Clustering Method is one of the data mining methods that has been applied widely in different fields to analyze large amounts of data in recent years. Generally speaking, in the human resource incubation of an enterprise, if employees of high learning potential, high stability, and high emotional quotient are selected, the return on investment in human resources will be more apparent. If employees with the above-mentioned traits can be well utilized and incubated, the industrial competitiveness of the enterprise will be enhanced effectively. From the personality-specialty point of view, its function is to predict the efficiency of personal achievement in correlation with some implied personality specialties (blood group, constellation, etc.). The main purpose of this research is to extract useful information and important messages about human performance from historical records with this method. In this study, the Decision Tree Clustering Method data mining skills were improved and applied to identify the critical factors that affect human traits, demonstrating the method's feasibility.
Fan, Chang-ke; Wu, Yu
2010-01-01
A total of 10 indices of regional economic development in Guangxi are selected. According to the relevant economic data, regional economic development in Guangxi City is analyzed by using System Clustering Method and Principal Component Analysis Method. Result shows that System Clustering Method and Principal Component Analysis Method have revealed similar results analysis of economic development level. Overall economic strength of Guangxi is weak and Nanning has relatively high scores of fac...
Institute of Scientific and Technical Information of China (English)
2010-01-01
A total of 10 indices of regional economic development in Guangxi are selected. According to the relevant economic data, regional economic development in Guangxi is analyzed by using the System Clustering Method and the Principal Component Analysis Method. Results show that the System Clustering Method and the Principal Component Analysis Method reveal similar analyses of the economic development level. The overall economic strength of Guangxi is weak, and Nanning has relatively high factor scores due to its advantage as the political, economic and cultural center. Comprehensive scores of the other regions are all lower than 1, showing a big gap with the development of Nanning. The overall development strategy points out that Guangxi should accelerate the construction of the Ring Northern Bay Economic Zone, create a strong logistics system of strategic significance to national development, and use its unique location advantage, relying on the modern transportation system to establish a logistics center and business center connecting the hinterland and the ASEAN market. Based on the problems of unbalanced regional economic development in Guangxi, the development of the service industry in Nanning should be accelerated, the circular economy system of the industrial cities constructed, and the industrialization process of the tourism cities sped up in order to realize balanced development of the regional economy in Guangxi, China.
Effective Term Based Text Clustering Algorithms
P. Ponmuthuramalingam,; T. Devi
2010-01-01
Text clustering methods can be used to group large sets of text documents. Most of the text clustering methods do not address the problems of text clustering such as very high dimensionality of the data and understandability of the clustering descriptions. In this paper, a frequent term based approach of clustering has been introduced; it provides a natural way of reducing a large dimensionality of the document vector space. This approach is based on clustering the low dimensionality frequent...
The tidal tails of globular cluster Palomar 5 based on the neural networks method
Institute of Scientific and Technical Information of China (English)
Hu Zou; Zhen-Yu WU; Jun Ma; Xu Zhou
2009-01-01
The sixth Data Release (DR6) of the Sloan Digital Sky Survey (SDSS) provides more photometric regions, new features and more accurate data around the globular cluster Palomar 5. A new method, the Back Propagation Neural Network (BPNN), is used to estimate the cluster membership probability in order to detect its tidal tails. Cluster and field stars, used for training the networks, are extracted over a 40×20 deg² field by color-magnitude diagrams (CMDs). The best BPNNs, with two hidden layers and a Levenberg-Marquardt (LM) training algorithm, are determined by the chosen cluster and field samples. The membership probabilities of stars in the whole field are obtained with the BPNNs, and contour maps of the probability distribution show that a tail extends 5.42° to the north of the cluster and another tail extends 3.77° to the south. The tails are similar to those detected by Odenkirchen et al., but no more debris from the cluster is found to the northeast in the sky. The radial density profiles are investigated both along the tails and near the cluster center. Quite a few substructures are discovered in the tails. The number density profile of the cluster is fitted with the King model and the tidal radius is determined as 14.28'. However, the King model cannot fit the observed profile in the outer regions (R > 8') because of the tidal tails generated by the tidal force. Luminosity functions of the cluster and the tidal tails are calculated, which confirm that the tails originate from Palomar 5.
Splitting Methods for Convex Clustering
Chi, Eric C.; Lange, Kenneth
2013-01-01
Clustering is a fundamental problem in many scientific applications. Standard methods such as $k$-means, Gaussian mixture models, and hierarchical clustering, however, are beset by local minima, which are sometimes drastically suboptimal. Recently introduced convex relaxations of $k$-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer. In this work we present two splitting methods for solving the convex clustering problem. The fir...
Dynamic cluster formation using level set methods.
Yip, Andy M; Ding, Chris; Chan, Tony F
2006-06-01
Density-based clustering has the advantages of 1) allowing arbitrary cluster shapes and 2) not requiring the number of clusters as input. However, when clusters touch each other, both the cluster centers and cluster boundaries (the peaks and valleys of the density distribution) become fuzzy and difficult to determine. We introduce the notion of the cluster intensity function (CIF), which captures the important characteristics of clusters. When clusters are well-separated, CIFs are similar to density functions. But when clusters come close to each other, CIFs still clearly reveal cluster centers, cluster boundaries, and the degree of membership of each data point in the cluster to which it belongs. Clustering through bump hunting and valley seeking based on these functions is more robust than that based on density functions obtained by kernel density estimation, which are often oscillatory or oversmoothed. These problems of kernel density estimation are resolved using Level Set Methods and related techniques. Comparisons with two existing density-based methods, valley seeking and DBSCAN, are presented which illustrate the advantages of our approach. PMID:16724583
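The kernel density estimate that the paper contrasts its CIFs against is simply an average of Gaussian bumps centred on the samples; bump hunting looks for its peaks, valley seeking for its troughs. The 1-D sample data and bandwidth `h` below are illustrative assumptions.

```python
# Gaussian kernel density estimate evaluated at a point x.
import math

def kde(x, samples, h):
    return (sum(math.exp(-((x - s) / h) ** 2 / 2) for s in samples)
            / (len(samples) * h * math.sqrt(2 * math.pi)))

samples = [0.0, 0.2, 0.4, 5.0, 5.3]        # two well-separated groups
peak = kde(0.2, samples, h=0.5)            # inside a cluster
valley = kde(2.5, samples, h=0.5)          # between the clusters
print(peak > valley)  # → True
```

The bandwidth `h` controls the oscillatory-vs-oversmoothed trade-off the abstract mentions: too small and spurious bumps appear, too large and true valleys wash out.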
Cluster Evaluation of Density Based Subspace Clustering
Sembiring, Rahmat Widia; Zain, Jasni Mohamad
2010-01-01
Clustering real world data is often faced with the curse of dimensionality, since real world data often consist of many dimensions. Multidimensional data clustering can be evaluated through a density-based approach. Density approaches are based on the paradigm introduced by DBSCAN clustering. In this approach, the density of each object's neighbours with MinPoints will be calculated. Cluster changes will occur in accordance with changes in the density of each object's neighbours. The neighbours of each object ...
Dioba, A.
2010-01-01
The article considers the use of the EM algorithm of fuzzy clustering analysis to assess the level of productive personnel employment at industrial enterprises. A methodical approach to assessing productive personnel employment is suggested, and criteria for evaluating productive personnel employment are developed.
Clustering Software Methods and Comparison
Directory of Open Access Journals (Sweden)
Rachana Kamble
2014-12-01
Full Text Available Document clustering, as an unsupervised approach, is extensively used to navigate, filter, summarize and manage huge collections of document repositories such as the World Wide Web (WWW). Document clustering is the process of segmenting a particular collection of texts into subgroups of content-wise similar documents. The purpose of document clustering is to serve human interests in information searching and understanding. Component-based software development has gained considerable practical importance in the field of software engineering, both among academic researchers and from a business perspective. Finding components for efficient code reuse is one of the important problems addressed by researchers. Clustering reduces the search space of components by grouping similar entities together, thus ensuring reduced time complexity as it reduces the search time for component retrieval. This work studies the key challenges of the clustering problem as it applies to the text domain, and also discusses the key methods used for text clustering and their relative advantages.
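The usual first step in document clustering is turning texts into comparable vectors. The following toy sketch (not from the paper) builds bag-of-words TF-IDF vectors and a cosine similarity on them; any clustering algorithm can then operate on these similarities. The tokenizer and IDF smoothing are simplifying assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # Toy TF-IDF: term frequency times smoothed inverse document frequency.
    # Whitespace tokenisation is an illustrative simplification.
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter()
    for toks in tokenised:
        df.update(set(toks))              # document frequency per word
    n = len(docs)
    vocab = sorted(df)
    vecs = []
    for toks in tokenised:
        tf = Counter(toks)
        vecs.append([tf[w] * math.log((1 + n) / (1 + df[w])) for w in vocab])
    return vecs, vocab

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0
```

Documents sharing discriminative terms score high; documents with no overlap score zero, which is what a k-means or hierarchical step would exploit.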
GPU-based Multilevel Clustering.
Chiosa, Iurie; Kolb, Andreas
2010-04-01
The processing power of parallel co-processors like the Graphics Processing Unit (GPU) is dramatically increasing. However, up until now only a few approaches have been presented to utilize this kind of hardware for mesh clustering purposes. In this paper we introduce a Multilevel clustering technique designed as a parallel algorithm and solely implemented on the GPU. Our formulation uses the spatial coherence present in the cluster optimization and hierarchical cluster merging to significantly reduce the number of comparisons in both parts. Our approach provides a fast, high quality and complete clustering analysis. Furthermore, based on the original concept we present a generalization of the method to data clustering. All advantages of the mesh-based techniques smoothly carry over to the generalized clustering approach. Additionally, this approach solves the problem of the missing topological information inherent to general data clustering and leads to a Local Neighbors k-means algorithm. We evaluate both techniques by applying them to Centroidal Voronoi Diagram (CVD) based clustering. Compared to classical approaches, our techniques generate results with at least the same clustering quality. Our technique proves to scale very well, currently being limited only by the available amount of graphics memory. PMID:20421676
Zou, Ling; Guo, Qian; Xu, Yi; Yang, Biao; Jiao, Zhuqing; Xiang, Jianbo
2016-04-29
Functional magnetic resonance imaging (fMRI) is an important tool in neuroscience for assessing connectivity and interactions between distant areas of the brain. To find and characterize the coherent patterns of brain activity as a means of identifying brain systems for the cognitive reappraisal of emotion task, both density-based k-means clustering and independent component analysis (ICA) methods can be applied to characterize the interactions between brain regions involved in cognitive reappraisal of emotion. Our results reveal that compared with the ICA method, the density-based k-means clustering method provides a higher sensitivity of polymerization. In addition, it is more sensitive to relatively weak functional connection regions. Thus, the study concludes that in the process of receiving emotional stimuli, the relatively obvious activation areas are mainly distributed in the frontal lobe, cingulum and near the hypothalamus. Furthermore, the density-based k-means clustering method provides a more reliable basis for follow-up studies of brain functional connectivity. PMID:27177109
A New Elliptical Grid Clustering Method
Guansheng, Zheng
A new grid-based clustering method is presented in this paper. The method first performs unsupervised learning on high-dimensional data. It maps the data onto a multi-dimensional space and applies a linear transformation to the feature space, instead of to the objects themselves, and then applies a grid-clustering method. Unlike conventional methods, it uses multidimensional hyper-ellipse grid cells. Some case studies and ideas on how to use the algorithm are described. The experimental results show that EGC can discover irregularly shaped clusters.
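The generic grid-clustering idea the abstract builds on can be sketched briefly: hash points into cells, keep the dense cells, and merge touching dense cells. The sketch below uses ordinary rectangular boxes, a deliberate simplification of the paper's hyper-ellipse cells; `cell` and `min_pts` are illustrative parameters.

```python
import numpy as np
from collections import deque
from itertools import product

def grid_cluster(X, cell=1.0, min_pts=3):
    # Assign points to rectangular grid cells (a simplification: the paper
    # uses hyper-ellipse cells), keep cells holding >= min_pts points, and
    # merge dense cells that touch, via flood fill.
    cells = {}
    for i, x in enumerate(X):
        key = tuple(int(np.floor(v / cell)) for v in x)
        cells.setdefault(key, []).append(i)
    dense = {k for k, pts in cells.items() if len(pts) >= min_pts}
    offsets = [o for o in product((-1, 0, 1), repeat=X.shape[1]) if any(o)]
    labels, cid = {}, 0
    for start in sorted(dense):
        if start in labels:
            continue
        labels[start] = cid
        queue = deque([start])
        while queue:                                 # flood fill over dense cells
            k = queue.popleft()
            for off in offsets:
                nb = tuple(a + b for a, b in zip(k, off))
                if nb in dense and nb not in labels:
                    labels[nb] = cid
                    queue.append(nb)
        cid += 1
    point_labels = np.full(len(X), -1)               # -1 marks sparse-cell noise
    for k, lab in labels.items():
        point_labels[cells[k]] = lab
    return point_labels
```

Because only cell occupancy counts matter, the pass over the data is linear; the cost of the merge step depends on the number of dense cells, not on the number of points.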
Institute of Scientific and Technical Information of China (English)
LING Ling; HU Yu-jin; WANG Xue-lin; LI Cheng-gang
2006-01-01
In order to improve the efficiency of ontology construction from heterogeneous knowledge sources, a semantic-based approach is presented. The ontology is constructed with the application of clustering techniques in an incremental way. Firstly, terms are extracted from knowledge sources and aggregated into a term set after pretreatment. Then the concept set is built via semantic-based clustering according to the semantics of terms provided by WordNet. Next, a concept tree is constructed in terms of mapping rules between semantic relationships and concept relationships. The semi-automatic approach can avoid inconsistencies due to knowledge engineers having different understandings of the same concept, and the obtained ontology is easily expanded.
Fast Density Based Clustering Algorithm
Priyanka Trikha; Singh Vijendra
2013-01-01
Clustering is an unsupervised learning problem: a procedure that partitions data objects into matching clusters, such that data objects in the same cluster are quite similar to each other and dissimilar to those in other clusters. Traditional algorithms do not meet the latest multiple requirements simultaneously for objects. Density-based clustering algorithms find clusters based on the density of data points in a region. The DBSCAN algorithm is one of the density-based clustering algorithms. I...
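For reference, the core DBSCAN procedure that this family of algorithms builds on fits in a short sketch: core points are those with at least `min_pts` neighbours within `eps`, and clusters grow by expanding through core points only. This is a minimal O(n²) illustration, not the paper's accelerated variant.

```python
import numpy as np

def dbscan(X, eps=0.5, min_pts=3):
    # Minimal DBSCAN sketch: core points have >= min_pts neighbours within
    # eps (self included); clusters grow only through core points; -1 = noise.
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    neigh = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cid = 0
    for i in range(n):
        if labels[i] != -1 or len(neigh[i]) < min_pts:
            continue                        # already assigned, or not core
        labels[i] = cid
        seeds = list(neigh[i])
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid
                if len(neigh[j]) >= min_pts:
                    seeds.extend(neigh[j])  # expand through core points only
        cid += 1
    return labels
```

The O(n²) distance matrix is the usual target of "fast DBSCAN" work; spatial indexing (k-d trees, grids) brings the neighbourhood queries down without changing the labelling logic above.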
Sanfilippo, Antonio; Calapristi, Augustin J.; Crow, Vernon L.; Hetzler, Elizabeth G.; Turner, Alan E.
2009-12-22
Document clustering methods, document cluster label disambiguation methods, document clustering apparatuses, and articles of manufacture are described. In one aspect, a document clustering method includes providing a document set comprising a plurality of documents, providing a cluster comprising a subset of the documents of the document set, using a plurality of terms of the documents, providing a cluster label indicative of subject matter content of the documents of the cluster, wherein the cluster label comprises a plurality of word senses, and selecting one of the word senses of the cluster label.
Indian Academy of Sciences (India)
Andrea Paz; Andrew J Crawford
2012-11-01
Molecular markers offer a universal source of data for quantifying biodiversity. DNA barcoding uses a standardized genetic marker and a curated reference database to identify known species and to reveal cryptic diversity within well-sampled clades. Rapid biological inventories, e.g. rapid assessment programs (RAPs), unlike most barcoding campaigns, are focused on particular geographic localities rather than on clades. Because of the potentially sparse phylogenetic sampling, the addition of DNA barcoding to RAPs may present a greater challenge for the identification of named species or for revealing cryptic diversity. In this article we evaluate the use of DNA barcoding for quantifying lineage diversity within a single sampling site as compared to clade-based sampling, and present examples from amphibians. We compared algorithms for identifying DNA barcode clusters (e.g. species, cryptic species or Evolutionary Significant Units) using previously published DNA barcode data obtained from geography-based sampling at a site in Central Panama, and from clade-based sampling in Madagascar. We found that clustering algorithms based on genetic distance performed similarly on sympatric as well as clade-based barcode data, while a promising coalescent-based method performed poorly on sympatric data. The various clustering algorithms were also compared in terms of speed and software implementation. Although each method has its shortcomings in certain contexts, we recommend the use of the ABGD method, which not only performs fairly well under either sampling method, but does so in a few seconds and with a user-friendly Web interface.
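The genetic-distance clustering the study evaluates can be caricatured in a few lines: merge any two sequences whose pairwise distance falls below a threshold, so that clusters are the connected components. This is a deliberately simplified stand-in for barcode-gap methods such as ABGD, which infer the threshold from the data rather than fixing it; the sequences and threshold here are toy values.

```python
def p_distance(a, b):
    # Proportion of mismatched sites between two aligned sequences
    return sum(x != y for x, y in zip(a, b)) / len(a)

def threshold_clusters(seqs, threshold=0.05):
    # Single-linkage clustering at a fixed distance threshold via union-find.
    # Simplified illustration; ABGD and similar methods choose the
    # threshold from the barcode gap instead of taking it as input.
    parent = list(range(len(seqs)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]
```

Because only pairwise distances are used, the method behaves the same on sympatric (site-based) and clade-based samples, which matches the article's observation about distance-based algorithms.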
Directory of Open Access Journals (Sweden)
Michael J. Watts
2011-09-01
Full Text Available Existing cluster-based methods for investigating insect species assemblages or profiles of a region to indicate the risk of new insect pest invasion have a major limitation in that they assign the same species risk factors to each region in a cluster. Clearly regions assigned to the same cluster have different degrees of similarity with respect to their species profile or assemblage. This study addresses this concern by applying weighting factors to the cluster elements used to calculate regional risk factors, thereby producing region-specific risk factors. Using a database of the global distribution of crop insect pest species, we found that we were able to produce highly differentiated region-specific risk factors for insect pests. We did this by weighting cluster elements by their Euclidean distance from the target region. Using this approach meant that risk weightings were derived that were more realistic, as they were specific to the pest profile or species assemblage of each region. This weighting method provides an improved tool for estimating the potential invasion risk posed by exotic species given that they have an opportunity to establish in a target region.
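The weighting idea described above, risk factors computed from cluster elements weighted by their distance from the target region, can be sketched as follows. The exponential decay and all data values are illustrative assumptions, not the weighting function or data used in the study.

```python
import numpy as np

def region_risk(target_profile, profiles, presence, bandwidth=1.0):
    # Region-specific risk per species: a weighted average of the other
    # regions' presence/absence records, with weights decaying with the
    # Euclidean distance between species profiles. The exponential decay
    # is an illustrative choice, not the exact weighting in the paper.
    d = np.linalg.norm(profiles - target_profile, axis=1)
    w = np.exp(-d / bandwidth)            # similar regions weigh more
    return (w[:, None] * presence).sum(axis=0) / w.sum()
```

Regions with species assemblages close to the target dominate the estimate, so two regions in the same cluster no longer receive identical risk factors.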
Saeidi, Omid; Torabi, Seyed Rahman; Ataei, Mohammad
2014-03-01
Rock mass classification systems are one of the most common ways of determining rock mass excavatability and related equipment assessment. However, the strength and weak points of such rating-based classifications have always been questionable. Such classification systems assign quantifiable values to predefined classified geotechnical parameters of rock mass. This causes particular ambiguities, leading to the misuse of such classifications in practical applications. Recently, intelligence system approaches such as artificial neural networks (ANNs) and neuro-fuzzy methods, along with multiple regression models, have been used successfully to overcome such uncertainties. The purpose of the present study is the construction of several models by using an adaptive neuro-fuzzy inference system (ANFIS) method with two data clustering approaches, including fuzzy c-means (FCM) clustering and subtractive clustering, an ANN and non-linear multiple regression to estimate the basic rock mass diggability index. A set of data from several case studies was used to obtain the real rock mass diggability index and compared to the predicted values by the constructed models. In conclusion, it was observed that ANFIS based on the FCM model shows higher accuracy and correlation with actual data compared to that of the ANN and multiple regression. As a result, one can use the assimilation of ANNs with fuzzy clustering-based models to construct such rigorous predictor tools.
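The FCM clustering used to initialize the ANFIS models is a standard algorithm and can be sketched directly: alternate between updating fuzzy memberships from distances and recomputing weighted centroids. Parameter values (fuzzifier `m`, iteration count) are illustrative.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    # Standard FCM sketch: alternate membership and centroid updates.
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m                            # fuzzified memberships
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None, :], axis=-1) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))     # inverse-distance memberships
        U /= U.sum(axis=1, keepdims=True)
    return centers, U
```

In an ANFIS setting, each resulting centroid seeds one fuzzy rule, and the membership matrix `U` supplies the initial membership functions.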
A Vibration Method for Discovering Density Varied Clusters
Elbatta, Mohammad T.; Bolbol, Raed M.; Wesam M. Ashour
2012-01-01
DBSCAN is a base algorithm for density-based clustering. It can find clusters of different shapes and sizes in large amounts of data containing noise and outliers. However, it fails to handle the local density variation that exists within a cluster. A good clustering method should allow significant density variation within a cluster, because homogeneous clustering may generate a large number of smaller, unimportant clusters. In this paper, a...
Directory of Open Access Journals (Sweden)
Peixin Zhao
2014-01-01
Full Text Available This paper suggests a novel clustering method for analyzing the National Incident-Based Reporting System (NIBRS) data, which includes the determination of correlations between different crime types, the development of a likelihood index for crimes to occur in a jurisdiction, and the clustering of jurisdictions based on crime type. The method was tested using the 2005 assault data from 121 jurisdictions in Virginia as a test case. The analyses of these data show that some different crime types are correlated and some crime parameters are correlated with different crime types. The analyses also show that certain jurisdictions within Virginia share certain crime patterns. This information assists with constructing a pattern for a specific crime type and can be used to determine whether a jurisdiction may be more likely to see this type of crime occur in its area.
Watchdog-LEACH: A new method based on LEACH protocol to Secure Clustered Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Mohammad Reza Rohbanian
2013-07-01
Full Text Available A wireless sensor network comprises small sensor nodes with limited resources. Clustered networks have been proposed in many studies to reduce the power consumption in sensor networks. LEACH is one of the most popular techniques, offering an efficient way to minimize the power consumption in sensor networks. However, due to the characteristics of restricted resources and operation in a hostile environment, WSNs are subject to numerous threats and are vulnerable to attacks. This research proposes a solution that can be applied on top of LEACH to increase the level of security. In Watchdog-LEACH, some nodes are considered as watchdogs and some changes are applied to the LEACH protocol for intrusion detection. Watchdog-LEACH is able to protect against a wide range of attacks and provides security, energy efficiency and memory efficiency. The result of simulation shows that in comparison to LEACH, the energy overhead is about 2%, so this method is practical and can be applied to WSNs.
Transfer Prototype-based Fuzzy Clustering
Deng, Zhaohong; Jiang, Yizhang; Chung, Fu-Lai; Ishibuchi, Hisao; Choi, Kup-Sze; Wang, Shitong
2014-01-01
The traditional prototype-based clustering methods, such as the well-known fuzzy c-means (FCM) algorithm, usually need sufficient data to find a good clustering partition. If the available data is limited or scarce, most of the existing prototype-based clustering algorithms will no longer be effective. While the data for the current clustering task may be scarce, there is usually some useful knowledge available in related scenes/domains. In this study, the concept of transfer learning is a...
International Nuclear Information System (INIS)
We have developed a new method, K2, optimized for the detection of galaxy clusters in multicolor images. Based on the Red Sequence approach, K2 detects clusters using simultaneous enhancements in both colors and position. The detection significance is robustly determined through extensive Monte Carlo simulations and through comparison with available cluster catalogs based on two different optical methods, and also on X-ray data. K2 also provides quantitative estimates of the candidate clusters' richness and photometric redshifts. Initially, K2 was applied to the two color (gri) 161 deg² images of the Canada-France-Hawaii Telescope Legacy Survey Wide (CFHTLS-W) data. Our simulations show that the false detection rate for these data, at our selected threshold, is only ∼1%, and that the cluster catalogs are ∼80% complete up to a redshift of z = 0.6 for Fornax-like and richer clusters and to z ∼ 0.3 for poorer clusters. Based on the g-, r-, and i-band photometric catalogs of the Terapix T05 release, 35 clusters/deg² are detected, with 1-2 Fornax-like or richer clusters every 2 deg². Catalogs containing data for 6144 galaxy clusters have been prepared, of which 239 are rich clusters. These clusters, especially the latter, are being searched for gravitational lenses, one of our chief motivations for cluster detection in CFHTLS. The K2 method can be easily extended to use additional color information and thus improve overall cluster detection to higher redshifts. The complete set of K2 cluster catalogs, along with the supplementary catalogs for the member galaxies, are available on request from the authors.
Combination Clustering Analysis Method and its Application
Bang-Chun Wen; Li-Yuan Dong; Qin-Liang Li; Yang Liu
2013-01-01
The traditional clustering analysis method cannot automatically determine the optimal number of clusters. In this study, we provide a new clustering analysis method, the combination clustering analysis method, to solve this problem. Through analysis of 25 automobile data samples with the combination clustering analysis method, the correctness of the analysis result was verified. It showed that the combination clustering analysis method could objectively determine the number of clustering firs...
FAULT DIAGNOSIS BASED ON INTEGRATION OF CLUSTER ANALYSIS, ROUGH SET METHOD AND FUZZY NEURAL NETWORK
Institute of Scientific and Technical Information of China (English)
Feng Zhipeng; Song Xigeng; Chu Fulei
2004-01-01
In order to increase the efficiency and decrease the cost of machinery diagnosis, a hybrid system of computational intelligence methods is presented. Firstly, the continuous attributes in the diagnosis decision system are discretized with the self-organizing map (SOM) neural network. Then, dynamic reducts are computed based on the rough set method, and the key conditions for diagnosis are found according to the maximum cluster ratio. Lastly, according to the optimal reduct, an adaptive neuro-fuzzy inference system (ANFIS) is designed for fault identification. The diagnosis of a diesel engine verifies the feasibility for engineering applications.
CORECLUSTER: A Degeneracy Based Graph Clustering Framework
Giatsidis, Christos; Malliaros, Fragkiskos; Thilikos, Dimitrios M.; Vazirgiannis, Michalis
2014-01-01
Graph clustering or community detection constitutes an important task for investigating the internal structure of graphs, with a plethora of applications in several domains. Traditional tools for graph clustering, such as spectral methods, typically suffer from high time and space complexity. In this article, we present CoreCluster, an efficient graph clustering framework based on the concept of graph degeneracy, that can be used along with any known graph clustering algorithm. Our approa...
Xin Liu
2015-01-01
In a cognitive sensor network (CSN), the wastage of sensing time and energy is a challenge to cooperative spectrum sensing, when the number of cooperative cognitive nodes (CNs) becomes very large. In this paper, a novel wireless power transfer (WPT)-based weighed clustering cooperative spectrum sensing model is proposed, which divides all the CNs into several clusters, and then selects the most favorable CNs as the cluster heads and allows the common CNs to transfer the received radio freque...
Cluster Tree Based Hybrid Document Similarity Measure
Directory of Open Access Journals (Sweden)
M. Varshana Devi
2015-10-01
Full Text Available A cluster tree based hybrid similarity measure is established to measure hybrid similarity. In a cluster tree, the hybrid similarity measure can be calculated even for random data that may not co-occur, generating different views. Different views of the tree can be combined, choosing the one that is most significant in cost. A method is proposed to combine the multiple views, in which multiple views represented by different distance measures are merged into a single cluster. Compared with traditional statistical methods, the cluster tree based hybrid similarity measure gives better feasibility for intelligent search. It helps in improving dimensionality reduction and semantic analysis.
Model-free functional MRI analysis using cluster-based methods
Otto, Thomas D.; Meyer-Baese, Anke; Hurdal, Monica; Sumners, DeWitt; Auer, Dorothee; Wismuller, Axel
2003-08-01
Conventional model-based or statistical analysis methods for functional MRI (fMRI) are easy to implement, and are effective in analyzing data with simple paradigms. However, they are not applicable in situations in which patterns of neural response are complicated and the fMRI response is unknown. In this paper the "neural gas" network is adapted and rigorously studied for analyzing fMRI data. The algorithm supports spatial connectivity, aiding in the identification of activation sites in functional brain imaging. A comparison of this new method with Kohonen's self-organizing map and with a minimal free energy vector quantizer is done in a systematic fMRI study showing comparative quantitative evaluations. The most important findings in this paper are: (1) the "neural gas" network outperforms the other two methods in terms of detecting small activation areas, and (2) evaluations based on computed reference functions show that the "neural gas" network outperforms the other two methods. The applicability of the new algorithm is demonstrated on experimental data.
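The distinguishing feature of the "neural gas" network is its rank-based update: for each presented sample, every prototype moves toward it with a step that decays exponentially with the prototype's distance rank. A minimal sketch follows; the annealing schedules and all parameter values are illustrative assumptions, not those of the fMRI study.

```python
import numpy as np

def neural_gas(X, n_units=4, n_iter=3000, lam0=2.0, eps0=0.5, seed=0):
    # "Neural gas" prototype update: all units move toward the presented
    # sample, with step sizes decaying with their distance rank.
    # Exponential annealing of lam and eps is a common illustrative choice.
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        frac = t / n_iter
        lam = lam0 * (0.01 / lam0) ** frac      # annealed neighbourhood range
        eps = eps0 * (0.01 / eps0) ** frac      # annealed learning rate
        ranks = np.argsort(np.argsort(np.linalg.norm(W - x, axis=1)))
        W += (eps * np.exp(-ranks / lam))[:, None] * (x - W)
    return W
```

Unlike a self-organizing map, no fixed grid topology is imposed: the ranking alone determines the neighbourhood, which is why the method adapts well to clusters of unknown shape.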
A Fast Three-Phase Line Segments Clustering Method Based on Relative Spatial Relationship
Liu, Y. Q.; X.H. Su; Wu, E. H.
2013-01-01
Lines indicate the structural information of objects. However, general line detectors cannot give sufficiently clear information, producing many short or discontinuous line segments. This study presents a new fast three-phase line segment clustering algorithm. Firstly, the Hough transform or the LSD algorithm is used to obtain the initial line set; then these lines are grouped into different sets according to direction; and then each direction set is further subdivided into dif...
Projection-based curve clustering
International Nuclear Information System (INIS)
This paper focuses on unsupervised curve classification in the context of nuclear industry. At the Commissariat a l'Energie Atomique (CEA), Cadarache (France), the thermal-hydraulic computer code CATHARE is used to study the reliability of reactor vessels. The code inputs are physical parameters and the outputs are time evolution curves of a few other physical quantities. As the CATHARE code is quite complex and CPU time-consuming, it has to be approximated by a regression model. This regression process involves a clustering step. In the present paper, the CATHARE output curves are clustered using a k-means scheme, with a projection onto a lower dimensional space. We study the properties of the empirically optimal cluster centres found by the clustering method based on projections, compared with the 'true' ones. The choice of the projection basis is discussed, and an algorithm is implemented to select the best projection basis among a library of orthonormal bases. The approach is illustrated on a simulated example and then applied to the industrial problem. (authors)
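The clustering step described above, k-means on curves after projection onto a lower-dimensional basis, can be sketched as follows. The cosine basis is an illustrative choice standing in for the paper's library of orthonormal bases, and the seeding rule is a simplifying assumption.

```python
import numpy as np

def curve_features(curves, n_coef=4):
    # Project each sampled curve onto a small orthonormal cosine basis.
    # Illustrative basis; the paper selects the best basis from a library
    # of orthonormal bases.
    t = np.linspace(0.0, 1.0, curves.shape[1])
    basis = np.array([np.cos(np.pi * k * t) for k in range(n_coef)])
    basis /= np.linalg.norm(basis, axis=1, keepdims=True)
    return curves @ basis.T           # (n_curves, n_coef) coefficient vectors

def kmeans_labels(F, k=2, n_iter=50):
    centers = F[:k].astype(float).copy()       # first k rows as seeds (toy choice)
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(F[:, None] - centers, axis=-1), axis=1)
        centers = np.array([F[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels
```

Working with a handful of coefficients instead of full sampled curves makes the k-means step cheap and filters out high-frequency noise in the code outputs.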
High Dimensional Data Clustering Using Fast Cluster Based Feature Selection
Directory of Open Access Journals (Sweden)
Karthikeyan.P
2014-03-01
Full Text Available Feature selection involves identifying a subset of the most useful features that produces results compatible with the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While efficiency concerns the time required to find a subset of features, effectiveness is related to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm (FAST) is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Features in different clusters are relatively independent; the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum spanning tree (MST) clustering method using Kruskal's algorithm. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study.
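The MST-based clustering that FAST relies on is a classic construction and can be sketched generically: build the minimum spanning tree of the complete graph with Kruskal's algorithm, then delete the heaviest edges so the remaining components form the clusters. The sketch below applies it to points with Euclidean weights for illustration; FAST itself clusters features with correlation-based weights.

```python
import numpy as np

def mst_clusters(X, n_clusters=2):
    # Kruskal's algorithm on the complete graph, then drop the
    # n_clusters - 1 heaviest MST edges; components are the clusters.
    n = len(X)
    edges = sorted((np.linalg.norm(X[i] - X[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    mst = []
    for w, i, j in edges:                   # lightest non-cycle edges first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))
    parent = list(range(n))                 # rebuild components without the
    keep = sorted(mst)[:len(mst) - (n_clusters - 1)]   # heaviest MST edges
    for w, i, j in keep:
        parent[find(i)] = find(j)
    return [find(i) for i in range(n)]
```

Because the MST has only n−1 edges, cutting the heaviest ones is equivalent to single-linkage clustering, which is what makes the two-step FAST design efficient.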
Directory of Open Access Journals (Sweden)
G V S Rajkumar
2011-07-01
Full Text Available Image segmentation is one of the most important areas of image retrieval. In colour image segmentation, the feature vector of each image region is n-dimensional, unlike that of a grey-level image. In this paper a new image segmentation algorithm is developed and analyzed using a finite mixture of doubly truncated bivariate Gaussian distributions integrated with hierarchical clustering. The number of image regions in the whole image is determined using the hierarchical clustering algorithm. Assuming that the bivariate feature vector (consisting of the Hue angle and Saturation of each pixel) in an image region follows a doubly truncated bivariate Gaussian distribution, the segmentation algorithm is developed. The model parameters are estimated using the EM algorithm; the updated equations of the EM algorithm for a finite mixture of doubly truncated Gaussian distributions are derived. A segmentation algorithm for colour images is proposed by using component maximum likelihood. The performance of the developed algorithm is evaluated by carrying out experimentation with five images taken from the Berkeley image dataset and computing image segmentation metrics such as the Global Consistency Error (GCE), Variation of Information (VOI), and Probabilistic Rand Index (PRI). The experimental results show that this algorithm outperforms the existing image segmentation algorithms.
Cycle-Based Cluster Variational Method for Direct and Inverse Inference
Furtlehner, Cyril; Decelle, Aurélien
2016-08-01
Large scale inference problems of practical interest can often be addressed with the help of Markov random fields. This requires solving, in principle, two related problems: the first is to find offline the parameters of the MRF from empirical data (the inverse problem); the second (the direct problem) is to set up the inference algorithm to make it as precise, robust and efficient as possible. In this work we address both the direct and inverse problems with mean-field methods of statistical physics, going beyond the Bethe approximation and the associated belief propagation algorithm. We elaborate on the idea that loop corrections to belief propagation can be dealt with in a systematic way on pairwise Markov random fields, by using the elements of a cycle basis to define regions in a generalized belief propagation setting. For the direct problem, the region graph is specified in such a way as to avoid feedback loops as much as possible, by selecting a minimal cycle basis. Following this line we are led to propose a two-level algorithm, where a belief propagation algorithm is run alternately at the level of each cycle and at the inter-region level. Next we observe that the inverse problem can be addressed region by region independently, with one small inverse problem per region to be solved. It turns out that each elementary inverse problem on the loop geometry can be solved efficiently. In particular, in the random Ising context we propose two complementary methods based respectively on fixed point equations and on a one-parameter log-likelihood function minimization. Numerical experiments confirm the effectiveness of this approach both for the direct and inverse MRF inference. Heterogeneous problems of size up to 10^5 are addressed in a reasonable computational time, notably with better convergence properties than ordinary belief propagation.
Liu, Xin
2015-01-01
In a cognitive sensor network (CSN), the wastage of sensing time and energy is a challenge to cooperative spectrum sensing, when the number of cooperative cognitive nodes (CNs) becomes very large. In this paper, a novel wireless power transfer (WPT)-based weighed clustering cooperative spectrum sensing model is proposed, which divides all the CNs into several clusters, and then selects the most favorable CNs as the cluster heads and allows the common CNs to transfer the received radio frequency (RF) energy of the primary node (PN) to the cluster heads, in order to supply the electrical energy needed for sensing and cooperation. A joint resource optimization is formulated to maximize the spectrum access probability of the CSN, through jointly allocating sensing time and clustering number. According to the resource optimization results, a clustering algorithm is proposed. The simulation results have shown that compared to the traditional model, the cluster heads of the proposed model can achieve more transmission power and there exists optimal sensing time and clustering number to maximize the spectrum access probability. PMID:26528987
Directory of Open Access Journals (Sweden)
Xin Liu
2015-10-01
Full Text Available In a cognitive sensor network (CSN), the wastage of sensing time and energy is a challenge to cooperative spectrum sensing when the number of cooperative cognitive nodes (CNs) becomes very large. In this paper, a novel wireless power transfer (WPT)-based weighed clustering cooperative spectrum sensing model is proposed, which divides all the CNs into several clusters, then selects the most favorable CNs as the cluster heads and allows the common CNs to transfer the received radio frequency (RF) energy of the primary node (PN) to the cluster heads, in order to supply the electrical energy needed for sensing and cooperation. A joint resource optimization is formulated to maximize the spectrum access probability of the CSN, through jointly allocating sensing time and clustering number. According to the resource optimization results, a clustering algorithm is proposed. The simulation results have shown that compared to the traditional model, the cluster heads of the proposed model can achieve more transmission power and there exists an optimal sensing time and clustering number to maximize the spectrum access probability.
Cluster identification based on correlations
Schulman, L. S.
2012-04-01
The problem addressed is the identification of cooperating agents based on correlations created as a result of the joint action of these and other agents. A systematic method for using correlations beyond second moments is developed. The technique is applied to a didactic example, the identification of alphabet letters based on correlations among the pixels used in an image of the letter. As in this example, agents can belong to more than one cluster. Moreover, the identification scheme does not require that the patterns be known ahead of time.
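A second-moment baseline for this identification task, grouping variables whose pairwise correlation exceeds a threshold, can be sketched briefly. Note the hedges: the paper's contribution is precisely to go beyond such second moments, and its agents may belong to several clusters, whereas this union-find sketch produces disjoint groups.

```python
import numpy as np

def correlation_clusters(samples, threshold=0.8):
    # Baseline using second moments only: group variables whose pairwise
    # correlation across samples exceeds a threshold (connected components
    # via union-find). The paper's method goes beyond second moments and
    # allows overlapping clusters; this sketch does not.
    corr = np.corrcoef(samples.T)
    n = corr.shape[0]
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] > threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]
```

On data such as pixel values across many letter images, variables driven by the same hidden agent correlate strongly and end up in one component.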
Cluster Based Text Classification Model
DEFF Research Database (Denmark)
Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock
2011-01-01
We propose a cluster based classification model for suspicious email detection and other text classification tasks. The text classification tasks comprise many training examples that require a complex classification model. Using clusters for classification makes the model simpler and increases the...... classifier is trained on each cluster having reduced dimensionality and less number of examples. The experimental results show that the proposed model outperforms the existing classification models for the task of suspicious email detection and topic categorization on the Reuters-21578 and 20 Newsgroups...... datasets. Our model also outperforms A Decision Cluster Classification (ADCC) and the Decision Cluster Forest Classification (DCFC) models on the Reuters-21578 dataset....
International Nuclear Information System (INIS)
Highlights: • A novel pattern sequence-based direct time series forecasting method was proposed. • Due to the use of SOM's topology preserving property, only SOM can be applied. • SCPSNSP only deals with the cluster patterns, not each specific time series value. • SCPSNSP performs better than recently developed forecasting algorithms. - Abstract: In this paper, we propose a new day-ahead direct time series forecasting method for competitive electricity markets based on clustering and next symbol prediction. In the clustering step, the pattern sequence and its topology relations are obtained from self-organizing map time series clustering. In the next symbol prediction step, with each cluster label in the pattern sequence represented as a pair of its topologically identical coordinates, an artificial neural network is used to predict the topological coordinates of the next day by training on the relationship between the previous daily pattern sequence and its next day pattern. According to the obtained topology relations, the nearest nonzero hits pattern is assigned to the next day so that the whole time series values can be directly forecasted from the assigned cluster pattern. The proposed method was evaluated on the Spanish, Australian and New York electricity markets and compared with PSF and some of the most recently published forecasting methods. Experimental results show that the proposed method outperforms the best of these forecasting methods by at least 3.64%.
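The next-symbol step above can be illustrated with a much simpler stand-in: instead of a SOM plus a neural network, the sketch below predicts tomorrow's cluster label as the most frequent successor of the most recent window of daily labels. The window length, the synthetic label sequence, and the persistence fallback are illustrative assumptions, not the paper's actual SCPSNSP pipeline.

```python
from collections import Counter, defaultdict

def next_pattern(history, w=3):
    """Predict the next daily cluster label from the last `w` labels by
    looking up the most frequent successor of that window in the history
    (a frequency-table stand-in for the paper's SOM + neural network)."""
    followers = defaultdict(Counter)
    for i in range(len(history) - w):
        followers[tuple(history[i:i + w])][history[i + w]] += 1
    window = tuple(history[-w:])
    if followers[window]:
        return followers[window].most_common(1)[0][0]
    return history[-1]  # fall back to persistence if the window is unseen

# Hypothetical daily load-shape cluster labels with a weekly rhythm
days = [0, 0, 1, 1, 1, 2, 2] * 8
print(next_pattern(days))  # → 0, since [1, 2, 2] is followed by 0 every week
```

The same lookup generalizes to any window length; longer windows give more specific, but sparser, successor statistics.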
Cosine-Based Clustering Algorithm Approach
Directory of Open Access Journals (Sweden)
Mohammed A. H. Lubbad
2012-02-01
Full Text Available Because many applications require the management of spatial data, clustering large spatial databases is an important problem: it tries to find the densely populated regions in the feature space for use in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shapes, and it must be insensitive to outliers (noise) and to the order of input data. In this paper, Cosine Cluster is proposed based on the cosine transformation, which satisfies all the above requirements. Using the multi-resolution property of cosine transforms, arbitrarily shaped clusters can be effectively identified at different degrees of accuracy. Cosine Cluster also proves to be highly efficient in terms of time complexity. Experimental results on very large data sets are presented, which show the efficiency and effectiveness of the proposed approach compared to other recent clustering methods.
Methods for co-clustering: a review
Brault, Vincent; Lomet, Aurore
2015-01-01
Co-clustering aims to identify block patterns in a data table through a joint clustering of rows and columns. The problem has been studied since 1965, with recent interest from various fields ranging from graph analysis and machine learning to data mining and genomics. Several variants have been proposed under diverse names: bi-clustering, block clustering, cross-clustering, or simultaneous clustering. We propose here a review of these methods in order to describe, compare and discuss the different ...
Izadi, Hossein; Sadri, Javad; Mehran, Nosrat-Agha
2015-08-01
Mineral segmentation in thin sections is a challenging, popular, and important research topic in computational geology, mineralogy, and mining engineering. Mineral segmentation in thin sections containing altered minerals, in which there are no evident and close boundaries, is a rather complex process. Most of the thin sections produced in industry include altered minerals; however, intelligent mineral segmentation in such thin sections has not been widely investigated in the literature, and current state-of-the-art algorithms are not able to accurately segment minerals in them. In this paper, a novel method based on incremental learning for clustering pixels is proposed in order to segment index minerals in thin sections both with and without altered minerals. Our algorithm uses 12 color features extracted from thin section images: red, green, blue, hue, saturation, and intensity, each under plane- and cross-polarized light at maximum intensity. The proposed method has been tested on 155 igneous samples, and overall accuracies of 92.15% and 85.24% were obtained for thin sections without and with altered minerals, respectively. Experimental results indicate that the proposed method outperforms other similar methods in the literature, especially for segmenting thin sections containing altered minerals. The proposed algorithm could be applied in applications which require real-time segmentation or an efficient identification map, such as petroleum geology, petrography and NASA Mars exploration.
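The 12-dimensional per-pixel feature vector described above (R, G, B, H, S, I under two polarizations) can be sketched with the standard library's colorsys. Using HSV's value channel as the intensity component is an assumption of this sketch, not necessarily the paper's exact definition, and the two input colors are made-up pixel values.

```python
import colorsys

def color_features(rgb_plane, rgb_cross):
    """Build a 12-dimensional feature vector: R, G, B, H, S, I for one pixel
    under plane-polarized and cross-polarized light (all channels in [0, 1])."""
    feats = []
    for r, g, b in (rgb_plane, rgb_cross):
        h, s, v = colorsys.rgb_to_hsv(r, g, b)  # v stands in for intensity
        feats.extend([r, g, b, h, s, v])
    return feats

# Hypothetical pixel: warm tone under plane light, dark blue under crossed light
print(len(color_features((0.8, 0.3, 0.1), (0.2, 0.2, 0.5))))  # → 12
```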
Breaking the hierarchy - a new cluster selection mechanism for hierarchical clustering methods
Directory of Open Access Journals (Sweden)
Zweig Katharina A
2009-10-01
Full Text Available Abstract Background Hierarchical clustering methods like Ward's method have been used for decades to understand biological and chemical data sets. In order to get a partition of the data set, it is necessary to choose an optimal level of the hierarchy by a so-called level selection algorithm. In 2005, a new kind of hierarchical clustering method was introduced by Palla et al. that differs in two ways from Ward's method: it can be used on data for which no full similarity matrix is defined, and it can produce overlapping clusters, i.e., allow for multiple membership of items in clusters. These features are optimal for biological and chemical data sets, but until now no level selection algorithm has been published for this method. Results In this article we provide a general selection scheme, the level independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data are not necessarily associated with a similarity measure, the selection is based on a graph theoretic notion of cohesive clusters. We present results of our method on two data sets, a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values according to a given reference clustering. Moreover, we can show for the PPI data set that our graph theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. We finally discuss how the method can be generalized to other hierarchical clustering methods to allow for a level independent cluster selection. Conclusion Using our new cluster selection method together with the method by Palla et al. provides a new interesting clustering mechanism that allows to compute overlapping clusters, which is especially valuable for biological and
Document Clustering Based on Semi-Supervised Term Clustering
Directory of Open Access Journals (Sweden)
Hamid Mahmoodi
2012-05-01
Full Text Available This study proposes a multi-step feature (term) selection process that, in a semi-supervised fashion, provides initial centers for term clusters. The fuzzy c-means (FCM) clustering algorithm is then used to cluster the terms, and finally each document is assigned to its closest associated term clusters. While most text clustering algorithms cluster the documents directly, we propose to first group the terms using the FCM algorithm and then cluster the documents based on the term clusters. We evaluate the effectiveness of our technique on several standard text collections and compare our results with some classical text clustering algorithms.
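The FCM step at the heart of the approach can be sketched in a few lines of NumPy: memberships and centers are updated alternately until convergence. The toy term vectors and the random membership initialization are illustrative; the paper instead seeds the centers semi-supervisedly.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Basic fuzzy c-means: returns cluster centers and the membership
    matrix U (rows sum to 1), with fuzzifier m."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))       # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Toy term vectors (e.g. term weights in two topics); two obvious term groups
terms = np.array([[0.9, 0.1], [1.0, 0.2], [0.8, 0.0],
                  [0.1, 0.9], [0.0, 1.0], [0.2, 0.8]])
centers, U = fuzzy_c_means(terms, c=2)
labels = U.argmax(axis=1)                      # hard assignment of each term
```

Documents would then be scored against the resulting term clusters and assigned to the closest one, as the abstract describes.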
A Local Pair Natural Orbital-Based Multireference Mukherjee’s Coupled Cluster Method
Czech Academy of Sciences Publication Activity Database
Demel, Ondřej; Pittner, Jiří
2015-01-01
Roč. 11, č. 7 (2015), s. 3104-3114. ISSN 1549-9618 R&D Projects: GA ČR GAP208/11/2222; GA ČR(CZ) GJ15-00058Y Institutional support: RVO:61388955 Keywords : ELECTRON CORRELATION METHODS * BRILLOUIN-WIGNER * CONFIGURATION-INTERACTION Subject RIV: CF - Physical ; Theoretical Chemistry Impact factor: 5.498, year: 2014
Ghahari, Alireza
2009-01-01
Multiview 3D face modeling has attracted increasing attention recently and has become one of the potential avenues in future video systems. We aim at more reliable and robust automatic feature extraction and natural 3D feature construction from 2D features detected on a pair of frontal and profile view face images. We propose several heuristic algorithms to minimize possible errors introduced by the prevalent nonperfect orthogonal conditions and noncoherent luminance. In our approach, we first extract the 2D features that are visible to both cameras in both views. Then, we estimate the coordinates of the features in the hidden profile view based on the visible features extracted in the two orthogonal views. Finally, based on the coordinates of the extracted features, we deform a 3D generic model to perform the desired 3D clone modeling. The present study demonstrates the suitability of the resulting facial models for practical applications like face recognition and facial animation.
Niching method using clustering crowding
Institute of Scientific and Technical Information of China (English)
GUO Guan-qi; GUI Wei-hua; WU Min; YU Shou-yi
2005-01-01
This study analyzes the drift phenomena of deterministic crowding and probabilistic crowding by using an equivalence class model and expectation proportion equations. It is proved that the replacement errors of deterministic crowding cause the population to converge to a single individual, thus resulting in premature stagnation or loss of optima, and that probabilistic crowding can maintain multiple subpopulations in equilibrium when the population size is adequately large. An improved niching method using clustering crowding is proposed. By analyzing the topology of the fitness landscape using the hill-valley function and extending the search space for similarity analysis, clustering crowding determines the locality of the search space more accurately, thus greatly decreasing the replacement errors of crowding. The integration of deterministic and probabilistic replacement increases the capacity for both parallel local hill climbing and maintaining multiple subpopulations. The experimental results on optimizing various multimodal functions show that the performance of clustering crowding, in terms of the number of effective peaks maintained, average peak ratio and global optimum ratio, is uniformly superior to that of the evolutionary algorithms using fitness sharing, simple deterministic crowding and probabilistic crowding.
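The hill-valley test used above to probe fitness-landscape topology can be sketched as follows: two individuals are taken to share a peak if no sampled point on the segment between them has lower fitness than both endpoints. The sample count and the sin-based test function are illustrative assumptions.

```python
import numpy as np

def same_peak(x1, x2, fitness, n_samples=10):
    """Hill-valley test: x1 and x2 belong to the same peak if no interior
    sample on the segment between them dips below both endpoint fitnesses."""
    floor = min(fitness(x1), fitness(x2))
    for t in np.linspace(0.0, 1.0, n_samples + 2)[1:-1]:   # interior points only
        if fitness((1 - t) * x1 + t * x2) < floor:
            return False                                   # a valley separates them
    return True

f = lambda x: np.sin(3 * x)          # simple multimodal test function
print(same_peak(0.4, 0.6, f))        # same hump of sin(3x) → True
print(same_peak(0.5, 2.6, f))        # a valley lies between these peaks → False
```

A niching method can use this predicate to decide whether two individuals compete (same peak) or should be kept in separate subpopulations.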
Coupled-cluster method for excitation energies
International Nuclear Information System (INIS)
The coupled-cluster method of electronic-structure calculation is briefly introduced and examined as to its dependence upon the choice of reference state. It is found that the method depends relatively weakly on the reference state if single-particle ''clusters'' are included in the calculations. This fact makes it reasonable to combine coupled-cluster calculations of ground and excited states, based on the same reference wave function, to obtain an equation for the excitation energy. This excitation-energy equation is of nearly the same form as that obtained by the ''equations of motion'' approach, but contains additional terms which should improve the description of orbital-relaxation and state-dependent correlation effects
Model-based clustered-dot screening
Kim, Sang Ho
2006-01-01
I propose a halftone screen design method based on a human visual system model and the characteristics of the electro-photographic (EP) printer engine. Generally, screen design methods based on human visual models produce dispersed-dot type screens while design methods considering EP printer characteristics generate clustered-dot type screens. In this paper, I propose a cost function balancing the conflicting characteristics of the human visual system and the printer. By minimizing the obtained cost function, I design a model-based clustered-dot screen using a modified direct binary search algorithm. Experimental results demonstrate the superior quality of the model-based clustered-dot screen compared to a conventional clustered-dot screen.
Pavement Crack Detection Using Spectral Clustering Method
Directory of Open Access Journals (Sweden)
Jin Huazhong
2015-01-01
Full Text Available Pavement crack detection plays an important role in pavement maintenance and management, and nowadays it can be performed through remote image analysis. Thus, the edges of pavement cracks must be extracted in advance. In general, traditional edge detection methods do not consider phase information or the spatial relationship between adjacent image areas when extracting edges. To overcome this deficiency, this paper proposes a pavement crack detection algorithm based on the spectral clustering method. First, a measure of similarity between pairs of pixels is computed from orientation energy. Then, spatial relationships are used to find regions where the similarity between pixels within a region is high and the similarity between pixels in different regions is low. After that, crack edge detection is completed with the spectral clustering method. The presented method has been run on real-life images of pavement cracks, and experimental results show that it obtains good detection results.
A method of open cluster membership determination
Javakhishvili, G; Todua, M; Inasaridze, R
2006-01-01
A new method for the determination of open cluster membership based on a cumulative effect is proposed. In the field of a plate the relative x and y coordinate positions of each star with respect to all the other stars are added. The procedure is carried out for two epochs t_1 and t_2 separately, then one sum is subtracted from another. For a field star the differences in its relative coordinate positions of two epochs will be accumulated. For a cluster star, on the contrary, the changes in relative positions of cluster members at t_1 and t_2 will be very small. On the histogram of sums the cluster stars will gather to the left of the diagram, while the field stars will form a tail to the right. The procedure allows us to efficiently discriminate one group from another. The greater the distance between t_1 and t_2 and the more cluster stars present, the greater is the effect. The accumulation method does not require reference stars, determination of centroids and modelling the distribution of field stars, nec...
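The cumulative effect described above can be sketched directly in NumPy: for each star, the sum of its coordinate positions relative to all other stars reduces to n·x_i minus the column sums, so no explicit pairwise loop is needed. The star counts, the shared proper motion, and the motion dispersions below are made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plate: 50 cluster stars share one proper motion between the
# epochs t1 and t2, while 50 field stars move independently.
n_cl, n_fl = 50, 50
pos1 = rng.uniform(0, 100, size=(n_cl + n_fl, 2))        # positions at t1
motion = np.zeros((n_cl + n_fl, 2))
motion[:n_cl] = [0.3, -0.2]                              # shared cluster motion
motion[n_cl:] = rng.normal(0, 2.0, size=(n_fl, 2))       # individual field motions
pos2 = pos1 + motion                                     # positions at t2

def relative_sums(pos):
    # Sum of each star's coordinates relative to all the others:
    # sum_j (x_i - x_j) = n * x_i - sum_j x_j, computed per coordinate.
    return len(pos) * pos - pos.sum(axis=0)

# Subtract the epoch-t1 sums from the epoch-t2 sums. Cluster members pile up
# near zero on the histogram of |differences|; field stars form the tail.
diff = np.abs(relative_sums(pos2) - relative_sums(pos1)).sum(axis=1)
print(diff[:n_cl].mean() < diff[n_cl:].mean())           # → True
```

As the abstract notes, no reference stars or field-star models are needed; a simple histogram cut on `diff` separates the two groups.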
Abusamra, Heba
2016-07-20
The high-dimension, low-sample-size nature of gene expression data makes the classification task challenging, so feature (gene) selection becomes an apparent need. Selecting meaningful and relevant genes for a classifier not only decreases the computational time and cost but also improves classification performance. However, most existing feature selection approaches suffer from several problems, such as lack of robustness and validation issues. Here, we present a new feature selection technique that takes advantage of clustering both samples and genes. Materials and methods We used a leukemia gene expression dataset [1]. The effectiveness of the selected features was evaluated by four different classification methods: support vector machines, k-nearest neighbor, random forest, and linear discriminant analysis. The method evaluates the importance and relevance of each gene cluster by summing the expression levels of the genes belonging to that cluster. A gene cluster is considered important if it satisfies conditions depending on thresholds and percentages; otherwise it is eliminated. Results Initial analysis identified 7120 differentially expressed genes of leukemia (Fig. 15a); after applying our feature selection methodology we ended up with 1117 specific genes discriminating the two classes of leukemia (Fig. 15b). Applying the same method with a more stringent higher positive and lower negative threshold condition, the number was reduced to 58 genes, which were tested to evaluate the effectiveness of the method (Fig. 15c). The results of the four classification methods are summarized in Table 11. Conclusions The feature selection method gave good results with minimum classification error. Our heat-map result shows a distinct pattern of the refined genes discriminating between the two classes of leukemia.
Model-based clustering using copulas with applications
Kosmidis, Ioannis; Karlis, Dimitris
2014-01-01
The majority of model-based clustering techniques is based on multivariate Normal models and their variants. In this paper copulas are used for the construction of flexible families of models for clustering applications. The use of copulas in model-based clustering offers two direct advantages over current methods: i) the appropriate choice of copulas provides the ability to obtain a range of exotic shapes for the clusters, and ii) the explicit choice of marginal distributions for the cluster...
Resampling methods for document clustering
Volk, D.; Stepanov, M. G.
2001-01-01
We compare the performance of different clustering algorithms applied to the task of unsupervised text categorization. We consider agglomerative clustering algorithms, principal direction divisive partitioning and (for the first time) superparamagnetic clustering with several distance measures. The algorithms have been applied to test databases extracted from the Reuters-21578 text categorization test database. We find that simple application of the different clustering algorithms yields clus...
Voting-based consensus clustering for combining multiple clusterings of chemical structures
Directory of Open Access Journals (Sweden)
Saeed Faisal
2012-12-01
Full Text Available Abstract Background Although many consensus clustering methods have been successfully used for combining multiple classifiers in areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few consensus clustering methods have been applied for combining multiple clusterings of chemical structures. It is known that no individual clustering method will always give the best results for all types of applications. So, in this paper, three voting and graph-based consensus clusterings were used for combining multiple clusterings of chemical structures to enhance the ability of separating biologically active molecules from inactive ones in each cluster. Results The cumulative voting-based aggregation algorithm (CVAA), the cluster-based similarity partitioning algorithm (CSPA) and the hyper-graph partitioning algorithm (HGPA) were examined. The F-measure and the Quality Partition Index (QPI) were used to evaluate the clusterings, and the results were compared to the Ward's clustering method. The MDL Drug Data Report (MDDR) dataset was used for the experiments and was represented by two 2D fingerprints, ALOGP and ECFP_4. The voting-based consensus clustering method outperformed the Ward's method on both the F-measure and QPI for both ALOGP and ECFP_4 fingerprints, while the graph-based consensus clustering methods outperformed the Ward's method only for ALOGP using QPI. The Jaccard and Euclidean distance measures were the methods of choice to generate the ensembles, giving the highest values for both criteria. Conclusions The results of the experiments show that consensus clustering methods can improve the effectiveness of chemical structure clusterings. The cumulative voting-based aggregation algorithm (CVAA) was the method of choice among the consensus clustering methods.
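The consensus idea can be illustrated with a simpler evidence-accumulation scheme than CVAA: a co-association matrix counts how often each pair of items lands in the same cluster across the input clusterings, and pairs that co-occur in at least half of them are merged via connected components. This is a generic sketch, not the CVAA/CSPA/HGPA algorithms evaluated in the paper, and the three toy clusterings are invented.

```python
import numpy as np

def coassociation_consensus(labelings, threshold=0.5):
    """Combine clusterings via a co-association matrix: items end up together
    if they share a cluster in at least `threshold` of the input clusterings
    (consensus clusters = connected components of the thresholded matrix)."""
    labelings = np.asarray(labelings)            # shape: (n_clusterings, n_items)
    n = labelings.shape[1]
    co = np.zeros((n, n))
    for lab in labelings:
        co += lab[:, None] == lab[None, :]       # 1 where the pair co-clusters
    adj = co / len(labelings) >= threshold
    consensus = -np.ones(n, dtype=int)           # -1 = not yet assigned
    cid = 0
    for i in range(n):
        if consensus[i] >= 0:
            continue
        stack, consensus[i] = [i], cid           # flood-fill one component
        while stack:
            j = stack.pop()
            for k in np.nonzero(adj[j])[0]:
                if consensus[k] < 0:
                    consensus[k] = cid
                    stack.append(k)
        cid += 1
    return consensus

runs = [[0, 0, 0, 1, 1, 1],    # three toy clusterings of six molecules
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1]]
print(coassociation_consensus(runs))             # → [0 0 0 1 1 1]
```

The majority vote overrides the one dissenting clustering, which is exactly the robustness the ensemble is meant to provide.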
Chen Bernard; Mete Mutlu; Kockara Sinan; Aydin Kemal
2010-01-01
Abstract Background Computer-aided segmentation and border detection in dermoscopic images is one of the core components of diagnostic procedures and therapeutic interventions for skin cancer. Automated assessment tools for dermoscopy images have become an important research field mainly because of inter- and intra-observer variations in human interpretation. In this study, we compare two approaches for automatic border detection in dermoscopy images: density based clustering (DBSCAN) and Fuz...
A local distribution based spatial clustering algorithm
Deng, Min; Liu, Qiliang; Li, Guangqiang; Cheng, Tao
2009-10-01
Spatial clustering is an important means for spatial data mining and spatial analysis, and it can be used to discover potential spatial association rules and outliers among spatial data. Most existing spatial clustering algorithms only utilize spatial distance or local density to find the spatial clusters in a spatial database, without taking the local spatial distribution characteristics into account, so that the clustering results are unreasonable in many cases. To overcome such limitations, this paper first develops a new indicator (i.e. the local median angle) to measure the local distribution, and further proposes a new algorithm, called the local distribution based spatial clustering algorithm (LDBSC for short). In the process of spatial clustering, a series of recursive searches is implemented over all the entities so that entities whose local median angles are very close or equal are clustered together. In this way, all the spatial entities in the spatial database can be automatically divided into clusters. Finally, two tests are conducted to demonstrate that the proposed method is superior to DBSCAN, and that it is very robust and feasible and can be used to find clusters with different shapes.
Single pass kernel -means clustering method
Indian Academy of Sciences (India)
T Hitendra Sarma; P Viswanath; B Eswara Reddy
2013-06-01
In unsupervised classification, the kernel k-means clustering method has been shown to perform better than the conventional k-means clustering method in identifying non-isotropic clusters in a data set. The space and time requirements of this method are $O(n^2)$, where $n$ is the data set size. Because of this quadratic time complexity, the kernel k-means method is not applicable to large data sets. The paper proposes a simple and faster version of the kernel k-means clustering method, called the single pass kernel k-means clustering method. The proposed method works as follows. First, a random sample $\mathcal{S}$ is selected from the data set $\mathcal{D}$. A partition $\Pi_{\mathcal{S}}$ is obtained by applying the conventional kernel k-means method on the random sample $\mathcal{S}$. The novelty of the paper is that, for each cluster in $\Pi_{\mathcal{S}}$, the exact cluster center in the input space is obtained using the gradient descent approach. Finally, each unsampled pattern is assigned to its closest exact cluster center to get a partition of the entire data set. The proposed method needs to scan the data set only once and is much faster than the conventional kernel k-means method. The time complexity of this method is $O(s^2 + t + nk)$, where $s$ is the size of the random sample $\mathcal{S}$, $k$ is the number of clusters required, and $t$ is the time taken by the gradient descent method (to find the exact cluster centers). The space complexity of the method is $O(s^2)$. The proposed method can be easily implemented and is suitable for large data sets, like those in data mining applications. Experimental results show that, with a small loss of quality, the proposed method can significantly reduce the time taken compared to the conventional kernel k-means clustering method. The proposed method is also compared with other recent similar methods.
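The sample-then-assign scheme can be sketched as follows: run kernel k-means (RBF kernel) on a small random sample, recover approximate input-space centers, and assign the remaining points in one pass. Two simplifications are assumptions of this sketch, not the paper's method: centers are approximated by sample cluster means rather than the paper's gradient descent, and seeding uses farthest-point initialization. The two-blob data set is synthetic.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_kmeans(K, k, n_iter=50):
    """Lloyd-style kernel k-means on a precomputed kernel matrix,
    seeded by farthest-point initialization in feature space."""
    n, diag = len(K), np.diag(K)
    seeds, d2 = [0], diag + K[0, 0] - 2 * K[:, 0]
    for _ in range(k - 1):                     # farthest-point seeding
        seeds.append(int(d2.argmax()))
        d2 = np.minimum(d2, diag + K[seeds[-1], seeds[-1]] - 2 * K[:, seeds[-1]])
    labels = np.argmin([diag + K[s, s] - 2 * K[:, s] for s in seeds], axis=0)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):                     # feature-space distance to centroid c
            idx = labels == c
            if idx.any():
                dist[:, c] = diag - 2 * K[:, idx].mean(1) + K[np.ix_(idx, idx)].mean()
        new = dist.argmin(1)
        if (new == labels).all():
            break
        labels = new
    return labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (100, 2)),   # blob A
               rng.normal(3, 0.3, (100, 2))])  # blob B
S = X[rng.choice(len(X), 40, replace=False)]   # random sample, s << n
lab_S = kernel_kmeans(rbf_kernel(S, S), k=2)
# Approximate input-space centers by sample cluster means (the paper refines
# these by gradient descent instead).
centers = np.array([S[lab_S == c].mean(0) for c in range(2)])
# Single pass over the full data set: assign to the nearest approximate center.
labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
```

Only the sample pays the quadratic kernel cost; the final pass is linear in n, mirroring the paper's O(s² + t + nk) analysis.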
Data Clustering Analysis Based on Wavelet Feature Extraction
Institute of Scientific and Technical Information of China (English)
QIAN Yuntao; TANG Yuanyan
2003-01-01
A novel wavelet-based data clustering method is presented in this paper, which includes wavelet feature extraction and a cluster growing algorithm. The wavelet transform can provide rich and diversified information for representing the global and local inherent structures of a dataset; therefore, it is a very powerful tool for clustering feature extraction. As an unsupervised classification, the target of clustering analysis depends on the specific clustering criteria. Several criteria that should be considered for a general-purpose clustering algorithm are proposed, and the cluster growing algorithm is constructed to connect the clustering criteria with the wavelet features. Compared with other popular clustering methods, our clustering approach provides multi-resolution clustering results, needs few prior parameters, correctly deals with irregularly shaped clusters, and is insensitive to noise and outliers. As this wavelet-based clustering method is aimed at solving the two-dimensional data clustering problem, for high-dimensional datasets the self-organizing map and U-matrix method are applied to transform them into a two-dimensional Euclidean space, so that high-dimensional data clustering analysis can also be performed. Results on some simulated data and standard test data are reported to illustrate the power of our method.
Document Clustering using Sequential Information Bottleneck Method
MS. P.J.Gayathri; S.C. Punitha; Dr.M.Punithavalli
2010-01-01
Document clustering is a subset of the larger field of data clustering, which borrows concepts from the fields of information retrieval (IR), natural language processing (NLP), and machine learning (ML). It is a more specific technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. A wide variety of unsupervised clustering algorithms exist. This paper presents a sequential algorithm for document clustering based on an...
Clustering based segmentation of text in complex color images
Institute of Scientific and Technical Information of China (English)
毛文革; 王洪滨; 张田文
2004-01-01
We propose a novel scheme based on clustering analysis in color space to solve text segmentation in complex color images. Text segmentation includes automatic clustering of the color space and foreground image generation. Two methods are also proposed for automatic clustering: the first determines the optimal number of clusters, and the second is a fuzzy competitive clustering method based on competitive learning techniques. Essential foreground images obtained from the color clusters are combined into the final foreground image. Further performance analysis reveals the advantages of the proposed methods.
Normalization based K means Clustering Algorithm
Virmani, Deepali; Taneja, Shweta; Malhotra, Geetika
2015-01-01
K-means is an effective clustering technique used to separate similar data into groups based on initial centroids of clusters. In this paper, a Normalization-based K-means clustering algorithm (N-K means) is proposed. The proposed N-K means clustering algorithm applies normalization to the available data prior to clustering, and calculates the initial centroids based on weights. Experimental results demonstrate the improvement of the proposed N-K means clustering algorithm over existing...
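The benefit of normalizing before k-means can be sketched with plain NumPy: when one feature lives on a much larger scale, raw k-means latches onto it, while min-max normalization puts all features on [0, 1] first. Farthest-point seeding below is a stand-in for the paper's weight-based initial centroids, and the two-group data set is synthetic.

```python
import numpy as np

def minmax_normalize(X):
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)   # guard constant columns

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with farthest-point seeding (an illustrative
    stand-in for the paper's weight-based initial centroids)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):                              # spread the seeds out
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
        new = np.array([X[labels == c].mean(0) if (labels == c).any() else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

rng = np.random.default_rng(2)
# Two groups differ only in the first feature; the second feature is noise on
# a ~400x larger scale, so un-normalized k-means would split on the noise.
X = np.vstack([np.column_stack([rng.normal(0, 0.5, 50), rng.normal(0, 200, 50)]),
               np.column_stack([rng.normal(50, 0.5, 50), rng.normal(0, 200, 50)])])
labels, _ = kmeans(minmax_normalize(X), k=2)
```

After normalization the informative feature dominates the distance again and the two true groups are recovered.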
The polarizable embedding coupled cluster method
DEFF Research Database (Denmark)
Sneskov, Kristian; Schwabe, Tobias; Kongsted, Jacob; Christiansen, Ove
2011-01-01
We formulate a new combined quantum mechanics/molecular mechanics (QM/MM) method based on a self-consistent polarizable embedding (PE) scheme. For the description of the QM region, we apply the popular coupled cluster (CC) method, detailing the inclusion of electrostatic and polarization effects...... hyperpolarizabilities, all coupled to a polarizable MM environment. In the process, we identify CC density-like intermediates that allow for a very efficient implementation, retaining a low computational cost of the QM/MM terms even when the number of MM sites increases. The strengths of the new implementation are...
Fuzzy Clustering Methods and their Application to Fuzzy Modeling
DEFF Research Database (Denmark)
Kroszynski, Uri; Zhou, Jianjun
1999-01-01
Fuzzy modeling techniques based upon the analysis of measured input/output data sets result in a set of rules that allow to predict system outputs from given inputs. Fuzzy clustering methods for system modeling and identification result in relatively small rule-bases, allowing fast, yet accurate...... prediction of outputs. This article presents an overview of some of the most popular clustering methods, namely Fuzzy Cluster-Means (FCM) and its generalizations to Fuzzy C-Lines and Elliptotypes. The algorithms for computing cluster centers and principal directions from a training data-set are described. A...
Quartile Clustering: A quartile based technique for Generating Meaningful Clusters
Goswami, Saptarsi; Chakrabarti, Amlan
2012-01-01
Clustering is one of the main tasks in exploratory data analysis and descriptive statistics where the main objective is partitioning observations in groups. Clustering has a broad range of application in varied domains like climate, business, information retrieval, biology, psychology, to name a few. A variety of methods and algorithms have been developed for clustering tasks in the last few decades. We observe that most of these algorithms define a cluster in terms of value of the attributes...
Robust Clustering Method in the Presence of Scattered Observations.
Notsu, Akifumi; Eguchi, Shinto
2016-06-01
Contamination of scattered observations, which are either featureless or unlike the other observations, frequently degrades the performance of standard methods such as K-means and model-based clustering. In this letter, we propose a robust clustering method in the presence of scattered observations called Gamma-clust. Gamma-clust is based on a robust estimation for cluster centers using gamma-divergence. It provides a proper solution for clustering in which the distributions for clustered data are nonnormal, such as t-distributions with different variance-covariance matrices and degrees of freedom. As demonstrated in a simulation study and data analysis, Gamma-clust is more flexible and provides superior results compared to the robustified K-means and model-based clustering. PMID:26942745
Quartile Clustering: A quartile based technique for Generating Meaningful Clusters
Goswami, Saptarsi
2012-01-01
Clustering is one of the main tasks in exploratory data analysis and descriptive statistics where the main objective is partitioning observations in groups. Clustering has a broad range of application in varied domains like climate, business, information retrieval, biology, psychology, to name a few. A variety of methods and algorithms have been developed for clustering tasks in the last few decades. We observe that most of these algorithms define a cluster in terms of value of the attributes, density, distance etc. However these definitions fail to attach a clear meaning/semantics to the generated clusters. We argue that clusters having understandable and distinct semantics defined in terms of quartiles/halves are more appealing to business analysts than the clusters defined by data boundaries or prototypes. On the samepremise, we propose our new algorithm named as quartile clustering technique. Through a series of experiments we establish efficacy of this algorithm. We demonstrate that the quartile clusteri...
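The quartile idea above can be sketched in a few lines: each observation is assigned to one of four clusters by which quartile of an attribute it falls into, so every cluster carries a plain-language meaning ("bottom 25%", "second quarter", and so on). The attribute values below are made-up revenue figures for illustration.

```python
import numpy as np

def quartile_clusters(x):
    """Assign each observation to one of four clusters by quartile membership,
    giving clusters a directly interpretable semantics."""
    q1, q2, q3 = np.percentile(x, [25, 50, 75])          # quartile boundaries
    return np.searchsorted([q1, q2, q3], x, side="right")

revenue = np.array([12, 95, 40, 7, 60, 88, 33, 21])      # hypothetical attribute
print(quartile_clusters(revenue))                        # → [0 3 2 0 2 3 1 1]
```

Unlike prototype-based clusters, the resulting groups can be described to an analyst without reference to the data's internal geometry.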
ATAT@WIEN2k: An interface for cluster expansion based on the linearized augmented planewave method
Chakraborty, Monodeep; Spitaler, Jürgen; Puschnig, Peter; Ambrosch-Draxl, Claudia
2010-05-01
We have developed an interface between the all-electron density functional theory code WIEN2k, and the MIT Ab-initio Phase Stability (MAPS) code of the Alloy-Theoretic Automated Toolkit (ATAT). WIEN2k is an implementation of the full-potential linearized augmented planewave method which yields highly accurate total energies and optimized geometries for any given structure. The ATAT package consists of two parts. The first one is the MAPS code, which constructs a cluster expansion (CE) in conjunction with a first-principles code. These results form the basis for the second part, which computes the thermodynamic properties of the alloy. The main task of the CE is to calculate the many-body potentials or effective cluster interactions (ECIs) from the first-principles total energies of different structures or supercells using the structure-inversion technique. By linking MAPS seamlessly with WIEN2k we have created a tool to obtain the ECIs for any lattice type of an alloy. We have chosen fcc Al-Ti and bcc W-Re to evaluate our implementation. Our calculated ECIs exhibit all features of a converged CE and compare well with literature results.
A simulation study of three methods for detecting disease clusters
Directory of Open Access Journals (Sweden)
Samuelsen Sven O
2006-04-01
Full Text Available Abstract Background Cluster detection is an important part of spatial epidemiology because it can help identify environmental factors associated with disease and thus guide investigation of the aetiology of diseases. In this article we study three methods suitable for detecting local spatial clusters: (1) a spatial scan statistic (SaTScan), (2) generalized additive models (GAM), and (3) Bayesian disease mapping (BYM). We conducted a simulation study to compare the methods. Seven geographic clusters with different shapes were initially chosen as high-risk areas. Different scenarios for the magnitude of the relative risk of these areas as compared to the normal-risk areas were considered. For each scenario the performance of the methods was assessed in terms of the sensitivity, specificity, and percentage correctly classified for each cluster. Results The performance depends on the relative risk, but all methods are in general suitable for identifying clusters with a relative risk larger than 1.5. However, it is difficult to detect clusters with lower relative risks. The GAM approach had the highest sensitivity but relatively low specificity, leading to an overestimation of the cluster area. Both the BYM and the SaTScan methods work well. Clusters with irregular shapes are more difficult to detect than more circular clusters. Conclusion Based on our simulations we conclude that the methods differ in their ability to detect spatial clusters. Different aspects should be considered for an appropriate choice of method, such as the size and shape of the assumed spatial clusters and the relative importance of sensitivity and specificity. In general, the BYM method seems preferable for local cluster detection with relatively high relative risks, whereas the SaTScan method appears preferable for lower relative risks. The GAM method needs to be tuned (using cross-validation) to get satisfactory results.
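The three per-cluster performance measures named above can be computed from the true high-risk area and the area a method flags. A minimal sketch, assuming map units are represented as sets; the toy numbers are illustrative, not from the study.

```python
# Sensitivity, specificity, and percentage correctly classified for a
# detected cluster, from set overlap between truth and flagged areas.
def cluster_detection_scores(true_cluster, flagged, all_areas):
    tp = len(true_cluster & flagged)            # correctly flagged units
    fn = len(true_cluster - flagged)            # missed high-risk units
    fp = len(flagged - true_cluster)            # falsely flagged units
    tn = len(all_areas - true_cluster - flagged)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    correct = (tp + tn) / len(all_areas)
    return sensitivity, specificity, correct

areas = set(range(100))            # 100 map units
truth = set(range(10))             # the simulated high-risk cluster
flag = set(range(5, 20))           # units flagged by some method
sens, spec, acc = cluster_detection_scores(truth, flag, areas)
```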
Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects
Zhang, Y.; Horvath, S.; Ophoff, R; Telesca, D
2014-01-01
Time course microarray data provide insight about dynamic biological processes. While several clustering methods have been proposed for the analysis of these data structures, comparison and selection of appropriate clustering methods are seldom discussed. We compared three probability-based clustering methods and three distance-based clustering methods for time course microarray data. Among probabilistic methods, we considered: smoothing spline clustering also known as model b...
Quantum Monte Carlo methods and lithium cluster properties. [Atomic clusters]
Energy Technology Data Exchange (ETDEWEB)
Owen, R.K.
1990-12-01
Properties of small lithium clusters with sizes ranging from n = 1 to 5 atoms were investigated using quantum Monte Carlo (QMC) methods. Cluster geometries were found from complete active space self-consistent field (CASSCF) calculations. A detailed development of the QMC method leading to the variational QMC (V-QMC) and diffusion QMC (D-QMC) methods is shown. The many-body aspect of electron correlation is introduced into the QMC importance sampling electron-electron correlation functions by using density-dependent parameters, which are shown to increase the amount of correlation energy obtained in V-QMC calculations. A detailed analysis of the D-QMC time-step bias is made, and the bias is found to be at least linear with respect to the time-step. The D-QMC calculations determined the lithium cluster ionization potentials to be 0.1982(14) (0.1981), 0.1895(9) (0.1874(4)), 0.1530(34) (0.1599(73)), 0.1664(37) (0.1724(110)), 0.1613(43) (0.1675(110)) Hartrees for lithium clusters n = 1 through 5, respectively, in good agreement with the experimental results shown in parentheses. Also, the binding energies per atom were computed to be 0.0177(8) (0.0203(12)), 0.0188(10) (0.0220(21)), 0.0247(8) (0.0310(12)), 0.0253(8) (0.0351(8)) Hartrees for lithium clusters n = 2 through 5, respectively. The lithium cluster one-electron density is shown to have charge concentrations corresponding to nonnuclear attractors. The overall shape of the electronic charge density also bears a remarkable similarity to the anisotropic harmonic oscillator model shape for the given number of valence electrons.
Scalable Density-Based Subspace Clustering
DEFF Research Database (Denmark)
Müller, Emmanuel; Assent, Ira; Günnemann, Stephan;
2011-01-01
For knowledge discovery in high dimensional databases, subspace clustering detects clusters in arbitrary subspace projections. Scalability is a crucial issue, as the number of possible projections is exponential in the number of dimensions. We propose a scalable density-based subspace clustering...... synthetic databases show that steering is efficient and scalable, with high quality results. For future work, our steering paradigm for density-based subspace clustering opens research potential for speeding up other subspace clustering approaches as well....
Comparison between optical and X-ray cluster detection methods
Basilakos, S; Georgakakis, A; Georgantopoulos, I; Gaga, T; Kolokotronis, V G; Stewart, G C
2003-01-01
In this work we present combined optical and X-ray cluster detection methods in an area near the North Galactic Pole, previously covered by the SDSS and 2dF optical surveys. The same area has been covered by shallow ($\sim 1.8$ deg$^{2}$) XMM-{\em Newton} observations. The optical cluster detection procedure is based on merging two independent selection methods - a smoothing+percolation technique and a Matched Filter Algorithm. The X-ray cluster detection is based on a wavelet-based algorithm, incorporated in the SAS v.5.2 package. The final optical sample contains 9 candidate clusters with a richness of more than 20 galaxies, corresponding roughly to APM richness class. Three of our optically detected clusters are also detected in our X-ray survey.
Directory of Open Access Journals (Sweden)
Jinfei Liu
2013-04-01
Full Text Available DBSCAN is a well-known density-based clustering algorithm which offers advantages for finding clusters of arbitrary shapes compared to partitioning and hierarchical clustering methods. However, there are few papers studying the DBSCAN algorithm under the privacy-preserving distributed data mining model, in which the data is distributed between two or more parties, and the parties cooperate to obtain the clustering results without revealing the data held by the individual parties. In this paper, we address the problem of two-party privacy-preserving DBSCAN clustering. We first propose two protocols for privacy-preserving DBSCAN clustering over horizontally and vertically partitioned data, respectively, and then extend them to arbitrarily partitioned data. We also provide a performance analysis and privacy proof of our solution.
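For readers unfamiliar with the clustering step being protected, here is a minimal single-party DBSCAN sketch. The privacy-preserving protocols in the paper compute the same eps-neighbourhoods without either party revealing its raw points; this toy version (our own code and data) shows only the plain algorithm.

```python
# Minimal DBSCAN: core points (>= min_pts neighbours within eps) seed
# clusters that expand through density-reachable points; the rest is noise.
import math

def dbscan(points, eps, min_pts):
    UNSEEN, NOISE = -2, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNSEEN:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster          # border point joins the cluster
            if labels[j] != UNSEEN:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:       # core point: keep expanding
                seeds.extend(j_nbrs)
        cluster += 1
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
labels = dbscan(pts, eps=2.0, min_pts=2)
```

Here the two dense groups get labels 0 and 1 and the isolated point is marked noise (-1).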
Sequential Combination Methods forData Clustering Analysis
Institute of Scientific and Technical Information of China (English)
钱 涛; Ching Y.Suen; 唐远炎
2002-01-01
This paper proposes the use of more than one clustering method to improve clustering performance. Clustering is an optimization procedure based on a specific clustering criterion. Clustering combination can be regarded as a technique that constructs and processes multiple clustering criteria. Since the global and local clustering criteria are complementary rather than competitive, combining these two types of clustering criteria may enhance the clustering performance. In our past work, a multi-objective programming based simultaneous clustering combination algorithm has been proposed, which incorporates multiple criteria into an objective function by a weighting method, and solves this problem with constrained nonlinear optimization programming. But this algorithm has high computational complexity. Here a sequential combination approach is investigated, which first uses the global criterion based clustering to produce an initial result, then uses the local criterion based information to improve the initial result with a probabilistic relaxation algorithm or linear additive model. Compared with the simultaneous combination method, sequential combination has low computational complexity. Results on some simulated data and standard test data are reported. It appears that clustering performance improvement can be achieved at low cost through sequential combination.
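The local refinement step can be illustrated with a simple stand-in: given an initial partition from a global method, each point is relabelled by majority vote among its nearest neighbours. This k-NN vote is our own simplification, not the paper's probabilistic relaxation or linear additive model; points and labels below are toy data.

```python
# Local-criterion refinement sketch: relabel each point by the majority
# label among its k nearest neighbours in Euclidean distance.
import math
from collections import Counter

def refine_labels(points, labels, k=3):
    new = []
    for i, p in enumerate(points):
        nbrs = sorted((j for j in range(len(points)) if j != i),
                      key=lambda j: math.dist(p, points[j]))[:k]
        vote = Counter(labels[j] for j in nbrs).most_common(1)[0][0]
        new.append(vote)
    return new

pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (1, 1)]
init = [0, 0, 0, 1, 1, 1]          # (1, 1) mislabeled by the global pass
refined = refine_labels(pts, init, k=3)
```

The mislabeled point near the first group is corrected by its local neighbourhood, which is the kind of low-cost improvement the sequential scheme aims for.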
Cluster beam sources. Part 1. Methods of cluster beams generation
Directory of Open Access Journals (Sweden)
A.Ju. Karpenko
2012-10-01
Full Text Available A short review of cluster beam generation is presented. The basic types of cluster sources are considered and the processes leading to cluster formation are analyzed. The parameters that affect the operation of cluster sources are presented.
Spanning Tree Based Attribute Clustering
DEFF Research Database (Denmark)
Zeng, Yifeng; Jorge, Cordero Hernandez
2009-01-01
inconsistent edges from a maximum spanning tree by starting appropriate initial modes, therefore generating stable clusters. It discovers sound clusters through simple graph operations and achieves significant computational savings. We compare the Star Discovery algorithm against earlier attribute clustering...
ADVANCED CLUSTER BASED IMAGE SEGMENTATION
Directory of Open Access Journals (Sweden)
D. Kesavaraja
2011-11-01
Full Text Available This paper presents efficient and portable implementations of a useful image segmentation technique which makes use of a faster variant of the conventional connected components algorithm, which we call parallel components. In the modern world, many doctors need image segmentation as a service for various purposes, and they also expect the system to run fast and securely. Image segmentation algorithms are usually not fast, and despite several ongoing research efforts, conventional segmentation algorithms may not run any faster. So we propose a cluster computing environment for parallel image segmentation to provide faster results. This paper is a real-time implementation of distributed image segmentation on a cluster of nodes. We demonstrate the effectiveness and feasibility of our method on a set of medical CT scan images. Our general framework is a single address space, distributed memory programming model. We use efficient techniques for distributing and coalescing data as well as efficient combinations of task and data parallelism. The image segmentation algorithm makes use of an efficient cluster process which uses a novel approach for parallel merging. Our experimental results are consistent with the theoretical analysis, and provide faster execution times for segmentation when compared with the conventional method. Our test data are different CT scan images from a medical database. More efficient implementations of image segmentation will likely result in even faster execution times.
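The single-node core of the technique is ordinary connected-components labelling; the distributed version described above splits the image across nodes and merges labels at the seams. A minimal sketch of the single-node step (our own code; the 4-connectivity choice and the toy image are illustrative):

```python
# 4-connected components labelling on a binary image by breadth-first
# search: each foreground region receives a distinct positive label.
from collections import deque

def label_components(grid):
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for sy in range(h):
        for sx in range(w):
            if grid[sy][sx] and not labels[sy][sx]:
                current += 1                      # start a new component
                queue = deque([(sy, sx)])
                labels[sy][sx] = current
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and grid[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = current
                            queue.append((ny, nx))
    return current, labels

image = [[1, 1, 0, 0],
         [0, 1, 0, 1],
         [0, 0, 0, 1]]
n, lab = label_components(image)
```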
An alternative method to study star cluster disruption
Gieles, Mark
2008-01-01
Many embedded star clusters do not evolve into long-lived bound clusters. The most popular explanation for this "infant mortality" of young clusters is the expulsion of natal gas by stellar winds and supernovae, which leaves up to 90% of them unbound. A cluster disruption model has recently been proposed in which this mass-independent disruption of clusters proceeds for another Gyr after gas expulsion. In this scenario, the survival chances of massive clusters are much smaller than in the traditional mass-dependent disruption models. The most common way to study cluster disruption is to use the cluster age distribution, which, however, can be heavily affected by incompleteness. To avoid this, we introduce a new method based on size-of-sample effects, namely the relation between the most massive cluster, M_max, and the age range sampled. Assuming that clusters are sampled from a power-law initial mass function with index -2, and that the cluster formation rate is constant, M_max scales with the age range sam...
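The size-of-sample effect invoked above has a simple closed form: for n clusters drawn from p(M) ∝ M⁻² above M_min, the distribution of the maximum is P(M_max ≤ m) = (1 − M_min/m)ⁿ, so the median maximum grows with n, i.e. with the age range sampled at a constant formation rate. A sketch (the mass scale is an illustrative assumption):

```python
# Median of the most massive cluster among n draws from a power-law
# initial mass function with index -2 (p(M) proportional to M**-2):
# solving (1 - m_min/m)**n = 1/2 gives m = m_min / (1 - 2**(-1/n)).
def median_max_mass(n, m_min=100.0):
    return m_min / (1.0 - 0.5 ** (1.0 / n))

one = median_max_mass(1)        # 2 * m_min for a single draw
ten = median_max_mass(10)
hundred = median_max_mass(100)  # grows roughly linearly with n
```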
A PSO-Based Subtractive Data Clustering Algorithm
Directory of Open Access Journals (Sweden)
Gamal Abdel-Azeem
2013-03-01
Full Text Available There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for any given set of data. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Subtractive + PSO (Particle Swarm Optimization) clustering algorithm that performs fast clustering. For comparison purposes, we applied the Subtractive + PSO clustering algorithm, PSO, and the Subtractive clustering algorithm to three different datasets. The results illustrate that the Subtractive + PSO clustering algorithm can generate the most compact clustering results compared to the other algorithms.
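The subtractive-clustering step used to seed such hybrids can be sketched as follows: every point receives a potential from its neighbour density, the highest-potential point becomes a centre, and potentials near that centre are then suppressed. The radii follow a common convention (suppression radius 1.5× the neighbourhood radius); the code and toy data are our own, not the paper's.

```python
# Subtractive clustering sketch: pick centres by density potential,
# suppressing the potential around each chosen centre before the next pick.
import math

def subtractive_centres(points, ra=2.0, n_centres=2):
    alpha = 4.0 / ra ** 2                # neighbourhood radius for potentials
    beta = 4.0 / (1.5 * ra) ** 2         # wider suppression radius (rb = 1.5 ra)
    pot = [sum(math.exp(-alpha * math.dist(p, q) ** 2) for q in points)
           for p in points]
    centres = []
    for _ in range(n_centres):
        best = max(range(len(points)), key=pot.__getitem__)
        centres.append(points[best])
        p_best, c = pot[best], points[best]
        # suppress the potential of points near the newly chosen centre
        pot = [pot[i] - p_best * math.exp(-beta * math.dist(points[i], c) ** 2)
               for i in range(len(points))]
    return centres

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
centres = subtractive_centres(pts)
```

The two chosen centres land one in each dense group, which is exactly the initialization a subsequent PSO or k-means refinement would start from.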
Ontology Partitioning: Clustering Based Approach
Directory of Open Access Journals (Sweden)
Soraya Setti Ahmed
2015-05-01
Full Text Available The goal of the semantic web is to share and integrate data across different domains and organizations. The knowledge representation of semantic data is made possible by ontologies. As the usage of the semantic web increases, the construction of semantic web ontologies also increases. Moreover, due to the monolithic nature of an ontology, various semantic web operations like query answering, data sharing, data matching, data reuse, and data integration become more complicated as the size of the ontology increases. Partitioning the ontology is the key solution to handling this scalability issue. In this work, we propose a revision and enhancement of the K-means clustering algorithm based on a new semantic similarity measure for partitioning a given ontology into high-quality modules. The results show that our approach produces more meaningful clusters than the traditional K-means algorithm.
Semi-supervised clustering method based on active learning strategy
Institute of Scientific and Technical Information of China (English)
芦世丹; 崔荣一
2013-01-01
By employing an active learning strategy to select the most informative data to be labeled, this paper proposes a semi-supervised clustering method based on active learning. First, the traditional K-means algorithm is used to coarsely cluster the unlabeled dataset. Second, based on the result of coarse clustering, the membership degree of each data point in each cluster is calculated, candidate points for which the difference between the maximum and the second-maximum membership degree falls below a threshold are screened out, and those with relatively small differences, i.e., the most informative samples, are labeled. Finally, each remaining unlabeled candidate is grouped into the labeled cluster with the minimum average distance. The experimental results show that the proposed active learning strategy learns the most informative data well, and the semi-supervised clustering method based on it achieves high accuracy on various datasets.
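The selection rule can be made concrete with a small sketch: after a coarse pass produces cluster centres, membership degrees are computed (here from inverse distances, one plausible choice) and points whose top-two memberships nearly tie are the "most informative" candidates. Function names, the membership formula, and the toy data are our own assumptions.

```python
# Active-learning candidate selection: pick points whose top-two cluster
# membership degrees differ by less than a threshold (ambiguous points).
import math

def memberships(point, centres):
    inv = [1.0 / (1e-9 + math.dist(point, c)) for c in centres]
    s = sum(inv)
    return [v / s for v in inv]

def informative(points, centres, threshold=0.2):
    picked = []
    for p in points:
        m = sorted(memberships(p, centres), reverse=True)
        if m[0] - m[1] < threshold:      # ambiguous between two clusters
            picked.append(p)
    return picked

centres = [(0.0, 0.0), (10.0, 0.0)]
pts = [(1.0, 0.0), (5.2, 0.0), (9.0, 0.0)]
candidates = informative(pts, centres, threshold=0.2)
```

Only the point near the midline between the two centres is selected for labelling; points deep inside a cluster are left to the unsupervised pass.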
Coupled Cluster Methods in Lattice Gauge Theory
Watson, Nicholas Jay
Available from UMI in association with The British Library. Requires signed TDF. The many-body coupled cluster method is applied to Hamiltonian pure lattice gauge theories. The vacuum wavefunction is written as the exponential of a single sum over the lattice of clusters of gauge invariant operators at fixed relative orientation and separation, generating excitations of the bare vacuum. The basic approximation scheme involves a truncation according to the geometrical size on the lattice of the clusters in the wavefunction. For a wavefunction including clusters up to a given size, all larger clusters generated in the Schrodinger equation are discarded. The general formalism is first given, including that for excited states. Two possible procedures for discarding clusters are considered. The first involves discarding clusters describing excitations of the bare vacuum which are larger than those in the given wavefunction. The second involves rearranging the clusters so that they describe fluctuations of the gauge invariant excitations about their self-consistently calculated expectation values, and then discarding fluctuations larger than those in the given wavefunction. The coupled cluster method is applied to the Z_2 and SU(2) models in 2+1D. For the Z_2 model, the first procedure gives poor results, while the second gives wavefunctions which explicitly display a phase transition, with critical couplings in good agreement with those obtained by other methods. For the SU(2) model, the first procedure also gives poor results, while the second gives vacuum wavefunctions valid at all couplings. The general properties of the wavefunctions at weak coupling are discussed. Approximations with clusters spanning up to four plaquettes are considered. Excited states are calculated, yielding mass gaps with fair scaling properties. Insight is obtained into the form of the wavefunctions at all couplings.
A Modified Ant-based Clustering for Medical Data
Directory of Open Access Journals (Sweden)
C. Immaculate Mary
2010-10-01
Full Text Available Ant-based techniques in computer science draw biological inspiration from the behavior of social insects. Data-clustering techniques are classification algorithms with a wide range of applications, from biology to image processing and data presentation. The ant-based clustering technique has proven promising for data clustering problems. In this paper a modified ant-based clustering algorithm is proposed for medical data processing. The performance of the proposed method is compared with k-means clustering.
Clustering Methods in Data Mining
Institute of Scientific and Technical Information of China (English)
王实; 高文
2000-01-01
In this paper we introduce clustering methods in Data Mining. Clustering has been studied in great depth, and in the field of Data Mining it faces new situations. We summarize the major clustering methods and introduce four kinds of clustering methods that have been used widely in Data Mining. Finally we draw the conclusion that the distance-based partitional clustering method in data mining is a typical two-phase iteration process: 1) assign each point to a cluster; 2) update the cluster centers.
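The two-phase iteration named in the conclusion is the familiar k-means loop; in its plainest form: 1) assign each point to the nearest centre, 2) recompute each centre as the mean of its points. A minimal sketch with toy data and initial centres of our own choosing:

```python
# Two-phase iteration of distance-based partitional clustering (k-means).
import math

def kmeans(points, centres, iters=10):
    groups = [[] for _ in centres]
    for _ in range(iters):
        # phase 1: assign each point to its closest cluster centre
        groups = [[] for _ in centres]
        for p in points:
            k = min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
            groups[k].append(p)
        # phase 2: update each centre to the mean of its members
        centres = [tuple(sum(x) / len(g) for x in zip(*g)) if g else c
                   for g, c in zip(groups, centres)]
    return centres, groups

pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centres, groups = kmeans(pts, centres=[(0, 0), (9, 9)])
```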
PERFORMANCE OF SELECTED AGGLOMERATIVE HIERARCHICAL CLUSTERING METHODS
Directory of Open Access Journals (Sweden)
Nusa Erman
2015-01-01
Full Text Available A broad variety of different methods of agglomerative hierarchical clustering brings along the problem of how to choose the most appropriate method for the given data. It is well known that some methods outperform others if the analysed data have a specific structure. In the presented study we have observed the behaviour of the centroid method, the median (Gower median) method, and the average method (unweighted pair-group method with arithmetic mean, UPGMA; average linkage between groups). We have compared them with the most commonly used methods of hierarchical clustering: the minimum (single linkage) clustering, the maximum (complete linkage) clustering, the Ward method, and the McQuitty method (group method average; weighted pair-group method using arithmetic averages, WPGMA). We have applied the comparison of these methods to spherical, ellipsoid, umbrella-like, "core-and-sphere", ring-like, and intertwined three-dimensional data structures. To generate the data and execute the analysis, we have used the R statistical software. Results show that all seven methods are successful in finding compact, ball-shaped or ellipsoid structures when they are sufficiently separated. Conversely, all methods except the minimum perform poorly on non-homogeneous, irregular, and elongated ones. Especially challenging is a circular double-helix structure; it is correctly revealed only by the minimum method. We can also confirm formerly published results of other simulation studies, which usually favour the average method (besides the Ward method) in cases when the data are assumed to be fairly compact and well separated.
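The only moving part distinguishing these linkage methods is the between-cluster distance. A minimal agglomerative sketch (our own toy code and data): merges proceed on the closest pair of clusters until the requested number remain; swapping `min` for `max` in the linkage turns single linkage into complete linkage.

```python
# Minimal agglomerative clustering: repeatedly merge the two clusters
# with the smallest linkage distance (single = min pairwise distance,
# complete = max pairwise distance) until n_clusters remain.
import math

def agglomerate(points, n_clusters, linkage):
    clusters = [[p] for p in points]
    link = min if linkage == "single" else max
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link(math.dist(a, b)
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# an elongated chain of closely spaced points plus one distant compact pair
chain = [(float(x), 0.0) for x in range(6)]
pair = [(20.0, 0.0), (21.0, 0.0)]
single = agglomerate(chain + pair, 2, "single")
```

Single linkage follows the chain link by link and recovers the elongated structure intact, which is the behaviour the study reports for irregular shapes.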
A clustering method based on the Dirichlet process mixture model
Institute of Scientific and Technical Information of China (English)
张林; 刘辉
2012-01-01
The number of clusters must be determined in advance when a finite mixture model is built to cluster high-dimensional data, which deteriorates the precision and generalization of the clustering. In this paper, a Dirichlet process infinite mixture model was built to cluster high-dimensional data. Based on the Urn model, the posterior distributions of each parameter were derived. All parameters, including the number of potential clusters, were estimated through Gibbs sampling MCMC. The clustering results on both a simulated dataset and the IRIS dataset show that this method can correctly estimate the number of potential clusters after 200 Gibbs sampling MCMC iterations. The average time per iteration for the simulated and IRIS datasets was 0.1850 s and 0.1455 s, respectively, and the time complexity of each iteration is O(N), where N is the number of samples.
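The key property the Dirichlet process buys — no fixed cluster count — is easiest to see in its Urn (Chinese restaurant process) view: each new point joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration alpha. A toy sketch of that prior draw (our own code, not the paper's Gibbs sampler):

```python
# Chinese restaurant process draw: cluster assignments for n points,
# with the number of clusters emerging from the data rather than fixed.
import random

def crp_assignments(n, alpha, seed=1):
    rng = random.Random(seed)
    counts = []                       # points per cluster so far
    labels = []
    for i in range(n):
        weights = counts + [alpha]    # existing clusters, then a new one
        r = rng.uniform(0, i + alpha) # total weight is i + alpha
        acc, choice = 0.0, len(counts)
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                choice = k
                break
        if choice == len(counts):
            counts.append(1)          # open a new cluster
        else:
            counts[choice] += 1
        labels.append(choice)
    return labels

labels = crp_assignments(100, alpha=1.0)
```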
Cosmological Constraints with Clustering-Based Redshifts
Kovetz, Ely D; Rahman, Mubdi
2016-01-01
We demonstrate that observations lacking reliable redshift information, such as photometric and radio continuum surveys, can produce robust measurements of cosmological parameters when empowered by clustering-based redshift estimation. This method infers the redshift distribution based on the spatial clustering of sources, using cross-correlation with a reference dataset with known redshifts. Applying this method to the existing SDSS photometric galaxies, and projecting to future radio continuum surveys, we show that sources can be efficiently divided into several redshift bins, increasing their ability to constrain cosmological parameters. We forecast constraints on the dark-energy equation of state and on local non-Gaussianity parameters. We explore several pertinent issues, including the tradeoff between including more sources and minimizing the overlap between bins, the shot-noise limitations on binning, and the predicted performance of the method at high redshifts. Remarkably, we find that, once this ...
User filtering based campus WLAN user clustering method
Institute of Scientific and Technical Information of China (English)
仇一泓; 尧婷娟; 秦丰林; 葛连升
2014-01-01
With the widespread adoption of smart terminals such as smart phones and smart pads, using the MAC address as user identification in campus wireless local area network (WLAN) user clustering research can no longer accurately represent user behavior. A user filtering based user clustering method is proposed. This method filters users' behavior data by their degree of activeness, and then further conducts clustering analysis of campus WLAN user behavior. The experimental results verify the effectiveness of the proposed method.
Directory of Open Access Journals (Sweden)
Galway LP
2012-04-01
Full Text Available Abstract Background Mortality estimates can measure and monitor the impacts of conflict on a population, guide humanitarian efforts, and help to better understand the public health impacts of conflict. Vital statistics registration and surveillance systems are rarely functional in conflict settings, posing the challenge of estimating mortality using retrospective population-based surveys. Results We present a two-stage cluster sampling method for application in population-based mortality surveys. The sampling method utilizes gridded population data and a geographic information system (GIS) to select clusters in the first sampling stage, and Google Earth imagery and sampling grids to select households in the second sampling stage. The sampling method is implemented in a household mortality study in Iraq in 2011. Factors affecting feasibility and methodological quality are described. Conclusion Sampling is a challenge in retrospective population-based mortality studies, and alternatives that improve on the conventional approaches are needed. The sampling strategy presented here was designed to generate a representative sample of the Iraqi population while reducing the potential for bias and considering the context-specific challenges of the study setting. This sampling strategy, or variations on it, is adaptable and should be considered and tested in other conflict settings.
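The first sampling stage described above amounts to probability-proportional-to-size selection over gridded population counts: cells with more people are proportionally more likely to be drawn. A minimal sketch (the cell counts and function names are illustrative assumptions, not the study's data):

```python
# Probability-proportional-to-size draw of grid cells (first-stage
# clusters) from gridded population counts, via the cumulative sum.
import random
from itertools import accumulate
from bisect import bisect_left

def pps_sample(pop_counts, n, seed=0):
    rng = random.Random(seed)
    cum = list(accumulate(pop_counts))
    total = cum[-1]
    # each draw lands in a cell with probability count / total
    return [bisect_left(cum, rng.uniform(0, total)) for _ in range(n)]

pop = [1200, 80, 40, 3000, 500]    # people per grid cell (toy numbers)
picks = pps_sample(pop, n=3)
```

The second stage would then overlay a sampling grid on imagery of each selected cell to choose households.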
Variable cluster analysis method for building neural network model
Institute of Scientific and Technical Information of China (English)
王海东; 刘元东
2004-01-01
To address the problem that, in building neural network models of complicated systems, input variables should be reduced as much as possible while still fully explaining the output variables, a variable selection method based on cluster analysis was investigated. A similarity coefficient which describes the mutual relation of variables was defined. The methods of the highest contribution rate, part replacing whole, and variable replacement are put forward and derived using information theory. Software for neural network modeling based on cluster analysis, which provides several methods for defining the variable similarity coefficient, clustering system variables, and evaluating variable clusters, was developed and applied to build a neural network forecast model of cement clinker quality. The results show that the network scale, training time, and prediction accuracy are all satisfactory. The practical application demonstrates that this method of selecting variables for a neural network is feasible and effective.
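The variable-clustering idea can be sketched simply: a similarity coefficient is computed between candidate input variables (here the absolute Pearson correlation, one plausible choice — the paper's coefficient may differ), near-duplicate variables are grouped, and one representative per group would be kept as a network input. Code and toy columns are our own.

```python
# Greedy variable clustering by a similarity coefficient: a column joins
# the first existing group whose seed variable it resembles closely.
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def group_variables(columns, threshold=0.9):
    groups = []                       # each group is a list of column indices
    for idx, col in enumerate(columns):
        for g in groups:
            if abs(pearson(columns[g[0]], col)) >= threshold:
                g.append(idx)
                break
        else:
            groups.append([idx])
    return groups

x = [1.0, 2.0, 3.0, 4.0, 5.0]
cols = [x, [2.0 * v + 0.1 for v in x], [5.0, 1.0, 4.0, 2.0, 3.0]]
groups = group_variables(cols)
```

The second column is a linear transform of the first and is grouped with it; the third is nearly uncorrelated and forms its own group, so only two representative inputs would feed the network.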
Progressive Exponential Clustering-Based Steganography
Directory of Open Access Journals (Sweden)
Li Yue
2010-01-01
Full Text Available Cluster indexing-based steganography is an important branch of data-hiding techniques. Such schemes normally achieve good balance between high embedding capacity and low embedding distortion. However, most cluster indexing-based steganographic schemes utilise less efficient clustering algorithms for embedding data, which causes redundancy and leaves room for increasing the embedding capacity further. In this paper, a new clustering algorithm, called progressive exponential clustering (PEC, is applied to increase the embedding capacity by avoiding redundancy. Meanwhile, a cluster expansion algorithm is also developed in order to further increase the capacity without sacrificing imperceptibility.
Energy Technology Data Exchange (ETDEWEB)
Riplinger, Christoph; Pinski, Peter; Becker, Ute; Neese, Frank, E-mail: frank.neese@cec.mpg.de, E-mail: evaleev@vt.edu [Max Planck Institute for Chemical Energy Conversion, Stiftstr. 34-36, D-45470 Mülheim an der Ruhr (Germany); Valeev, Edward F., E-mail: frank.neese@cec.mpg.de, E-mail: evaleev@vt.edu [Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061 (United States)
2016-01-14
Domain based local pair natural orbital coupled cluster theory with single-, double-, and perturbative triple excitations (DLPNO-CCSD(T)) is a highly efficient local correlation method. It is known to be accurate and robust and can be used in a black box fashion in order to obtain coupled cluster quality total energies for large molecules with several hundred atoms. While previous implementations showed near linear scaling up to a few hundred atoms, several nonlinear scaling steps limited the applicability of the method for very large systems. In this work, these limitations are overcome and a linear scaling DLPNO-CCSD(T) method for closed shell systems is reported. The new implementation is based on the concept of sparse maps that was introduced in Part I of this series [P. Pinski, C. Riplinger, E. F. Valeev, and F. Neese, J. Chem. Phys. 143, 034108 (2015)]. Using the sparse map infrastructure, all essential computational steps (integral transformation and storage, initial guess, pair natural orbital construction, amplitude iterations, triples correction) are achieved in a linear scaling fashion. In addition, a number of additional algorithmic improvements are reported that lead to significant speedups of the method. The new, linear-scaling DLPNO-CCSD(T) implementation typically is 7 times faster than the previous implementation and consumes 4 times less disk space for large three-dimensional systems. For linear systems, the performance gains and memory savings are substantially larger. Calculations with more than 20 000 basis functions and 1000 atoms are reported in this work. In all cases, the time required for the coupled cluster step is comparable to or lower than for the preceding Hartree-Fock calculation, even if this is carried out with the efficient resolution-of-the-identity and chain-of-spheres approximations. The new implementation even reduces the error in absolute correlation energies by about a factor of two, compared to the already accurate
International Nuclear Information System (INIS)
Graphical abstract: The structure of a minimum in the Ar19K+ cluster. Abstract: In this paper we explore the possibility of using stochastic optimizers, namely simulated annealing (SA), to locate critical points (global minima, local minima and first-order saddle points) in argon noble-gas clusters perturbed by alkali metal ions, namely sodium and potassium. The atomic interaction potential is the Lennard-Jones potential. We also examine whether a continuous transformation in geometry during the search process can lead to a realization of a kind of minimum-energy path (MEP) for transformation from one minimum geometry to another through a transition state (a first-order saddle point). We apply our recipe to three cluster sizes, namely (Ar)16M+, (Ar)19M+ and (Ar)24M+, where M+ is Na+ or K+.
A statistical method to determine open cluster metallicities
Poehnl, Harald
2010-01-01
The study of open cluster metallicities helps us to understand local stellar formation and evolution throughout the Milky Way. The Galactic metallicity gradient is an important tracer of the Galaxy's formation in a global sense. Because open clusters can be treated statistically, the error of the cluster mean is minimized. Our final goal is a semi-automatic, statistically robust method to estimate the metallicity of a statistically significant number of open clusters based on Johnson BV data of their members, an algorithm that can easily be extended to other photometric systems for a systematic investigation. This method incorporates evolutionary grids for different metallicities and a calibration of effective temperature and luminosity. With the cluster parameters (age, reddening and distance) it is possible to estimate the metallicity from a statistical point of view. The iterative process includes an intrinsic consistency check of the starting input parameters and allows us to modify them. We extensively test...
Initialization independent clustering with actively self-training method.
Nie, Feiping; Xu, Dong; Li, Xuelong
2012-02-01
The results of traditional clustering methods are usually unreliable, as there is no guidance from data labels, whereas class labels can be predicted more reliably by semisupervised learning if the labels of part of the data are given. In this paper, we propose an actively self-training clustering method, in which samples are actively selected as a training set to minimize an estimated Bayes error, and semisupervised learning is then used to perform clustering. Traditional graph-based semisupervised learning methods are not convenient for estimating the Bayes error; we therefore develop a specific regularization framework on the graph to perform semisupervised learning, in which the Bayes error can be effectively estimated. In addition, the proposed clustering algorithm can be readily applied in a semisupervised setting with partial class labels. Experimental results on toy data and real-world data sets demonstrate the effectiveness of the proposed clustering method in both the unsupervised and the semisupervised setting. It is worth noting that the proposed clustering method is free of initialization, while traditional clustering methods usually depend on initialization. PMID:22086542
DNA splice site sequences clustering method for conservativeness analysis
Institute of Scientific and Technical Information of China (English)
Quanwei Zhang; Qinke Peng; Tao Xu
2009-01-01
DNA sequences near splice sites are remarkably conserved, and many researchers have contributed to the prediction of splice sites. In order to mine the underlying biological knowledge, we analyze the conservation of DNA sequences adjacent to splice sites by clustering. First, we propose a DNA splice site sequence clustering method based on DBSCAN, using four methods of calculating dissimilarity. Then, we analyze the conservation features of the clustering results and the experimental data set.
Model-Based Clustering of Large Networks
Vu, Duy Quang; Schweinberger, Michael
2012-01-01
We describe a network clustering framework, based on finite mixture models, that can be applied to discrete-valued networks with hundreds of thousands of nodes and billions of edge variables. Relative to other recent model-based clustering work for networks, we introduce a more flexible modeling framework, improve the variational-approximation estimation algorithm, discuss and implement standard error estimation via a parametric bootstrap approach, and apply these methods to much larger datasets than those seen elsewhere in the literature. The more flexible modeling framework is achieved through introducing novel parameterizations of the model, giving varying degrees of parsimony, using exponential family models whose structure may be exploited in various theoretical and algorithmic ways. The algorithms, which we show how to adapt to the more complicated optimization requirements introduced by the constraints imposed by the novel parameterizations we propose, are based on variational generalized EM algorithms...
Incremental Web Usage Mining Based on Active Ant Colony Clustering
Institute of Scientific and Technical Information of China (English)
SHEN Jie; LIN Ying; CHEN Zhimin
2006-01-01
To alleviate the scalability problem caused by increasing Web usage and changing user interests, this paper presents a novel Web usage mining algorithm: an incremental Web usage mining algorithm based on active ant colony clustering. First, an active movement strategy for direction selection and speed, different from the passive strategy employed by other ant colony clustering algorithms, is proposed to construct an active ant colony clustering algorithm, which avoids idle and "flying over the plane" movement and effectively improves the quality and speed of clustering on large datasets. Then a mechanism for decomposing clusters based on the above methods is introduced to form new clusters when users' interests change. Empirical studies on a real Web dataset show that the active ant colony clustering algorithm performs better than previous algorithms, and that the incremental approach based on the proposed mechanism can efficiently implement incremental Web usage mining.
Orbit Clustering Based on Transfer Cost
Gustafson, Eric D.; Arrieta-Camacho, Juan J.; Petropoulos, Anastassios E.
2013-01-01
We propose using cluster analysis to perform quick screening for combinatorial global optimization problems. The key missing component currently preventing cluster analysis from use in this context is the lack of a useable metric function that defines the cost to transfer between two orbits. We study several proposed metrics and clustering algorithms, including k-means and the expectation maximization algorithm. We also show that proven heuristic methods such as the Q-law can be modified to work with cluster analysis.
Directory of Open Access Journals (Sweden)
Li Ma
2015-01-01
Full Text Available Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM suffers from dependence on the initial clustering centers, easily falls into local optima, and is sensitive to noise. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA), which combines the artificial fish swarm algorithm (AFSA) with FCM, exploiting AFSA's global optimization search and parallel computing ability to find a superior result. Meanwhile, the Metropolis criterion and a noise reduction mechanism are introduced into AFSA to enhance the convergence rate and noise robustness. An artificial grid graph and Magnetic Resonance Imaging (MRI) data are used in the experiments, and the results show that the proposed algorithm has stronger noise robustness and higher precision. A number of evaluation indicators also demonstrate that HAFSA outperforms FCM and suppressed FCM (SFCM).
Ma, Li; Li, Yang; Fan, Suohai; Fan, Runzhu
2015-01-01
Image segmentation plays an important role in medical image processing. Fuzzy c-means (FCM) clustering is one of the popular clustering algorithms for medical image segmentation. However, FCM suffers from dependence on the initial clustering centers, easily falls into local optima, and is sensitive to noise. To solve these problems, this paper proposes a hybrid artificial fish swarm algorithm (HAFSA), which combines the artificial fish swarm algorithm (AFSA) with FCM, exploiting AFSA's global optimization search and parallel computing ability to find a superior result. Meanwhile, the Metropolis criterion and a noise reduction mechanism are introduced into AFSA to enhance the convergence rate and noise robustness. An artificial grid graph and Magnetic Resonance Imaging (MRI) data are used in the experiments, and the results show that the proposed algorithm has stronger noise robustness and higher precision. A number of evaluation indicators also demonstrate that HAFSA outperforms FCM and suppressed FCM (SFCM). PMID:26649068
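The FCM updates that HAFSA builds on can be sketched as follows. This is a minimal illustrative implementation of plain fuzzy c-means only, not the proposed HAFSA hybrid, and the 2-D data points are hypothetical:

```python
import math

def fcm(points, k, m=2.0, iters=100):
    # Plain fuzzy c-means: each point gets soft memberships u[i][c]
    # that sum to 1 over the k clusters; centers are membership-
    # weighted means. The naive init from the first k points
    # illustrates the initial-center sensitivity the abstract mentions.
    centers = [points[i] for i in range(k)]
    for _ in range(iters):
        u = []
        for p in points:
            d = [max(math.dist(p, c), 1e-12) for c in centers]
            u.append([1.0 / sum((dc / do) ** (2 / (m - 1)) for do in d)
                      for dc in d])
        centers = []
        for c in range(k):
            w = [u[i][c] ** m for i in range(len(points))]
            tot = sum(w)
            centers.append(tuple(
                sum(w[i] * p[dim] for i, p in enumerate(points)) / tot
                for dim in range(len(points[0]))))
    return u, centers

# Two well-separated 2-D clumps (hypothetical data).
pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
u, centers = fcm(pts, k=2)
```

On separated clumps like these, the memberships converge to a near-hard split, with each clump dominated by a different cluster.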
Comparing the performance of biomedical clustering methods.
Wiwie, Christian; Baumbach, Jan; Röttger, Richard
2015-11-01
Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene expression to protein domains. Performance was judged on the basis of 13 common cluster validity indices. We developed a clustering analysis platform, ClustEval (http://clusteval.mpi-inf.mpg.de), to promote streamlined evaluation, comparison and reproducibility of clustering results in the future. This allowed us to objectively evaluate the performance of all tools on all data sets with up to 1,000 different parameter sets each, resulting in a total of more than 4 million calculated cluster validity indices. We observed that there was no universal best performer, but on the basis of this wide-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art. PMID:26389570
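As a concrete example of the kind of internal cluster validity index such comparisons rely on, here is a minimal sketch of the silhouette index, one common choice (the abstract does not list ClustEval's specific indices), on hypothetical 2-D data:

```python
import math

def silhouette(points, labels):
    # Mean silhouette width: compare each point's mean intra-cluster
    # distance a with its smallest mean distance b to any other
    # cluster; the per-point score is (b - a) / max(a, b).
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == c) /
            sum(1 for lab in labels if lab == c)
            for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2-D data: a good and a deliberately scrambled labeling.
pts = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = silhouette(pts, [0, 0, 1, 1])
bad = silhouette(pts, [0, 1, 0, 1])
```

A value near 1 indicates compact, well-separated clusters; a negative value indicates points assigned to the wrong cluster.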
CCM: A Text Classification Method by Clustering
DEFF Research Database (Denmark)
Nizamani, Sarwat; Memon, Nasrullah; Wiil, Uffe Kock
2011-01-01
In this paper, a new Cluster based Classification Model (CCM) for suspicious email detection and other text classification tasks, is presented. Comparative experiments of the proposed model against traditional classification models and the boosting algorithm are also discussed. Experimental results...... show that the CCM outperforms traditional classification models as well as the boosting algorithm for the task of suspicious email detection on terrorism domain email dataset and topic categorization on the Reuters-21578 and 20 Newsgroups datasets. The overall finding is that applying a cluster based...... approach to text classification tasks simplifies the model and at the same time increases the accuracy....
Performance Analysis of Unsupervised Clustering Methods for Brain Tumor Segmentation
Directory of Open Access Journals (Sweden)
Tushar H Jaware
2013-10-01
Full Text Available Medical image processing is a challenging and emerging field of neuroscience. The ultimate goal of medical image analysis in brain MRI is to extract important clinical features that would improve methods of diagnosis and treatment of disease. This paper focuses on methods to detect and extract brain tumors from brain MR images. MATLAB is used to design a software tool for locating brain tumors based on unsupervised clustering methods. The K-means clustering algorithm is implemented and tested on a database of 30 images. A performance evaluation of the unsupervised clustering methods is presented.
Institute of Scientific and Technical Information of China (English)
程宏斌; 乐德广; 孙霞; 王海军
2012-01-01
This paper establishes an energy consumption model in order to improve the low energy efficiency of nodes in the LEACH protocol. A method of cluster-head rotation based on a non-competitive mode is proposed, based on an analysis of the difference in energy consumption between ordinary nodes and the elected cluster-head. The method elects the cluster-head only once, in the first round of each rotation cycle; in the remaining rounds the other nodes act as cluster-head in turn by fixed rotation. Furthermore, setting a reasonable number of data collections in each round can effectively reduce the energy consumption of cluster-head election. Finally, theoretical analysis and simulation results show that the energy consumption performance of WSN clustering is effectively improved by the optimized clustering algorithm.
Generating a multilingual taxonomy based on multilingual terminology clustering
Institute of Scientific and Technical Information of China (English)
ZHANG Chengzhi
2011-01-01
Taxonomy denotes the hierarchical structure of a knowledge organization system. It has important applications in knowledge navigation, semantic annotation and semantic search. It is useful to study how a multilingual taxonomy can be generated automatically in a dynamic information environment in which massive amounts of information are processed and found. A multilingual taxonomy is the core component of a multilingual thesaurus or ontology. This paper presents two methods of bilingual taxonomy generation: cross-language terminology clustering and mixed-language-based terminology clustering. According to our experimental results on terminology clustering in four specific subject domains, we found that if a parallel corpus is used to cluster multilingual terminologies, mixed-language-based terminology clustering outperforms cross-language terminology clustering.
Study on Grey Clustering Decision Methods Based on Rényi Entropy
Institute of Scientific and Technical Information of China (English)
吴正朋; 张友萍; 李梅
2011-01-01
The weights in traditional grey fixed-weight clustering methods are given in advance and are therefore not objective. Drawing on the idea of the traditional Shannon information entropy, this paper proposes a method of determining weights based on Rényi entropy and constructs a grey fixed-weight clustering evaluation algorithm based on the Rényi entropy weights. The algorithm uses system state data, obtaining the decision weights by computing the entropy, and is demonstrated in a case study of a practical problem. The results show that the method is easy to compute and that the weight determination is objective, supplementing and refining grey clustering decision theory.
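An entropy-based weighting step of the kind described can be sketched as follows. This assumes the common normalization in which indicators with lower (Rényi) entropy, i.e. more concentrated values, receive larger weights; the exact scheme of the paper may differ, and the data and α value are hypothetical:

```python
import math

def renyi_entropy(p, alpha):
    # Rényi entropy of a discrete distribution p (alpha > 0, != 1);
    # alpha -> 1 recovers the Shannon entropy.
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)

def entropy_weights(columns, alpha=2.0):
    # Entropy-weight scheme: normalize each indicator column into a
    # distribution, compute its Rényi entropy relative to the uniform
    # maximum log(n), and give more weight to indicators with lower
    # entropy (i.e. more discriminating power).
    n = len(columns[0])
    h_max = math.log(n)
    degrees = []
    for col in columns:
        total = sum(col)
        p = [v / total for v in col]
        degrees.append(1.0 - renyi_entropy(p, alpha) / h_max)
    s = sum(degrees)
    return [d / s for d in degrees]

# Hypothetical indicator data: the first column is far more
# concentrated, so it should receive the larger weight.
cols = [[0.9, 0.1, 0.1], [0.4, 0.3, 0.3]]
w = entropy_weights(cols)
```

The resulting weights sum to 1 and favor the more informative indicator, which is the "objective" property the abstract emphasizes.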
FLCW: Frequent Itemset Based Text Clustering with Window Constraint
Institute of Scientific and Technical Information of China (English)
ZHOU Chong; LU Yansheng; ZOU Lei; HU Rong
2006-01-01
Most existing text clustering algorithms overlook the fact that a document is a word sequence carrying semantic information, and that important semantic information resides in the positions of words in the sequence. In this paper, a novel method named Frequent Itemset-based Clustering with Window (FICW) is proposed, which makes use of this semantic information for text clustering with a window constraint. The experimental results obtained from tests on three (hypertext) text sets show that FICW outperforms the compared method in both clustering accuracy and efficiency.
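The frequent-itemset step that such methods build on can be sketched as follows. This toy version omits FICW's window constraint on word positions and uses naive enumeration on a hypothetical corpus:

```python
from itertools import combinations

def frequent_itemsets(docs, min_support, max_size=2):
    # Naive Apriori-style enumeration for small vocabularies: a word
    # itemset is "frequent" when it occurs in >= min_support documents.
    sets = [set(d.split()) for d in docs]
    vocab = sorted(set().union(*sets))
    out = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            support = sum(1 for s in sets if set(combo) <= s)
            if support >= min_support:
                out[combo] = support
    return out

# Hypothetical three-document corpus.
docs = ["kernel fuzzy clustering", "fuzzy clustering image",
        "graph clustering"]
freq = frequent_itemsets(docs, min_support=2)
```

The frequent itemsets (here {clustering}, {fuzzy}, and {clustering, fuzzy}) then serve as shared topics around which documents are clustered.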
New clustering methods for population comparison on paternal lineages.
Juhász, Z; Fehér, T; Bárány, G; Zalán, A; Németh, E; Pádár, Z; Pamjav, H
2015-04-01
The goal of this study is to show two new clustering and visualising techniques developed to find the most typical clusters of 18-dimensional Y chromosomal haplogroup frequency distributions of 90 Western Eurasian populations. The first technique called "self-organizing cloud (SOC)" is a vector-based self-learning method derived from the Self Organising Map and non-metric Multidimensional Scaling algorithms. The second technique is a new probabilistic method called the "maximal relation probability" (MRP) algorithm, based on a probability function having its local maximal values just in the condensation centres of the input data. This function is calculated immediately from the distance matrix of the data and can be interpreted as the probability that a given element of the database has a real genetic relation with at least one of the remaining elements. We tested these two new methods by comparing their results to both each other and the k-medoids algorithm. By means of these new algorithms, we determined 10 clusters of populations based on the similarity of haplogroup composition. The results obtained represented a genetically, geographically and historically well-interpretable picture of 10 genetic clusters of populations mirroring the early spread of populations from the Fertile Crescent to the Caucasus, Central Asia, Arabia and Southeast Europe. The results show that a parallel clustering of populations using SOC and MRP methods can be an efficient tool for studying the demographic history of populations sharing common genetic footprints. PMID:25388803
Web Document Clustering Using Cuckoo Search Clustering Algorithm based on Levy Flight
Directory of Open Access Journals (Sweden)
Moe Moe Zaw
2013-09-01
Full Text Available The World Wide Web serves as a huge, widely distributed, global information service center. The amount of information on the web grows day by day, so finding relevant information on the web is a major challenge in information retrieval. This leads to the need for new techniques to help users effectively navigate, summarize and organize the overwhelming amount of information. One technique that can play an important role towards this objective is web document clustering. This paper aims to develop a clustering algorithm and apply it to web document clustering. The Cuckoo Search optimization algorithm is a recently developed optimization algorithm based on the obligate brood-parasitic behavior of some cuckoo species, combined with Levy flight. In this paper, a Cuckoo Search clustering algorithm based on Levy flight is proposed. This algorithm applies Cuckoo Search optimization to web document clustering to locate the optimal centroids of the clusters and to find a global solution of the clustering algorithm. To test the performance of the proposed method, this paper reports experimental results on a benchmark dataset. The results obtained show that the Cuckoo Search clustering algorithm based on Levy flight performs well in web document clustering.
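The Levy-flight move at the heart of Cuckoo Search can be sketched with Mantegna's algorithm for generating heavy-tailed step lengths; the β value and seed below are illustrative, not taken from the paper:

```python
import math
import random

def levy_step(beta, rng):
    # Mantegna's algorithm: the ratio u / |v|^(1/beta), with u and v
    # Gaussian and sigma chosen as below, approximates a Levy-stable
    # step -- mostly small moves with occasional very long jumps,
    # which is what lets cuckoos escape local optima.
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta
                * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

rng = random.Random(42)
steps = [levy_step(1.5, rng) for _ in range(2000)]
```

In the clustering setting, each candidate set of centroids is perturbed by such steps and kept when it improves the clustering objective.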
Wang, Tai-Chi; Phoa, Frederick Kin Hing
2016-03-01
Community/cluster structure is one of the most important features of social networks. Many cluster detection methods have been proposed to identify this important pattern, but few are able to assess the statistical significance of clusters by considering the likelihood of the network structure and its attributes. Based on the definition of clustering, we propose a scanning method, originating in the analysis of spatial data, for identifying clusters in social networks. Since the properties of network data are more complicated than those of spatial data, we verify our method's feasibility via simulation studies. The results show that the detection power is affected by cluster sizes and connection probabilities. According to our simulation results, the detection accuracy of our proposed method, for structure clusters and for joint structure-and-attribute clusters, is better than that of other methods in most of our simulated cases. In addition, we apply the proposed method to empirical data to identify statistically significant clusters.
A New Method of Open Cluster Membership Determination
Gao, Xin-hua; Chen, Li; Hou, Zhen-jie
2014-07-01
Membership determination is the key step in studying open clusters, and it directly influences the estimation of a cluster's physical parameters. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm from data mining. In this paper the DBSCAN algorithm is used for the first time for the membership determination of the open clusters NGC 6791 and M 67 (NGC 2682). Our results indicate that the DBSCAN algorithm can effectively eliminate the contamination of field stars. The member stars obtained for NGC 6791 clearly exhibit a double main-sequence structure in the color-magnitude diagram, implying that NGC 6791 may have a more complicated history of star formation and evolution. The clustering analysis of M 67 indicates the presence of mass segregation and a distinct relative motion between the central and outer parts of the cluster. These results demonstrate that the DBSCAN algorithm is an effective method of membership determination, with some advantages over the conventional kinematic method.
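The DBSCAN idea the abstract relies on can be sketched in a few lines: points in dense regions are grouped, while isolated points (standing in here for field stars) are labeled noise. The proper-motion-like data, eps and min_pts below are hypothetical:

```python
import math

def dbscan(points, eps, min_pts):
    # Minimal DBSCAN: assign each point a cluster id, or -1 for noise
    # (isolated points that belong to no dense region).
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise (may later become a border point)
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: claimed, not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:
                queue.extend(more)
        cluster += 1
    return labels

# Hypothetical proper-motion data: a tight clump of cluster members
# plus three scattered field stars.
members = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
field = [(3.0, 3.0), (-4.0, 2.0), (5.0, -3.0)]
labels = dbscan(members + field, eps=0.3, min_pts=3)
```

The clump is recovered as one cluster and the field stars come back as noise, mirroring how DBSCAN eliminates field-star contamination without assuming a spatial distribution.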
Comparing the performance of biomedical clustering methods
DEFF Research Database (Denmark)
Wiwie, Christian; Baumbach, Jan; Röttger, Richard
2015-01-01
Identifying groups of similar objects is a popular first step in biomedical data analysis, but it is error-prone and impossible to perform manually. Many computational methods have been developed to tackle this problem. Here we assessed 13 well-known methods using 24 data sets ranging from gene......-ranging comparison we were able to develop a short guideline for biomedical clustering tasks. ClustEval allows biomedical researchers to pick the appropriate tool for their data type and allows method developers to compare their tool to the state of the art....
An Empirical Comparison of the Summarization Power of Graph Clustering Methods
Liu, Yike; Shah, Neil; Koutra, Danai
2015-01-01
How do graph clustering techniques compare with respect to their summarization power? How well can they summarize a million-node graph with a few representative structures? Graph clustering or community detection algorithms can summarize a graph in terms of coherent and tightly connected clusters. In this paper, we compare and contrast different techniques: METIS, Louvain, spectral clustering, SlashBurn and KCBC, our proposed k-core-based clustering method. Unlike prior work that focuses on v...
A survey of kernel and spectral methods for clustering
Filippone, M.; Camastra, F.; Masulli, F.; Rovetta, S.
2008-01-01
Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel version of many c...
Structure based alignment and clustering of proteins (STRALCP)
Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.
2013-06-18
Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.
A Cluster Based Approach for Classification of Web Results
Directory of Open Access Journals (Sweden)
Apeksha Khabia
2014-12-01
Full Text Available Nowadays a significant amount of information from the web is present in the form of text, e.g., reviews, forum postings, blogs, news articles, email messages and web pages. It becomes difficult to classify documents into predefined categories as the number of documents grows. Clustering is the partitioning of data into clusters, so that the data in each cluster share some common trait, often proximity according to some defined measure. The underlying distribution of a data set can to some extent be depicted by the learned clusters under the guidance of the initial data set. Thus, clusters of documents can be employed to train a classifier using the defined features of those clusters. An important issue is also to classify text data from the web into different clusters by mining the knowledge. Accordingly, this paper presents a review of most document clustering techniques and cluster-based classification techniques used so far. Pre-processing of text datasets and a document clustering method are also explained briefly.
Open cluster membership probability based on K-means clustering algorithm
El Aziz, Mohamed Abd; Selim, I. M.; Essam, A.
2016-05-01
In the field of galaxy images, the relative coordinate positions of each star with respect to all the other stars are adopted. The membership of a star cluster is therefore determined by two basic criteria, one for geometric membership and the other for physical (photometric) membership. In this paper, we present a new method for the determination of open cluster membership based on the K-means clustering algorithm. This algorithm allows us to efficiently discriminate cluster members from field stars. To validate the method we applied it to NGC 188 and NGC 2266, and membership stars in these clusters have been obtained. The color-magnitude diagram of the member stars is significantly clearer and shows a well-defined main sequence and a red giant branch in NGC 188, which allows us to better constrain the cluster members and estimate their physical parameters. The membership probabilities have been calculated and compared to those obtained by other methods. The results show that the K-means clustering algorithm can effectively select probable member stars in space without any assumption about the spatial distribution of stars in the cluster or field, and our results are in good agreement with those derived in previous works.
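The member/field discrimination can be illustrated with a plain Lloyd's-algorithm K-means on hypothetical 2-D data; a real application would use the stars' positional (and photometric) features rather than the toy points below:

```python
import math

def kmeans(points, k, iters=20):
    # Plain Lloyd's algorithm: alternate nearest-center assignment and
    # centroid update. Deterministic init from the first k points.
    centers = [points[i] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k),
                       key=lambda c: math.dist(p, centers[c]))].append(p)
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g
                   else centers[c]
                   for c, g in enumerate(groups)]
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
              for p in points]
    return labels, centers

# Hypothetical data: a tight clump of cluster members followed by
# three spread-out field stars.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
       (5.0, 5.0), (6.0, 4.0), (4.0, 6.0)]
labels, centers = kmeans(pts, k=2)
```

After a few iterations the clump and the field points separate into the two clusters, with one centroid pinned to each group.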
Eros-based Fuzzy Clustering Method for Longitudinal Data
Institute of Scientific and Technical Information of China (English)
李会民; 闫健卓; 方丽英; 王普
2013-01-01
Considering the characteristics of longitudinal data sets, such as multiple variates, missing data, unequal series lengths, and irregular time intervals, an algorithm based on the Eros distance similarity measure for longitudinal data is proposed, in which the Eros distance is used in fuzzy C-means clustering. First, preprocessing is performed on the unbalanced longitudinal data set, including filling missing data and reducing redundant attributes. Second, the FErosCM clustering method is used for automatic classification, with information entropy used to assess the performance of the clustering algorithm. Experiments show that this method is effective and efficient for longitudinal data classification.
Web resource recommendation method based on intuitionistic fuzzy clustering
Institute of Scientific and Technical Information of China (English)
肖满生; 汪新凡; 周丽娟
2012-01-01
In the classification of Web resources, a method of recommending Web resources based on intuitionistic fuzzy C-means clustering is proposed, to solve the problems that traditional methods based on user interest cannot accurately reflect changes in user interests and have difficulty distinguishing the quality and style of resource content. In the method, the Web resources are first expressed as intuitionistic fuzzy numbers according to the user interest degree. Then the integration theory of intuitionistic fuzzy information is applied to classify the resources. Finally, similar or closely related resources are recommended to the user. Theoretical analysis and experimental results show that this method greatly improves recommendation quality compared with traditional fuzzy C-means and collaborative filtering methods.
Recent advances in coupled-cluster methods
Bartlett, Rodney J
1997-01-01
Today, coupled-cluster (CC) theory has emerged as the most accurate, widely applicable approach for the correlation problem in molecules. Furthermore, the correct scaling of the energy and wavefunction with size (i.e. extensivity) recommends it for studies of polymers and crystals as well as molecules. CC methods have also paid dividends for nuclei, and for certain strongly correlated systems of interest in field theory. In order for CC methods to have achieved this distinction, it has been necessary to formulate new, theoretical approaches for the treatment of a variety of essential quantities
SOFT CLUSTERING BASED EXPOSITION TO MULTIPLE DICTIONARY BAG OF WORDS
Directory of Open Access Journals (Sweden)
K. S. Sujatha
2012-01-01
Full Text Available Object classification is a highly important area of computer vision and has many applications, including robotics, image search, face recognition, aiding visually impaired people, censoring images and many more. A now-common method of classification that uses features is the Bag of Words approach, in which a codebook of visual words is created using various clustering methods. To increase performance, the Multiple Dictionaries BoW (MDBoW) method, which uses more visual words from different independent dictionaries instead of adding more words to the same dictionary, was implemented using a hard clustering method. Nearest-neighbor assignments are used in hard clustering of features. A given feature may be nearly the same distance from two cluster centers; for a typical hard clustering method, only the slightly nearer neighbor is selected to represent that feature. Thus, ambiguous features are not well represented by the visual vocabulary. To address this problem, a soft clustering model based Multiple Dictionary Bag of Visual Words for image classification is implemented, with the dictionary generated using a modified Fuzzy C-means algorithm with the R1 norm. A performance evaluation on images has been done by varying the dictionary size. The proposed method works better when the number of topics and the number of images per topic are larger. The results obtained indicate that the multiple dictionary bag of words model using fuzzy clustering improves recognition performance over the baseline method.
Zhuang, X. W.; Li, Y. P.; Huang, G. H.; Liu, J.
2016-07-01
An integrated multi-GCM-based stochastic weather generator and stepwise cluster analysis (MGCM-SWG-SCA) method is developed by incorporating multiple global climate models (MGCM), a stochastic weather generator (SWG), and a stepwise-clustered hydrological model (SCHM) within a general framework. MGCM-SWG-SCA can investigate uncertainties in projected climate changes as well as create watershed-scale climate projections from large-scale variables. It can also assess climate change impacts on hydrological processes and capture nonlinear relationships between input variables and outputs in watershed systems. MGCM-SWG-SCA is then applied to the Kaidu watershed, with its cold-arid characteristics, in the Xinjiang Uyghur Autonomous Region of northwest China to demonstrate its efficiency. Results reveal that the variability of streamflow is mainly affected by (1) temperature change during spring, (2) precipitation change during winter, and (3) both temperature and precipitation changes in summer and autumn. Results also disclose that: (1) the projected minimum and maximum temperatures and precipitation from MGCM change with the seasons in different ways; (2) the various climate change projections can reproduce the seasonal variability of watershed-scale climate series; (3) SCHM can simulate daily streamflow to a satisfactory degree, and a significant increasing trend of streamflow is indicated from the validation (2006-2011) to the future (2015-2035) period; (4) the streamflow can vary under different climate change projections. These findings can be explained by the fact that, for the Kaidu watershed located in the cold-arid region, glacier melt is mainly related to temperature changes, while precipitation changes can directly cause variability of streamflow.
Document Clustering based on Topic Maps
Rafi, Muhammad; Farooq, Amir; 10.5120/1640-2204
2011-01-01
The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents like the World Wide Web (WWW). The next challenge lies in performing clustering based on the semantic content of the documents. The problem of document clustering has two main components: (1) to represent the document in a form that inherently captures the semantics of the text, which may also help to reduce the dimensionality of the document; and (2) to define a similarity measure based on the semantic representation such that it assigns higher numerical values to document pairs with a stronger semantic relationship. The feature space of documents can be very challenging for document clustering. A document may contain multiple topics, a large set of class-independent general words, and a handful of class-specific core words. With these features in mind, traditional agglomerative clustering algori...
An Efficient Fuzzy Clustering-Based Approach for Intrusion Detection
Nguyen, Huu Hoa; Darmont, Jérôme
2011-01-01
The need to increase accuracy in detecting sophisticated cyber attacks poses a great challenge not only to the research community but also to corporations. So far, many approaches have been proposed to cope with this threat. Among them, data mining has made remarkable contributions to the intrusion detection problem. However, the generalization ability of data mining-based methods remains limited, and hence detecting sophisticated attacks remains a tough task. Along this line, we present a novel method based on both clustering and classification for developing an efficient intrusion detection system (IDS). The key idea is to take useful information exploited from fuzzy clustering into account for the process of building an IDS. To this aim, we first present cornerstones to construct additional cluster features for a training set. Then, we come up with an algorithm to generate an IDS based on such cluster features and the original input features. Finally, we experimentally prove that our method outperform...
Fuzzy Clustering - Principles, Methods and Examples
DEFF Research Database (Denmark)
Kroszynski, Uri; Zhou, Jianjun
1998-01-01
One of the most remarkable advances in the field of identification and control of systems - in particular mechanical systems - whose behaviour cannot be described by means of the usual mathematical models has been achieved by the application of methods of fuzzy theory. In the framework of a study about identification of "black-box" properties by analysis of system input/output data sets, we have prepared an introductory note on the principles and the most popular data classification methods used in fuzzy modeling. This introductory note also includes some examples that illustrate the use of the methods. The examples were solved by hand and served as a test bench for exploration of the MATLAB capabilities included in the Fuzzy Control Toolbox. The fuzzy clustering methods described include Fuzzy c-means (FCM), Fuzzy c-lines (FCL) and Fuzzy c-elliptotypes (FCE).
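The FCM method named above alternates two update steps: a membership update and a center update. A minimal sketch of those updates, assuming Euclidean distance and a fuzzifier m > 1 (illustrative toy code, not the MATLAB Fuzzy Control Toolbox routines):

```python
import math

def fcm(points, c=2, m=2.0, iters=50):
    """Minimal fuzzy c-means sketch. points: list of equal-length tuples,
    c: number of clusters, m: fuzzifier (> 1). Returns (centers, memberships)."""
    # deterministic spread initialization: pick evenly spaced data points
    centers = [points[i * len(points) // c] for i in range(c)]
    dims = len(points[0])
    for _ in range(iters):
        # membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U = []
        for x in points:
            d = [max(math.dist(x, v), 1e-12) for v in centers]
            U.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                      for i in range(c)])
        # center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        centers = []
        for i in range(c):
            w = [U[k][i] ** m for k in range(len(points))]
            s = sum(w)
            centers.append(tuple(sum(wk * x[dim] for wk, x in zip(w, points)) / s
                                 for dim in range(dims)))
    return centers, U
```

On well-separated data the centers converge to the fuzzy-weighted cluster means, and each point's memberships sum to one across clusters.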
Sakumichi, Naoyuki; Kawakami, Norio; Ueda, Masahito
2011-01-01
The quantum-statistical cluster expansion method of Lee and Yang is extended to investigate off-diagonal long-range order (ODLRO) in one- and multi-component mixtures of bosons or fermions. Our formulation is applicable to both a uniform system and a trapped system without local-density approximation and allows systematic expansions of one- and multi-particle reduced density matrices in terms of cluster functions which are defined for the same system with Boltzmann statistics. Each term in th...
Park, Sang Ha; Lee, Seokjin; Sung, Koeng-Mo
Non-negative matrix factorization (NMF) is widely used for monaural musical sound source separation because of its efficiency and good performance. However, an additional clustering process is required because the musical sound mixture is separated into more signals than the number of musical tracks during NMF separation. In the conventional method, manual clustering or training-based clustering is performed with an additional learning process. Recently, a clustering algorithm based on the mel-frequency cepstrum coefficient (MFCC) was proposed for unsupervised clustering. However, MFCC clustering supplies limited information for clustering. In this paper, we propose various timbre features for unsupervised clustering and a clustering algorithm with these features. Simulation experiments are carried out using various musical sound mixtures. The results indicate that the proposed method improves clustering performance, as compared to conventional MFCC-based clustering.
Method for determining experts' weights based on entropy and cluster analysis
Institute of Scientific and Technical Information of China (English)
周漩; 张凤鸣; 惠晓滨; 李克武
2011-01-01
Existing methods for determining experts' weights in group decision-making take into account the consistency of the experts' collating vectors, but lack a measure of their information similarity. As a result, an expert whose collating vector is close to the group consensus but carries great information uncertainty may be given the same weight as the other experts. To address this, a method for deriving experts' weights based on entropy and cluster analysis is proposed, in which the collating vectors of all experts are classified by an information similarity coefficient, and the experts' weights are determined according to the classification result and the entropy of the collating vectors. Finally, a numerical example shows that the method is effective and feasible.
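The entropy component of such a scheme can be sketched as follows. The weighting rule below (more decisive, lower-entropy collating vectors receive larger weights, normalized to sum to one) is an illustrative assumption, and `entropy_weights` is a hypothetical helper, not the paper's exact formula:

```python
import math

def collating_entropy(v):
    """Shannon entropy of a normalized collating (ranking-weight) vector."""
    return -sum(p * math.log(p) for p in v if p > 0)

def entropy_weights(vectors):
    """Hypothetical rule: weight each expert by how far their collating
    vector's entropy falls below the maximum (uniform-vector) entropy."""
    h_max = math.log(len(vectors[0]))                 # entropy of a uniform vector
    scores = [h_max - collating_entropy(v) + 1e-12 for v in vectors]
    s = sum(scores)
    return [sc / s for sc in scores]
```

A decisive expert (e.g. collating vector [0.7, 0.2, 0.1]) then receives a larger weight than one whose vector is uniform.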
MANNER OF STOCKS SORTING USING CLUSTER ANALYSIS METHODS
Directory of Open Access Journals (Sweden)
Jana Halčinová
2014-06-01
Full Text Available The aim of the present article is to show the possibility of using cluster analysis methods in the classification of stocks of finished products. Cluster analysis creates groups (clusters) of finished products according to similarity in demand, i.e. customer requirements for each product. The manner of sorting stocks of finished products into clusters is described with a practical example. The resulting clusters are incorporated into the draft layout of the distribution warehouse.
Malware Classification based on Call Graph Clustering
Kinable, Joris
2010-01-01
Each day, anti-virus companies receive tens of thousands samples of potentially harmful executables. Many of the malicious samples are variations of previously encountered malware, created by their authors to evade pattern-based detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representing malware samples as call graphs, it is possible to abstract certain variations away, and enable the detection of structural similarities between samples. The ability to cluster similar samples together will make more generic detection techniques possible, thereby targeting the commonalities of the samples within a cluster. To compare call graphs mutually, we compute pairwise graph similarity scores via graph matchings which approximately minimize the graph edit distance. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including k-medoids and DB...
The Local Maximum Clustering Method and Its Application in Microarray Gene Expression Data Analysis
Directory of Open Access Journals (Sweden)
Chen Yidong
2004-01-01
Full Text Available An unsupervised data clustering method, called the local maximum clustering (LMC) method, is proposed for identifying clusters in experimental data sets based on research interest. A magnitude property is defined according to research purposes, and data sets are clustered around each local maximum of the magnitude property. By properly defining a magnitude property, this method can overcome many difficulties in microarray data clustering, such as reduced projection in similarities, noise, and arbitrary gene distribution. To critically evaluate the performance of this clustering method in comparison with other methods, we designed three model data sets with known cluster distributions and applied the LMC method as well as the hierarchical clustering method, the k-means clustering method, and the self-organized map method to these model data sets. The results show that the LMC method produces the most accurate clustering results. As an example of application, we applied the method to cluster the leukemia samples reported in the microarray study of Golub et al. (1999).
A Clustering Ensemble approach based on the similarities in 2-mode social networks
Institute of Scientific and Technical Information of China (English)
SU Bao-ping; ZHANG Meng-jie
2014-01-01
For a particular clustering problem, selecting the best clustering method is challenging. Research suggests that integrating multiple clusterings can greatly improve the accuracy of a clustering ensemble. A new clustering ensemble approach based on similarities in 2-mode networks is proposed in this paper. First, the data objects and the initial clusters are transformed into a 2-mode network; then the similarities in the 2-mode network are used to compute the similarity between different clusters iteratively and to refine the adjacency matrix; finally, the K-means algorithm is applied to obtain the final clustering result. The method makes effective use of the similarity between different clusters, and an example shows its feasibility.
Market Segmentation Using Bayesian Model Based Clustering
Van Hattum, P.
2009-01-01
This dissertation deals with two basic problems in marketing, that are market segmentation, which is the grouping of persons who share common aspects, and market targeting, which is focusing your marketing efforts on one or more attractive market segments. For the grouping of persons who share common aspects a Bayesian model based clustering approach is proposed such that it can be applied to data sets that are specifically used for market segmentation. The cluster algorithm can handle very l...
Li, Chunhui; Sun, Lian; Jia, Junxiang; Cai, Yanpeng; Wang, Xuan
2016-07-01
Source water areas face many potential water pollution risks, and risk assessment is an effective method to evaluate them. In this paper an integrated model based on k-means clustering analysis and set pair analysis was established for evaluating the risks associated with water pollution in source water areas, in which the weights of indicators were determined through the entropy weight method. The proposed model was then applied to assess water pollution risks in the region of Shiyan, in which the Danjiangkou Reservoir, China's key source water area for the middle route of the South-to-North Water Diversion Project, is located. The results showed that eleven sources with relatively high risk values were identified. At the regional scale, Shiyan City and Danjiangkou City would have high risk values in terms of industrial discharge. Comparatively, Danjiangkou City and Yunxian County would have high risk values in terms of agricultural pollution. Overall, the risk values of the northern regions close to the main stream and reservoir of the region of Shiyan were higher than those in the south. The results of the risk levels indicated that five sources were at a lower risk level (i.e., level II), two at a moderate risk level (i.e., level III), one at a higher risk level (i.e., level IV) and three at the highest risk level (i.e., level V). Also, the risks of industrial discharge are higher than those of the agricultural sector. It is thus essential to manage the pillar industry of the region of Shiyan and certain agricultural companies in the vicinity of the reservoir to reduce the water pollution risks of source water areas. PMID:27016678
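The entropy weight method mentioned above has a standard textbook form, sketched below for a matrix whose rows are evaluated sources and whose columns are risk indicators (assuming positive indicator values; illustrative code, not the paper's):

```python
import math

def entropy_weight(matrix):
    """Entropy weight method sketch: indicators whose values vary more across
    sources carry more information and therefore receive larger weights."""
    n, k = len(matrix), len(matrix[0])
    divergences = []
    for j in range(k):
        col = [row[j] for row in matrix]
        s = sum(col)
        p = [x / s for x in col]                       # column share p_ij
        # indicator entropy: e_j = -(1 / ln n) * sum_i p_ij ln p_ij
        e = -sum(q * math.log(q) for q in p if q > 0) / math.log(n)
        divergences.append(1.0 - e)                    # degree of divergence
    total = sum(divergences)
    return [d / total for d in divergences]
```

A perfectly uniform indicator column has maximum entropy and contributes weight zero, while a highly uneven column dominates the weighting.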
Water Quality of Pipe Network Based on the Grey Clustering Method
Institute of Scientific and Technical Information of China (English)
李明
2011-01-01
The water quality of a pipe network can be seen as a grey system, so the grey clustering approach can be applied to evaluate it. The grey clustering method overcomes the disadvantage of traditional methods that evaluate many factors and indexes with a single value. The pipe network of Guangzhou is taken as an example to assess its water quality. The results show that the grey clustering method can use a small number of samples to assess the water quality grade of the pipe network, which makes it convenient to obtain information on the water quality status at each testing point.
Model-based clustering of array CGH data
Shah, Sohrab P.; Cheung, K-John; Johnson, Nathalie A.; Alain, Guillaume; Gascoyne, Randy D.; Horsman, Douglas E.; Ng, Raymond T.; Murphy, Kevin P.
2009-01-01
Motivation: Analysis of array comparative genomic hybridization (aCGH) data for recurrent DNA copy number alterations from a cohort of patients can yield distinct sets of molecular signatures or profiles. This can be due to the presence of heterogeneous cancer subtypes within a supposedly homogeneous population. Results: We propose a novel statistical method for automatically detecting such subtypes or clusters. Our approach is model based: each cluster is defined in terms of a sparse profile...
Detecting influential observations in a model-based cluster analysis
Bruckers, L.; Molenberghs, G; Verbeke, G; Geys, H.
2016-01-01
Finite mixture models have been used to model population heterogeneity and to relax distributional assumptions. These models are also convenient tools for clustering and classification of complex data such as, for example, repeated-measurements data. The performance of model-based clustering algorithms is sensitive to influential and outlying observations. Methods for identifying outliers in a finite mixture model have been described in the literature. Approaches to identify influential obser...
Seeland, Madeleine
2014-01-01
This thesis focuses on graph clustering. It introduces scalable methods for clustering large databases of small graphs by common scaffolds, i.e., the existence of one sufficiently large subgraph shared by all cluster elements. Further, the thesis studies applications for classification and regression. The experimental results show that it is for the first time possible to cluster millions of graphs within a reasonable time using an accurate scaffold-based similarity measure.
Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign
Maciej Kutera; Mirosława Lasek
2010-01-01
Clustering methods have recently become such well-elaborated algorithms for the analysis of large data collections that they are now counted among data mining methods. They form a larger and larger group of methods, evolving quickly and finding more and more applications. In the article, our research concerning the usefulness of clustering methods in customer segmentation for managing advertisement campaigns is presented. We introduce results obtained by using four sel...
Seniority-based coupled cluster theory
Henderson, Thomas M; Stein, Tamar; Scuseria, Gustavo E
2014-01-01
Doubly occupied configuration interaction (DOCI) with optimized orbitals often accurately describes strong correlations while working in a Hilbert space much smaller than that needed for full configuration interaction. However, the scaling of such calculations remains combinatorial with system size. Pair coupled cluster doubles (pCCD) is very successful in reproducing DOCI energetically, but can do so with low polynomial scaling ($N^3$, disregarding the two-electron integral transformation from atomic to molecular orbitals). We show here several examples illustrating the success of pCCD in reproducing both the DOCI energy and wave function, and show how this success frequently comes about. What DOCI and pCCD lack are an effective treatment of dynamic correlations, which we here add by including higher-seniority cluster amplitudes which are excluded from pCCD. This frozen pair coupled cluster approach is comparable in cost to traditional closed-shell coupled cluster methods with results that are competitive fo...
CORM: An R Package Implementing the Clustering of Regression Models Method for Gene Clustering
Jiejun Shi; Li-Xuan Qin
2014-01-01
We report a new R package implementing the clustering of regression models (CORM) method for clustering genes using gene expression data and provide data examples illustrating each clustering function in the package. The CORM package is freely available at CRAN from http://cran.r-project.org.
Cluster-based control of nonlinear dynamics
Kaiser, Eurika; Spohn, Andreas; Cattafesta, Louis N; Morzynski, Marek
2016-01-01
The ability to manipulate and control fluid flows is of great importance in many scientific and engineering applications. Here, a cluster-based control framework is proposed to determine optimal control laws with respect to a cost function for unsteady flows. The proposed methodology frames high-dimensional, nonlinear dynamics into low-dimensional, probabilistic, linear dynamics which considerably simplifies the optimal control problem while preserving nonlinear actuation mechanisms. The data-driven approach builds upon a state space discretization using a clustering algorithm which groups kinematically similar flow states into a low number of clusters. The temporal evolution of the probability distribution on this set of clusters is then described by a Markov model. The Markov model can be used as predictor for the ergodic probability distribution for a particular control law. This probability distribution approximates the long-term behavior of the original system on which basis the optimal control law is de...
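The Markov-model step of such a cluster-based framework can be sketched as follows, assuming the flow snapshots have already been assigned cluster labels (illustrative toy code, not the authors' implementation):

```python
def markov_transitions(labels, n_clusters):
    """Estimate a row-stochastic transition matrix from a sequence of cluster
    labels by maximum-likelihood transition counts (sketch)."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    P = []
    for row in counts:
        s = sum(row)
        # unvisited cluster: fall back to a uniform row to keep P stochastic
        P.append([c / s for c in row] if s else [1.0 / n_clusters] * n_clusters)
    return P
```

Powers of the resulting matrix (or its stationary distribution) then approximate the long-term behavior on which the optimal control law is evaluated.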
Query Expansion Based on Clustered Results
Liu, Ziyang; Chen, Yi
2011-01-01
Query expansion is a functionality of search engines that suggests a set of related queries for a user-issued keyword query. Typical corpus-driven keyword query expansion approaches return popular words in the results as expanded queries. Using these approaches, the expanded queries may correspond to a subset of possible query semantics, and thus miss relevant results. To handle ambiguous queries and exploratory queries, whose result relevance is difficult to judge, we propose a new framework for keyword query expansion: we start with clustering the results according to user specified granularity, and then generate expanded queries, such that one expanded query is generated for each cluster whose result set should ideally be the corresponding cluster. We formalize this problem and show its APX-hardness. Then we propose two efficient algorithms named iterative single-keyword refinement and partial elimination based convergence, respectively, which effectively generate a set of expanded queries from clustered r...
Logistics Enterprise Evaluation Model Based On Fuzzy Clustering Analysis
Fu, Pei-hua; Yin, Hong-bo
In this thesis, we introduce an evaluation model for logistics enterprises based on a fuzzy clustering algorithm. First of all, we present the evaluation index system, which contains basic information, management level, technical strength, transport capacity, informatization level, market competition and customer service. We decided the index weights according to the grades, and evaluated the integrated ability of the logistics enterprises using the fuzzy cluster analysis method. We describe the system evaluation module and the cluster analysis module in detail, explain how we implemented these two modules, and finally give the results of the system.
An Evolutionary Dynamic Clustering based Colour Image Segmentation
Directory of Open Access Journals (Sweden)
Amiya Halder, Nilvra Pathak
2011-02-01
Full Text Available We have presented a novel Dynamic Colour Image Segmentation (DCIS) system for colour images. In this paper, we have proposed an efficient colour image segmentation algorithm based on an evolutionary approach, i.e. dynamic GA-based clustering (GADCIS). The proposed technique automatically determines the optimum number of clusters for colour images. The optimal number of clusters is obtained by using a cluster validity criterion with the help of the Gaussian distribution. The advantage of this method is that no a priori knowledge is required to segment the colour image. The proposed algorithm is evaluated on well-known natural images, and its performance is compared to other clustering techniques. Experimental results show that the proposed algorithm produces comparable segmentation results.
Unbiased methods for removing systematics from galaxy clustering measurements
Elsner, Franz; Peiris, Hiranya V
2015-01-01
Measuring the angular clustering of galaxies as a function of redshift is a powerful method for extracting information from the three-dimensional galaxy distribution. The precision of such measurements will dramatically increase with ongoing and future wide-field galaxy surveys. However, these are also increasingly sensitive to observational and astrophysical contaminants. Here, we study the statistical properties of three methods proposed for controlling such systematics - template subtraction, basic mode projection, and extended mode projection - all of which make use of externally supplied template maps, designed to characterise and capture the spatial variations of potential systematic effects. Based on a detailed mathematical analysis, and in agreement with simulations, we find that the template subtraction method in its original formulation returns biased estimates of the galaxy angular clustering. We derive closed-form expressions that should be used to correct results for this shortcoming. Turning to th...
Bugge, Anna; Tarp, Jakob; Østergaard, Lars; Domazet, Sidsel Louise; Andersen, Lars Bo; Froberg, Karsten
2014-01-01
Background The aim of the LCoMotion (Learning, Cognition and Motion) study was to develop, document, and evaluate a multi-component physical activity (PA) intervention in public schools in Denmark. The primary outcome was cognitive function. Secondary outcomes were academic skills, body composition, aerobic fitness and PA. The primary aim of the present paper is to describe the rationale, design and methods of the LCoMotion study. Methods/Design LCoMotion was designed as a cluster-randomize...
Directory of Open Access Journals (Sweden)
Kohei Arai
2013-07-01
Full Text Available Cluster analysis aims at identifying groups of similar objects and therefore helps to discover the distribution of patterns and interesting correlations in data sets. In this paper, we propose to provide a consistent partitioning of a dataset which allows identifying cluster patterns of any shape in numerical clustering, convex or non-convex. The method is based on a layered structure representation, obtained from the distance and angle of the numerical data to the data centroid, and on an iterative clustering construction that merges clusters using a nearest-neighbor distance between them. Encouraging results show the effectiveness of the proposed technique.
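The iterative merge step based on a nearest-neighbor distance between clusters can be sketched as a single-link agglomerative loop (an illustrative reading of the abstract, not the authors' code):

```python
import math

def merge_clusters(points, target):
    """Start with singleton clusters and repeatedly merge the pair with the
    smallest nearest-neighbor (single-link) inter-cluster distance."""
    clusters = [[p] for p in points]
    while len(clusters) > target:
        best = None                                   # (distance, i, j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-link distance: closest pair across the two clusters
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters
```

Because the merge criterion chains through nearest neighbors, this variant can follow non-convex cluster shapes that a centroid-only criterion would split.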
Finding Within Cluster Dense Regions Using Distance Based Technique
Wesam Ashour; Motaz Murtaja
2012-01-01
One of the main categories in data clustering is density-based clustering. Density-based clustering techniques like DBSCAN are attractive because they can find arbitrarily shaped clusters along with noisy outliers. The main weakness of traditional density-based algorithms like DBSCAN is clustering data sets with different density levels: DBSCAN's calculations are done according to given parameters applied to all points in a data set, while the densities of the clusters in the data set may be totally different....
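The single-global-parameter problem the abstract points to can be seen in a DBSCAN-style core-point test, where one eps radius must serve clusters of every density (illustrative sketch, not the paper's algorithm):

```python
import math

def core_points(points, eps, min_pts):
    """DBSCAN-style core-point test: a point is a core point if at least
    min_pts points (itself included) lie within eps of it. A single global
    eps is exactly what breaks down on mixed-density data sets."""
    cores = []
    for p in points:
        n = sum(1 for q in points if math.dist(p, q) <= eps)
        if n >= min_pts:
            cores.append(p)
    return cores
```

With eps tuned to a dense region, points of an equally valid but sparser cluster fail the test and are mislabeled as noise, which is the behavior the authors set out to improve.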
Myllys, Nanna; Elm, Jonas; Halonen, Roope; Kurtén, Theo; Vehkamäki, Hanna
2016-02-01
We investigate the utilization of the domain local pair natural orbital coupled cluster (DLPNO-CCSD(T)) method for calculating binding energies of atmospherical molecular clusters. Applied to small complexes of atmospherical relevance we find that the DLPNO method significantly reduces the scatter in the binding energy, which is commonly present in DFT calculations. For medium sized clusters consisting of sulfuric acid and bases the DLPNO method yields a systematic underestimation of the binding energy compared to canonical coupled cluster results. The errors in the DFT binding energies appear to be more random, while the systematic nature of the DLPNO results allows the establishment of a scaling factor, to better mimic the canonical coupled cluster calculations. Based on the trends identified for the small and medium sized systems, we further extend the application of the DLPNO method to large acid - base clusters consisting of up to 10 molecules, which have previously been out of reach with accurate coupled cluster methods. Using the Atmospheric Cluster Dynamics Code (ACDC) we compare the sulfuric acid dimer formation based on the new DLPNO binding energies with previously published RI-CC2/aug-cc-pV(T+d)Z results. We also compare the simulated sulfuric acid dimer concentration as a function of the base concentration with measurement data from the CLOUD chamber and flow tube experiments. The DLPNO method, even after scaling, underpredicts the dimer concentration significantly. Reasons for this are discussed. PMID:26771121
Ontology-based topic clustering for online discussion data
Wang, Yongheng; Cao, Kening; Zhang, Xiaoming
2013-03-01
With the rapid development of online communities, mining and extracting quality knowledge from online discussions has become very important for the industrial and marketing sectors, as well as for e-commerce applications and government. Most of the existing techniques model a discussion as a social network of users represented by a user-based graph, without considering the content of the discussion. In this paper we propose a new multilayered model to analyze online discussions, in which user-based and message-based representations are combined. A novel clustering method based on frequent concept sets is used to cluster the original online discussion network into a topic space. Domain ontology is used to improve the clustering accuracy. Parallel methods are also used to make the algorithms scalable to very large data sets. Our experimental study shows that the model and algorithms are effective when analyzing large-scale online discussion data.
Missing data treatment method on cluster analysis
Elsiddig Elsadig Mohamed Koko; Amin Ibrahim Adam Mohamed
2015-01-01
Missing data in household health surveys pose a challenge to researchers because they lead to incomplete analysis. Cluster analysis methodology was applied to data collected in Sudan's 2006 household health survey. The current research focuses specifically on data analysis, as the objective is to deal with missing values in cluster analysis. Two-Step Cluster Analysis is applied, in which each participant is classified into one of the identified patterns and the opt...
Model-based clustering in networks with Stochastic Community Finding
McDaid, Aaron F; Friel, Nial; Hurley, Neil J
2012-01-01
In the model-based clustering of networks, blockmodelling may be used to identify roles in the network. We identify a special case of the Stochastic Block Model (SBM) in which we constrain the cluster-cluster interactions such that the density inside clusters of nodes is expected to be greater than the density between clusters. This corresponds to the intuition behind community-finding methods, where nodes tend to be clustered together if they link to each other. We call this model Stochastic Community Finding (SCF) and present an efficient MCMC algorithm which can cluster the nodes, given the network. The algorithm is evaluated on synthetic data and is applied to a social network of interactions at a karate club and at a monastery, demonstrating how the SCF finds the 'ground truth' clustering where sometimes the SBM does not. The SCF is only one possible form of constraint or specialization that may be applied to the SBM. In a more supervised context, it may be appropriate to use other specializations to guide...
Commodity-Based Computing Clusters at PPPL.
Wah, Darren; Davis, Steven L.; Johansson, Marques; Klasky, Scott; Tang, William; Valeo, Ernest
2002-11-01
In order to cost-effectively facilitate mid-scale serial and parallel computations and code development, a number of commodity-based clusters have been built at PPPL. A recent addition is the PETREL cluster, consisting of 100 dual-processor machines, both Intel and AMD, interconnected by a 100Mbit switch. Sixteen machines have an additional Myrinet 2000 interconnect. Also underway is the implementation of a Prototype Topical Computing Facility which will explore the effectiveness and scaling of cluster computing for larger scale fusion codes, specifically including those being developed under the SCIDAC auspices. This facility will consist of two parts: a 64 dual-processor node cluster, with high speed interconnect, and a 16 dual-processor node cluster, utilizing gigabit networking, built for the purpose of exploring grid-enabled computing. The initial grid explorations will be in collaboration with the Princeton University Institute for Computational Science and Engineering (PICSciE), where a 16 processor cluster dedicated to investigation of grid computing is being built. The initial objectives are to (1) grid-enable the GTC code and an MHD code, making use of MPICH-G2 and (2) implement grid-enabled interactive visualization using DXMPI and the Chromium API.
Graph-based clustering and data visualization algorithms
Vathy-Fogarassy, Ágnes
2013-01-01
This work presents a data visualization technique that combines graph-based topology representation and dimensionality reduction methods to visualize the intrinsic data structure in a low-dimensional vector space. The application of graphs in clustering and visualization has several advantages. A graph of important edges (where edges characterize relations and weights represent similarities or distances) provides a compact representation of the entire complex data set. This text describes clustering and visualization methods that are able to utilize information hidden in these graphs, based on
Cluster-based aggregation for inter-vehicle communication
Balanici, Mihail
2015-01-01
The present master thesis is focused on the design and evaluation of a cluster-based aggregation protocol (CBAP), which defines a set of rules and procedures for data aggregation based on a cluster structure. The proposed protocol is regarded as a complex mechanism consisting of two component sub-protocols: a clustering algorithm, grouping vehicles into cluster entities, and an aggregation scheme, deploying in-network and hierarchical data aggregation atop the prebuilt clusters. A cluster-bas...
Unbiased methods for removing systematics from galaxy clustering measurements
Elsner, Franz; Leistedt, Boris; Peiris, Hiranya V.
2016-02-01
Measuring the angular clustering of galaxies as a function of redshift is a powerful method for extracting information from the three-dimensional galaxy distribution. The precision of such measurements will dramatically increase with ongoing and future wide-field galaxy surveys. However, these are also increasingly sensitive to observational and astrophysical contaminants. Here, we study the statistical properties of three methods proposed for controlling such systematics - template subtraction, basic mode projection, and extended mode projection - all of which make use of externally supplied template maps, designed to characterize and capture the spatial variations of potential systematic effects. Based on a detailed mathematical analysis, and in agreement with simulations, we find that the template subtraction method in its original formulation returns biased estimates of the galaxy angular clustering. We derive closed-form expressions that should be used to correct results for this shortcoming. Turning to the basic mode projection algorithm, we prove it to be free of any bias, whereas we conclude that results computed with extended mode projection are biased. Within a simplified setup, we derive analytical expressions for the bias and discuss the options for correcting it in more realistic configurations. Common to all three methods is an increased estimator variance induced by the cleaning process, albeit at different levels. These results enable unbiased high-precision clustering measurements in the presence of spatially varying systematics, an essential step towards realizing the full potential of current and planned galaxy surveys.
Web-based Interface in Public Cluster
Akbar, Z
2007-01-01
A web-based interface dedicated to a cluster computer which is publicly accessible for free is introduced. The interface plays an important role in enabling secure public access, while providing a user-friendly computational environment for end-users and easy maintenance for administrators as well. The whole architecture, which integrates both hardware and software aspects, is briefly explained. It is argued that the public cluster is a globally unique approach, and could be a new kind of e-learning system, especially for parallel programming communities.
Clustering-based selective neural network ensemble
Institute of Scientific and Technical Information of China (English)
FU Qiang; HU Shang-xu; ZHAO Sheng-ying
2005-01-01
An effective ensemble should consist of a set of networks that are both accurate and diverse. We propose a novel clustering-based selective algorithm for constructing a neural network ensemble, where clustering technology is used to classify trained networks according to similarity and to optimally select the most accurate individual network from each cluster to make up the ensemble. Empirical studies on regression with four typical datasets showed that this approach yields a significantly smaller ensemble achieving better performance than traditional ones such as Bagging and Boosting. The bias-variance decomposition of the predictive error shows that the success of the proposed approach may lie in its properly tuning the bias/variance trade-off to reduce the prediction error (the sum of bias² and variance).
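The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the RMS prediction distance, the greedy grouping, and the `tol` threshold are assumptions standing in for the paper's clustering technology.

```python
def rms(a, b):
    # root-mean-square difference between two equal-length vectors
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

def select_ensemble(preds, targets, tol):
    # Greedily group models whose prediction vectors lie within `tol`
    # of a group representative, then keep the most accurate model
    # (lowest RMS error against `targets`) from each group.
    groups = []
    for m, p in enumerate(preds):
        for g in groups:
            if rms(p, preds[g[0]]) <= tol:
                g.append(m)
                break
        else:
            groups.append([m])
    return [min(g, key=lambda m: rms(preds[m], targets)) for g in groups]
```

With a loose threshold all models fall into one group and only the single most accurate survives; a tight threshold keeps one representative per behavioral cluster, preserving diversity.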
DBCSVM: Density Based Clustering Using Support VectorMachines
Directory of Open Access Journals (Sweden)
Santosh Kumar Rai
2012-07-01
Full Text Available Data categorization is a challenging job in the current scenario. The amount of multimedia data on the Internet grows day by day, and better retrieval and efficient searching of such data require a process for grouping it. Data mining can discover useful implicit information in large databases, and various data mining techniques are used to detect it. Data clustering is an important data mining technique for grouping data sets into different clusters such that each cluster contains data with similar properties. In this paper we take image data sets and first apply density-based clustering to group the images: density-based clustering groups the images according to the nearest feature sets but does not group outliers. We then use the support vector machine (SVM), a hyperplane classifier, to classify all the outliers left by density-based clustering. This method improves the efficiency of image grouping and gives better results.
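A minimal sketch of the two-stage idea, assuming toy 2-D points and illustrative `eps`/`min_pts` values; a simple nearest-centroid rule stands in here for the SVM stage described in the paper.

```python
import math

def region_query(points, i, eps):
    # indices of all points within eps of point i (including i itself)
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    # labels: None = unvisited, -1 = noise, >= 0 = cluster id
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1                      # provisional noise
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:                 # noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:     # j is a core point: expand
                seeds.extend(j_neighbors)
    return labels

def assign_outliers(points, labels):
    # Stand-in for the SVM stage: assign each noise point to the
    # cluster with the nearest centroid.
    members = {}
    for p, l in zip(points, labels):
        if l >= 0:
            members.setdefault(l, []).append(p)
    centroids = {l: tuple(sum(c) / len(ps) for c in zip(*ps))
                 for l, ps in members.items()}
    return [l if l >= 0 else
            min(centroids, key=lambda c: math.dist(p, centroids[c]))
            for p, l in zip(points, labels)]
```

In practice the second stage would train a classifier (e.g. an SVM) on the clustered points and predict labels for the noise points; the nearest-centroid rule above only mimics that behavior for linearly separable toy data.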
Semisupervised Clustering for Networks Based on Fast Affinity Propagation
Directory of Open Access Journals (Sweden)
Mu Zhu
2013-01-01
Full Text Available Most of the existing clustering algorithms for networks are unsupervised and cannot improve the clustering quality by utilizing a small amount of prior knowledge. We propose a semisupervised clustering algorithm for networks based on fast affinity propagation (SCAN-FAP), which is essentially a kind of similarity metric learning method. Firstly, we define a new constraint similarity measure integrating structural information and pairwise constraints, which reflects the effective similarities between nodes in networks. Then, taking the constraint similarities as input, we propose a fast affinity propagation algorithm which keeps the advantages of the original affinity propagation algorithm while increasing time efficiency by passing messages only between certain nodes. Finally, through extensive experimental studies, we demonstrate that the proposed algorithm can take full advantage of the prior knowledge and improve the clustering quality significantly. Furthermore, our algorithm has superior performance to some of the state-of-the-art approaches.
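The constraint-integration idea can be illustrated with a simple sketch (hypothetical, and far simpler than the paper's structural measure): must-link pairs are clamped to the highest observed similarity and cannot-link pairs to the lowest before the matrix is fed to affinity propagation.

```python
def constrained_similarity(sim, must_link, cannot_link):
    # Clamp must-link pairs to the highest observed similarity and
    # cannot-link pairs to the lowest, leaving other entries untouched.
    hi = max(max(row) for row in sim)
    lo = min(min(row) for row in sim)
    out = [row[:] for row in sim]          # copy; do not mutate the input
    for i, j in must_link:
        out[i][j] = out[j][i] = hi
    for i, j in cannot_link:
        out[i][j] = out[j][i] = lo
    return out
```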
Clustering Seven Data Sets by Means of Some or All of Seven Clustering Methods.
Dreger, Ralph Mason; And Others
1988-01-01
Seven data sets (namely, clinical data on children) were subjected to clustering by seven algorithms: the B-coefficient, Linear Typal Analysis, elementary linkage analysis, the Numerical Taxonomy System, the Statistical Analysis System hierarchical clustering method, Taxonomy, and Bolz's Type Analysis. The little-known B-coefficient method compared…
Clustering Methods for Real Estate Portfolios
William N. Goetzmann; Susan M. Wachter
1998-01-01
A clustering algorithm is applied to effective rents for twenty-one U.S. office markets, and to twenty-two metropolitan markets using vacancy data. It provides support for the conjecture that there exist a few major families of cities, including an oil and gas group and an industrial Northeast group. Unlike other clustering studies, we find strong evidence of bicoastal city associations among cities such as Boston and Los Angeles. We present a bootstrapping methodology for investigating the ...
ENERGY OPTIMIZATION IN CLUSTER BASED WIRELESS SENSOR NETWORKS
Directory of Open Access Journals (Sweden)
T. SHANKAR
2014-04-01
Full Text Available Wireless sensor networks (WSN) are made up of sensor nodes which are usually battery-operated devices, so energy saving at the sensor nodes is a major design issue. To prolong the network's lifetime, minimization of energy consumption should be implemented at all layers of the network protocol stack, from the physical to the application layer, including cross-layer optimization. Optimizing energy consumption is the main concern in designing and planning the operation of a WSN. Clustering is one of the techniques used to extend the lifetime of the network by applying data aggregation and balancing energy consumption among the sensor nodes. This paper proposes new versions of the Low Energy Adaptive Clustering Hierarchy (LEACH) protocol, called Advanced Optimized Low Energy Adaptive Clustering Hierarchy (AOLEACH), Optimal Deterministic Low Energy Adaptive Clustering Hierarchy (ODLEACH), and Varying Probability Distance Low Energy Adaptive Clustering Hierarchy (VPDL), in combination with the Shuffled Frog Leap Algorithm (SFLA). These enable selecting the best optimal adaptive cluster heads using an improved threshold energy distribution compared to the LEACH protocol, and rotating the cluster head position for uniform energy dissipation based on energy levels. The proposed algorithms optimize the lifetime of the network by increasing the time to first node death (FND) and the number of alive nodes, thereby increasing the lifetime of the network.
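The classical LEACH election rule that these variants build on can be sketched as follows; the helper names and parameters are illustrative, and the paper's SFLA-based improvements are not shown. A node that has not yet served as cluster head in the current epoch becomes a head in round r with probability T(n) = p / (1 - p * (r mod 1/p)), where p is the desired fraction of heads.

```python
import random

def leach_threshold(p, r):
    # LEACH election threshold for round r:
    #   T(n) = p / (1 - p * (r mod 1/p))
    # applied only to nodes that have not yet been cluster head this epoch.
    return p / (1 - p * (r % round(1 / p)))

def elect_cluster_heads(n_nodes, p, r, eligible, rng):
    # Each eligible node draws a uniform number; below the threshold => head.
    t = leach_threshold(p, r)
    return [i for i in range(n_nodes) if eligible[i] and rng.random() < t]
```

Note that at the last round of an epoch (r mod 1/p = 1/p - 1) the threshold reaches 1, so every remaining eligible node is forced to serve, which is what rotates the cluster-head role evenly.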
Cancer detection based on Raman spectra super-paramagnetic clustering
González-Solís, José Luis; Guizar-Ruiz, Juan Ignacio; Martínez-Espinosa, Juan Carlos; Martínez-Zerega, Brenda Esmeralda; Juárez-López, Héctor Alfonso; Vargas-Rodríguez, Héctor; Gallegos-Infante, Luis Armando; González-Silva, Ricardo Armando; Espinoza-Padilla, Pedro Basilio; Palomares-Anda, Pascual
2016-08-01
The clustering of Raman spectra of serum samples is analyzed using the super-paramagnetic clustering technique based on the Potts spin model. We investigated the clustering of biochemical networks by using Raman data that define edge lengths in the network, where the interactions are functions of the individual band intensities of the Raman spectra. For this study, we used two groups of 58 and 102 control Raman spectra and 160, 150 and 42 Raman spectra of serum samples from breast cancer, cervical cancer and leukemia patients, respectively. The spectra were collected from patients at different hospitals in Mexico. Using the super-paramagnetic clustering technique, we identified the most natural and compact clusters, allowing us to discriminate between control and cancer patients. Of special interest was the leukemia case, where the nearly hierarchical structure observed allowed identification of each patient's leukemia type. The goal of this study is to apply a model from statistical physics, the super-paramagnetic model, to find the natural clusters that allow us to design a cancer detection method. To the best of our knowledge, this is the first report of preliminary results evaluating the usefulness of super-paramagnetic clustering in spectroscopy, where it is used for classification of spectra.
Visual cluster analysis and pattern recognition template and methods
Energy Technology Data Exchange (ETDEWEB)
Osbourn, G.C.; Martinez, R.F.
1993-12-31
This invention comprises a method of clustering using a novel template to define a region of influence. Using neighboring approximation methods, computation times can be significantly reduced. The template and method are applicable to, and improve, pattern recognition techniques.
Malware Classification based on Call Graph Clustering
Kinable, Joris; Kostakis, Orestis
2010-01-01
Each day, anti-virus companies receive tens of thousands of samples of potentially harmful executables. Many of the malicious samples are variations of previously encountered malware, created by their authors to evade pattern-based detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representing malware samples as call graphs, it is possible to abstract certain variations...
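As a rough illustration of comparing call graphs (a deliberate simplification; the paper's graph-matching distance is more elaborate), two graphs represented as sets of (caller, callee) edges can be compared with Jaccard similarity, and the resulting pairwise similarities then drive the clustering.

```python
def callgraph_similarity(edges_a, edges_b):
    # Jaccard similarity of two call graphs represented as sets of
    # (caller, callee) edges; 1.0 means identical edge sets.
    a, b = set(edges_a), set(edges_b)
    return len(a & b) / len(a | b)
```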
TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING
Directory of Open Access Journals (Sweden)
Khadoudja Ghanem
2013-03-01
Full Text Available An intrinsic problem of classifiers based on machine learning (ML) methods is that their learning time grows as the size and complexity of the training dataset increases. For this reason, it is important to have efficient computational methods and algorithms that can be applied to large datasets, such that it is still possible to complete the machine learning tasks in reasonable time. In this context, we present in this paper a simple, more accurate process to speed up ML methods. An unsupervised clustering algorithm is combined with the Expectation-Maximization (EM) algorithm to develop an efficient Hidden Markov Model (HMM) training. The idea of the proposed process consists of two steps. In the first step, training instances with similar inputs are clustered and a weight factor which represents the frequency of these instances is assigned to each representative cluster; the Dynamic Time Warping technique is used as a dissimilarity function to cluster similar examples. In the second step, all formulas in the classical HMM training algorithm (EM) associated with the number of training instances are modified to include the weight factor in the appropriate terms. This process significantly accelerates HMM training while maintaining the same initial, transition and emission probability matrices as those obtained with the classical HMM training algorithm, so the classification accuracy is preserved. Depending on the size of the training set, speedups of up to 2200 times are possible when the size is about 100,000 instances. The proposed approach is not limited to training HMMs; it can be employed for a large variety of ML methods.
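The dissimilarity function of the first step can be sketched with a textbook dynamic-programming DTW implementation, plus a greedy grouping that records the weight factor for each representative; the `tol` threshold and the greedy scheme are illustrative assumptions, not the paper's exact clustering algorithm.

```python
def dtw(a, b):
    # classic O(n*m) dynamic-programming DTW distance between sequences
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def dtw_cluster(seqs, tol):
    # Greedily group sequences within DTW distance `tol` of a group
    # representative; the weight counts how many training instances
    # each representative stands for.
    reps, weights = [], []
    for s in seqs:
        for k, r in enumerate(reps):
            if dtw(s, r) <= tol:
                weights[k] += 1
                break
        else:
            reps.append(s)
            weights.append(1)
    return list(zip(reps, weights))
```

The (representative, weight) pairs would then replace the raw instances in the EM update formulas, with each term scaled by its weight.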
Core Business Selection Based on Ant Colony Clustering Algorithm
Directory of Open Access Journals (Sweden)
Yu Lan
2014-01-01
Full Text Available The core business is the most important business of an enterprise with diversified operations. In this paper, we first introduce the definition and characteristics of the core business and then describe the ant colony clustering algorithm. In order to test the effectiveness of the proposed method, Tianjin Port Logistics Development Co., Ltd. is selected as the research object. Based on the current state of the company's development, its core business can be identified by the ant colony clustering algorithm. The results indicate that the proposed method is an effective way to determine the core business of a company.
Comparison of two cluster analysis methods using single particle mass spectra
Zhao, Weixiang; Hopke, Philip K.; Prather, Kimberly A.
Cluster analysis of aerosol time-of-flight mass spectrometry (ATOFMS) data has been an effective tool for the identification of possible sources of ambient aerosols. In this study, the clustering results of two typical methods, adaptive resonance theory-based neural networks-2a (ART-2a) and density-based spatial clustering of applications with noise (DBSCAN), on ATOFMS data were investigated by employing a set of benchmark ATOFMS data. The advantages and disadvantages of these two methods are discussed and some feasible remedies proposed for problems encountered in the clustering process. The results of this study provide promising directions for future work on ambient aerosol cluster analysis, suggesting a more effective and feasible clustering strategy based on the integration of ART-2a and DBSCAN.
Directory of Open Access Journals (Sweden)
Peixin Zhao
2013-01-01
Full Text Available Community detection in social networks plays an important role in cluster analysis. Many traditional techniques for one-dimensional problems have proven inadequate for high-dimensional or mixed-type datasets due to data sparseness and attribute redundancy. In this paper we propose a graph-based clustering method for multidimensional datasets. This novel method has two distinguishing features: a nonbinary hierarchical tree and multi-membership clusters. The nonbinary hierarchical tree clearly highlights meaningful clusters, while the multi-membership feature may provide more useful service strategies. Experimental results on customer relationship management data confirm the effectiveness of the new method.
Sakumichi, Naoyuki; Kawakami, Norio; Ueda, Masahito
2012-04-01
The quantum-statistical cluster expansion method of Lee and Yang is extended to investigate off-diagonal long-range order (ODLRO) in one-component and multicomponent mixtures of bosons or fermions. Our formulation is applicable to both a uniform system and a trapped system without local-density approximation and allows systematic expansions of one-particle and multiparticle reduced density matrices in terms of cluster functions, which are defined for the same system with Boltzmann statistics. Each term in this expansion can be associated with a Lee-Yang graph. We elucidate a physical meaning of each Lee-Yang graph; in particular, for a mixture of ultracold atoms and bound dimers, an infinite sum of the ladder-type Lee-Yang 0-graphs is shown to lead to Bose-Einstein condensation of dimers below the critical temperature. In the case of Bose statistics, an infinite series of Lee-Yang 1-graphs is shown to converge and gives the criteria of ODLRO at the one-particle level. Applications to a dilute Bose system of hard spheres are also made. In the case of Fermi statistics, an infinite series of Lee-Yang 2-graphs is shown to converge and gives the criteria of ODLRO at the two-particle level. Applications to a two-component Fermi gas in the tightly bound limit are also made.
Sagar S. De; Minati Mishra; Satchidananda Dehuri
2013-01-01
In visual data mining, visualization of clusters is a challenging task. Although many techniques have already been developed, challenges remain in representing large volumes of data with multiple dimensions and overlapped clusters. In this paper, a multivariate cluster visualization technique (MVClustViz) is presented to visualize centroid-based clusters. The geographic projection technique supports multi-dimension, large-volume, and both crisp and fuzzy clusters visual...
Finding Within Cluster Dense Regions Using Distance Based Technique
Directory of Open Access Journals (Sweden)
Wesam Ashour
2012-03-01
Full Text Available One of the main categories in data clustering is density-based clustering. Density-based clustering techniques like DBSCAN are attractive because they can find arbitrarily shaped clusters along with noisy outliers. The main weakness of traditional density-based algorithms like DBSCAN is clustering data sets with different density levels: DBSCAN's calculations are done according to given parameters applied to all points in a data set, while the densities of the data set's clusters may be totally different. The proposed algorithm overcomes this weakness of the traditional density-based algorithms. The algorithm starts by partitioning the data within a cluster into units, based on a user parameter, and computes the density of each unit separately. The algorithm then compares the results and merges neighboring units with close approximate density values into a new cluster. The experimental results of the simulation show that the proposed algorithm gives good results in finding clusters for data sets with clusters of different densities.
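For 1-D data, the partition-and-merge idea can be sketched as follows. This is a hypothetical simplification: the fixed unit width and the density-ratio threshold stand in for the paper's user parameter and merge criterion.

```python
def unit_densities(points, unit_width):
    # Partition 1-D values into fixed-width units and count members;
    # the count per unit is a simple proxy for its density.
    counts = {}
    for x in points:
        u = int(x // unit_width)
        counts[u] = counts.get(u, 0) + 1
    return counts

def merge_units(counts, ratio):
    # Merge adjacent units whose density ratio (smaller / larger count)
    # is at least `ratio`, i.e. whose densities are approximately equal.
    clusters, current = [], []
    for u in sorted(counts):
        if (current and u - current[-1] == 1 and
                min(counts[u], counts[current[-1]]) /
                max(counts[u], counts[current[-1]]) >= ratio):
            current.append(u)
        else:
            if current:
                clusters.append(current)
            current = [u]
    clusters.append(current)
    return clusters
```

Units of similar density that touch each other fuse into one cluster, while a unit whose density differs sharply from its neighbor starts a new cluster, which is how two clusters of different density levels stay separated.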
MHCcluster, a method for functional clustering of MHC molecules.
Thomsen, Martin; Lundegaard, Claus; Buus, Søren; Lund, Ole; Nielsen, Morten
2013-09-01
The identification of peptides binding to major histocompatibility complexes (MHC) is a critical step in the understanding of T cell immune responses. The human MHC genomic region (HLA) is extremely polymorphic comprising several thousand alleles, many encoding a distinct molecule. The potentially unique specificities remain experimentally uncharacterized for the vast majority of HLA molecules. Likewise, for nonhuman species, only a minor fraction of the known MHC molecules have been characterized. Here, we describe a tool, MHCcluster, to functionally cluster MHC molecules based on their predicted binding specificity. The method has a flexible web interface that allows the user to include any MHC of interest in the analysis. The output consists of a static heat map and graphical tree-based visualizations of the functional relationship between MHC variants and a dynamic TreeViewer interface where both the functional relationship and the individual binding specificities of MHC molecules are visualized. We demonstrate that conventional sequence-based clustering will fail to identify the functional relationship between molecules, when applied to MHC system, and only through the use of the predicted binding specificity can a correct clustering be found. Clustering of prevalent HLA-A and HLA-B alleles using MHCcluster confirms the presence of 12 major specificity groups (supertypes) some however with highly divergent specificities. Importantly, some HLA molecules are shown not to fit any supertype classification. Also, we use MHCcluster to show that chimpanzee MHC class I molecules have a reduced functional diversity compared to that of HLA class I molecules. MHCcluster is available at www.cbs.dtu.dk/services/MHCcluster-2.0. PMID:23775223
Li, Hao; Li, Peng; Xie, Jing; Yi, Shengjie; Yang, Chaojie; Wang, Jian; Sun, Jichao; Liu, Nan; Wang, Xu; Wu, Zhihao; Wang, Ligui; Hao, Rongzhang; Wang, Yong; Jia, Leili; Li, Kaiqin; Qiu, Shaofu; Song, Hongbin
2014-08-01
A clustered regularly interspaced short palindromic repeat (CRISPR) typing method has recently been developed and used for typing and subtyping of Salmonella spp., but it is complicated and labor intensive because it has to analyze all spacers in two CRISPR loci. Here, we developed a more convenient and efficient method, namely, CRISPR locus spacer pair typing (CLSPT), which only needs to analyze the two newly incorporated spacers adjoining the leader array in the two CRISPR loci. We analyzed a CRISPR array of 82 strains belonging to 21 Salmonella serovars isolated from humans in different areas of China by using this new method. We also retrieved the newly incorporated spacers in each CRISPR locus of 537 Salmonella isolates which have definite serotypes in the Pasteur Institute's CRISPR Database to evaluate this method. Our findings showed that this new CLSPT method presents a high level of consistency (kappa = 0.9872, Matthew's correlation coefficient = 0.9712) with the results of traditional serotyping, and thus, it can also be used to predict serotypes of Salmonella spp. Moreover, this new method has a considerable discriminatory power (discriminatory index [DI] = 0.8145), comparable to those of multilocus sequence typing (DI = 0.8088) and conventional CRISPR typing (DI = 0.8684). Because CLSPT only costs about $5 to $10 per isolate, it is a much cheaper and more attractive method for subtyping of Salmonella isolates. In conclusion, this new method will provide considerable advantages over other molecular subtyping methods, and it may become a valuable epidemiologic tool for the surveillance of Salmonella infections. PMID:24899040
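The discriminatory index (DI) cited above is Simpson's index of diversity: given the number of isolates assigned to each type, DI = 1 - (1 / (N(N - 1))) Σ n_j(n_j - 1), the probability that two randomly drawn isolates belong to different types.

```python
def discriminatory_index(type_counts):
    # Simpson's index of diversity:
    #   DI = 1 - (1 / (N * (N - 1))) * sum_j n_j * (n_j - 1)
    # where n_j is the number of isolates of type j and N their total.
    n = sum(type_counts)
    return 1.0 - sum(c * (c - 1) for c in type_counts) / (n * (n - 1))
```

A DI of 1.0 means every isolate gets its own type; 0.0 means the method cannot distinguish any of them.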
Institute of Scientific and Technical Information of China (English)
李超顺; 周建中; 肖剑; 肖汉
2013-01-01
Kernel clustering is a valid class of methods for vibration fault diagnosis of hydro-turbine generating units (HGU). In order to solve the problems of evaluating clustering results and selecting the kernel function parameter, a novel gravitational search based kernel clustering (GSKC) method was proposed. First, the kernel clustering objective function was built based on the kernel Xie-Beni clustering index; then the gravitational search method was introduced and applied to solve the objective function, with the clustering centers and the kernel function parameter encoded together as optimization variables; finally, a fault diagnosis model based on similarity in the kernel space was defined. UCI benchmark data sets were used to check the classification accuracy, and then GSKC was applied to fault diagnosis of an HGU. Experimental results show that GSKC is more accurate in classification than traditional methods; meanwhile, GSKC is able to cluster the fault samples of the HGU effectively and diagnose different kinds of faults accurately.
Institute of Scientific and Technical Information of China (English)
殷春武
2013-01-01
Cooperative innovation between subject clusters and industrial clusters is a top priority for guaranteeing the sustainable development of a regional economy. Based on an analysis of the importance of subject cluster and industrial cluster cooperative innovation ability, an evaluation index system for double-cluster cooperative innovation ability is constructed, and the combination weights of the evaluation indices are obtained by using the OWA operator to assemble multiple weight-determining methods. An assessment scale combining a language scale and gray degree is proposed for the evaluation. Finally, a double-cluster cooperative innovation ability evaluation method based on fuzzy sets and gray degree is proposed, enriching the theory of double-cluster cooperative innovation ability evaluation.
Comparison of Selected Methods for Document Clustering
Czech Academy of Sciences Publication Activity Database
Ševčík, R.; Řezanková, H.; Húsek, Dušan
Berlin : Springer, 2011 - (Mugellini, E.; Szczepaniak, P.; Pettenati, M.; Sokhn, M.), s. 101-110 ISBN 978-3-642-18028-6. ISSN 1867-5662. - (Advances in Intelligent and Soft Computing. 86). [AWIC 2011. Atlantic Web Intelligence Conference /7./. Fribourg (CH), 26.01.2011-28.01.2011] R&D Projects: GA ČR GAP202/10/0262; GA ČR GA205/09/1079 Institutional research plan: CEZ:AV0Z10300504 Keywords : web clustering * cluster analysis * textual documents * web content classification * newsgroups analysis * vector model Subject RIV: IN - Informatics, Computer Science
ONTOLOGY BASED DOCUMENT CLUSTERING USING MAPREDUCE
Directory of Open Access Journals (Sweden)
Abdelrahman Elsayed
2015-05-01
Full Text Available Nowadays, document clustering is considered a data-intensive task due to the dramatic, fast increase in the number of available documents. Moreover, the feature sets that represent those documents are also very large. The most common method for representing documents is the vector space model, which represents document features as a bag of words and does not capture semantic relations between words. In this paper we introduce a distributed implementation of bisecting k-means using the MapReduce programming model. The aim of our proposed implementation is to solve the problem of clustering data-intensive document collections. In addition, we propose integrating the WordNet ontology with bisecting k-means in order to exploit the semantic relations between words and enhance document clustering results. Our experimental results show that using lexical categories for nouns only improves internal evaluation measures of document clustering and reduces the document features from thousands to tens of features. Our experiments were conducted using Amazon Elastic MapReduce to deploy the bisecting k-means algorithm.
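The bisecting k-means at the core of the paper can be sketched in plain Python (a minimal single-machine sketch, not the authors' MapReduce implementation; the function names, the SSE-based choice of which cluster to split, and the fixed random seed are our illustrative assumptions, and degenerate empty splits are only crudely handled):

```python
import random

def kmeans2(points, iters=20, seed=0):
    """Plain 2-means: split one set of points into two clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; stop early
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) for g in groups]
    return groups

def sse(points):
    """Sum of squared errors of a cluster around its centroid."""
    c = tuple(sum(col) / len(points) for col in zip(*points))
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points)

def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)
        clusters.remove(worst)
        clusters.extend(list(g) for g in kmeans2(worst) if g)
    return clusters
```

In the distributed version described in the abstract, the assignment step of each 2-means split is the natural map phase and the centroid update the reduce phase.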
PRIVACY PRESERVING CLUSTERING BASED ON LINEAR APPROXIMATION OF FUNCTION
Rajesh Pasupuleti; Narsimha Gugulothu
2014-01-01
Clustering analysis opens a new direction in data mining that has major impact in various domains including machine learning, pattern recognition, image processing, information retrieval and bioinformatics. Current clustering techniques do not address some of the requirements adequately and have failed to standardize clustering algorithms that support all real applications. Many clustering methods depend mostly on user-specified parameters, and initial seeds of clusters are randoml...
CLUSTERING-BASED ANALYSIS OF TEXT SIMILARITY
Bovcon , Borja
2013-01-01
The focus of this thesis is the comparison of text-document similarity analyses using clustering algorithms. We begin by defining the main problem and then proceed to describe the two most widely used text-document representation techniques, where we present word-filtering methods and their importance, Porter's algorithm and the tf-idf term-weighting algorithm. We then proceed to apply all previously described algorithms to selected data sets, which vary in size and compactness. Following this, we ...
A clustering routing algorithm based on improved ant colony clustering for wireless sensor networks
Xiao, Xiaoli; Li, Yang
Because real wireless sensor network nodes are not distributed uniformly, this paper presents a clustering strategy based on the ant colony clustering algorithm (ACC-C). To reduce the energy consumption of cluster heads near the base station and of the whole network, the algorithm applies ant colony clustering to non-uniform clustering. An improved route-optimality degree is presented to evaluate the performance of the chosen route. Simulation results show that, compared with other algorithms, such as LEACH and an improved particle-swarm-based clustering algorithm (PSC-C), the proposed approach is able to keep away from nodes with low residual energy, which can extend the lifetime of the network.
A new method to measure the mass of galaxy clusters
Falco, Martina; Wojtak, Radoslaw; Brinckmann, Thejs; Lindholmer, Mikkel; Pandolfi, Stefania
2013-01-01
The mass measurement of galaxy clusters is an important tool for determining the cosmological parameters that describe the matter and energy content of the Universe. However, the standard methods rely on various assumptions about the shape or the level of equilibrium of the cluster. We present a novel method of measuring cluster masses. It is complementary to most other methods, since it uses only kinematical information from outside the virialized cluster. Our method identifies objects, such as galaxy sheets or filaments, in the cluster outer region, and infers the cluster mass by modeling how the massive cluster perturbs the motion of these structures away from the Hubble flow. At the same time, this technique allows us to constrain the three-dimensional orientation of the detected structures with good accuracy. We use a cosmological numerical simulation to test the method. We then apply the method to the Coma cluster, where we find two galaxy sheets, and measure the mass of Coma to be Mvir=(9.2\pm2.4)10^{14} M...
Information Clustering Based on Fuzzy Multisets.
Miyamoto, Sadaaki
2003-01-01
Proposes a fuzzy multiset model for information clustering with application to information retrieval on the World Wide Web. Highlights include search engines; term clustering; document clustering; algorithms for calculating cluster centers; theoretical properties concerning clustering algorithms; and examples to show how the algorithms work.
Directory of Open Access Journals (Sweden)
Susan Worner
2013-09-01
Full Text Available For greater preparedness, pest risk assessors are required to prioritise long lists of pest species with the potential to establish and cause significant impact in an endangered area. Such prioritisation is often qualitative, subjective, and sometimes biased, relying mostly on expert and stakeholder consultation. In recent years, cluster-based analyses have been used to investigate regional pest species assemblages or pest profiles to indicate the risk of new organism establishment. Such an approach is based on the premise that the co-occurrence of well-known global invasive pest species in a region is not random, and that the pest species profile or assemblage integrates complex functional relationships that are difficult to tease apart. In other words, the assemblage can help identify and prioritise species that pose a threat in a target region. A computational intelligence method called the Kohonen self-organizing map (SOM), a type of artificial neural network, was the first clustering method applied to analyse assemblages of invasive pests. The SOM is a well-known dimension reduction and visualization method, especially useful for high-dimensional data that more conventional clustering methods may not analyse suitably. Like all clustering algorithms, the SOM can give details of clusters that identify regions with similar pest assemblages, and possible donor and recipient regions. More importantly, however, the SOM connection weights that result from the analysis can be used to rank the strength of association of each species within each regional assemblage. Species with high weights that are not already established in the target region are identified as high risk. However, the SOM analysis is only the first step in a process to assess risk, to be used alongside or incorporated within other measures. Here we illustrate the application of SOM analyses in a range of contexts in invasive species risk assessment, and discuss other clustering methods such as k
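A minimal sketch of how a 1-D Kohonen SOM of the kind described above is trained (illustrative only, not the risk-assessment system itself; the unit count, learning-rate schedule, and neighbourhood schedule are arbitrary assumptions):

```python
import math
import random

def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=2.0, seed=1):
    """Train a 1-D self-organizing map. Each unit holds a weight vector; the
    best-matching unit (BMU) and its grid neighbours are pulled toward each sample."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)               # decaying learning rate
        radius = 1.0 + radius0 * (1.0 - epoch / epochs)  # shrinking neighbourhood
        for x in data:
            # BMU: the unit whose weight vector is closest to the sample
            bmu = min(range(n_units),
                      key=lambda u: sum((a - w) ** 2 for a, w in zip(x, weights[u])))
            for u in range(n_units):
                h = math.exp(-((u - bmu) ** 2) / (2.0 * radius ** 2))  # neighbourhood kernel
                weights[u] = [w + lr * h * (a - w) for w, a in zip(weights[u], x)]
    return weights
```

In the risk-profiling use case, each input row would be a regional species-presence vector, and the trained connection weights are what the abstract uses to rank species-assemblage associations.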
A Method of Deep Web Clustering Based on SOM Neural Network
Institute of Scientific and Technical Information of China (English)
吴凌云
2012-01-01
To improve the efficiency of Deep Web data source clustering and reduce manual work, this paper proposes a Deep Web interface clustering method based on a self-organizing map (SOM) neural network. The method adopts the pre-query approach and uses structural feature statistics of interface forms as inputs. Tested on the UIUC data set, the method achieved the expected results.
On Comparison of Clustering Methods for Pharmacoepidemiological Data.
Feuillet, Fanny; Bellanger, Lise; Hardouin, Jean-Benoit; Victorri-Vigneau, Caroline; Sébille, Véronique
2015-01-01
The high consumption of psychotropic drugs is a public health problem. Rigorous statistical methods are needed to identify consumption characteristics in the post-marketing phase. Agglomerative hierarchical clustering (AHC) and latent class analysis (LCA) can both provide clusters of subjects with similar characteristics. The objective of this study was to compare these two methods in pharmacoepidemiology on several criteria: number of clusters, concordance, interpretation, and stability over time. On a dataset of bromazepam consumption, the two methods show good concordance. AHC is a very stable method and provides homogeneous classes. LCA is an inferential approach and seems to identify extreme deviant behavior more accurately. PMID:24905478
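The AHC half of such a comparison can be sketched naively as follows (a didactic O(n^3) average-linkage version, not the statistical software the study used; function names and the squared-distance linkage are our illustrative choices):

```python
def average_linkage(clusters, i, j):
    """Average pairwise squared distance between two clusters."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, q))
               for p in clusters[i] for q in clusters[j]) / (len(clusters[i]) * len(clusters[j]))

def ahc(points, k):
    """Naive agglomerative hierarchical clustering: start from singletons and
    repeatedly merge the pair of clusters with the smallest average linkage."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: average_linkage(clusters, *ij))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Cutting the merge sequence at different k gives the cluster counts the study compares against the model-based LCA solution.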
Directory of Open Access Journals (Sweden)
Alex Ing
Full Text Available Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
Directory of Open Access Journals (Sweden)
Baumbach Jan
2007-10-01
Full Text Available Abstract Background Detecting groups of functionally related proteins from their amino acid sequence alone has been a long-standing challenge in computational genome research. Several clustering approaches, following different strategies, have been published to attack this problem. Today, new sequencing technologies provide huge amounts of sequence data that has to be efficiently clustered with constant or increased accuracy, at increased speed. Results We advocate that the model of weighted cluster editing, also known as transitive graph projection, is well-suited to protein clustering. We present the FORCE heuristic that is based on transitive graph projection and clusters arbitrary sets of objects, given pairwise similarity measures. In particular, we apply FORCE to the problem of protein clustering and show that it outperforms the most popular existing clustering tools (Spectral clustering, TribeMCL, GeneRAGE, Hierarchical clustering, and Affinity Propagation). Furthermore, we show that FORCE is able to handle huge datasets by calculating clusters for all 192 187 prokaryotic protein sequences (66 organisms) obtained from the COG database. Finally, FORCE is integrated into the corynebacterial reference database CoryneRegNet. Conclusion FORCE is an applicable alternative to existing clustering algorithms. Its theoretical foundation, weighted cluster editing, can outperform other clustering paradigms on protein homology clustering. FORCE is open source and implemented in Java. The software, including the source code, the clustering results for COG and CoryneRegNet, and all evaluation datasets are available at http://gi.cebitec.uni-bielefeld.de/comet/force/.
Component Based Clustering in Wireless Sensor Networks
Amaxilatis, Dimitrios; Koninis, Christos; Pyrgelis, Apostolos
2011-01-01
Clustering is an important research topic for wireless sensor networks (WSNs). A large variety of approaches has been presented focusing on different performance metrics. Even though all of them have many practical applications, an extremely limited number of software implementations is available to the research community. Furthermore, these very few techniques are implemented for specific WSN systems or are integrated in complex applications. Thus it is very difficult to comparatively study their performance and almost impossible to reuse them in future applications under a different scope. In this work we study a large body of well established algorithms. We identify their main building blocks and propose a component-based architecture for developing clustering algorithms that (a) promotes exchangeability of algorithms thus enabling the fast prototyping of new approaches, (b) allows cross-layer implementations to realize complex applications, (c) offers a common platform to comparatively study the performan...
Li, Xin-Xiong; Wang, Yang-Xin; Wang, Rui-Hu; Cui, Cai-Yan; Tian, Chong-Bin; Yang, Guo-Yu
2016-05-23
A new approach to prepare heterometallic cluster organic frameworks has been developed. The method was employed to link Anderson-type polyoxometalate (POM) clusters and transition-metal clusters by using a designed rigid tris(alkoxo) ligand containing a pyridyl group to form a three-fold interpenetrated anionic diamondoid structure and a 2D anionic layer, respectively. This technique facilitates the integration of the unique inherent properties of Anderson-type POM clusters and cuprous iodide clusters into one cluster organic framework. PMID:27061042
AN IMPROVED TEACHING-LEARNING BASED OPTIMIZATION APPROACH FOR FUZZY CLUSTERING
Directory of Open Access Journals (Sweden)
Parastou Shahsamandi E.
2014-11-01
Full Text Available Fuzzy clustering has been widely studied and applied in a variety of key areas of science and engineering. In this paper the Improved Teaching-Learning Based Optimization (ITLBO) algorithm is used for data clustering, in which the objects in the same cluster are similar. The algorithm has been tested on several datasets and compared with other popular clustering algorithms. Results show that the proposed method improves the clustering output and can be used efficiently for fuzzy clustering.
International Nuclear Information System (INIS)
The evolution of the microstructure of dilute Fe alloys under irradiation has been modelled using a multiscale approach based on ab initio and atomistic kinetic Monte Carlo simulations. In these simulations, both self interstitials and vacancies, isolated or in clusters, are considered. Isochronal annealing experiments after electron irradiation have been simulated in pure Fe and in Fe-Cu and Fe-Mn dilute alloys, focusing on recovery stages I and II. The parameters for the self interstitial - solute atom interactions are based on ab initio predictions, and some of these interactions have been slightly adjusted, without modifying the interaction character, against isochronal annealing experimental data. The different recovery peaks are globally well reproduced. These simulations allow the different recovery peaks, as well as the effect of varying solute concentration, to be interpreted. For some peaks, these simulations have allowed us to revisit and re-interpret the experimental data. In Fe-Cu, the trapping of self interstitials by Cu atoms allows experimental results to be reproduced, although no mixed dumbbells are formed, contrary to the former interpretations. In Fe-Mn, by contrast, the favorable formation of mixed dumbbells plays an important role in the Mn effect.
Durán Pacheco, Gonzalo; Hattendorf, Jan; Colford, John M; Mäusezahl, Daniel; Smith, Thomas
2009-10-30
Many different methods have been proposed for the analysis of cluster randomized trials (CRTs) over the last 30 years. However, the evaluation of methods on overdispersed count data has been based mostly on the comparison of results using empiric data; i.e. when the true model parameters are not known. In this study, we assess via simulation the performance of five methods for the analysis of counts in situations similar to real community-intervention trials. We used the negative binomial distribution to simulate overdispersed counts of CRTs with two study arms, allowing the period of time under observation to vary among individuals. We assessed different sample sizes, degrees of clustering and degrees of cluster-size imbalance. The compared methods are: (i) the two-sample t-test of cluster-level rates, (ii) generalized estimating equations (GEE) with empirical covariance estimators, (iii) GEE with model-based covariance estimators, (iv) generalized linear mixed models (GLMM) and (v) Bayesian hierarchical models (Bayes-HM). Variation in sample size and clustering led to differences between the methods in terms of coverage, significance, power and random-effects estimation. GLMM and Bayes-HM performed better in general with Bayes-HM producing less dispersed results for random-effects estimates although upward biased when clustering was low. GEE showed higher power but anticonservative coverage and elevated type I error rates. Imbalance affected the overall performance of the cluster-level t-test and the GEE's coverage in small samples. Important effects arising from accounting for overdispersion are illustrated through the analysis of a community-intervention trial on Solar Water Disinfection in rural Bolivia. PMID:19672840
Research on Web Service Clustering Based on Feature Model
Zhigang Zhang
2014-01-01
Web service clustering helps to enhance the efficiency of service discovery, and the accuracy of service clustering directly influences service discovery efficiency. Web service clustering is an important research direction in the area of service computing. In order to address the low accuracy of existing service clustering methods, this study proposes a web service clustering approach using a feature model. This approach considers the features of user...
Parallel Density-Based Clustering for Discovery of Ionospheric Phenomena
Pankratius, V.; Gowanlock, M.; Blair, D. M.
2015-12-01
Ionospheric total electron content maps derived from global networks of dual-frequency GPS receivers can reveal a plethora of ionospheric features in real-time and are key to space weather studies and natural hazard monitoring. However, growing data volumes from expanding sensor networks are making manual exploratory studies challenging. As the community is heading towards Big Data ionospheric science, automation and Computer-Aided Discovery become indispensable tools for scientists. One problem of machine learning methods is that they require domain-specific adaptations in order to be effective and useful for scientists. Addressing this problem, our Computer-Aided Discovery approach allows scientists to express various physical models as well as perturbation ranges for parameters. The search space is explored through an automated system and parallel processing of batched workloads, which finds corresponding matches and similarities in empirical data. We discuss density-based clustering as a particular method we employ in this process. Specifically, we adapt Density-Based Spatial Clustering of Applications with Noise (DBSCAN). This algorithm groups geospatial data points based on density. Clusters of points can be of arbitrary shape, and the number of clusters is not predetermined by the algorithm; only two input parameters need to be specified: (1) a distance threshold, (2) a minimum number of points within that threshold. We discuss an implementation of DBSCAN for batched workloads that is amenable to parallelization on manycore architectures such as Intel's Xeon Phi accelerator with 60+ general-purpose cores. This manycore parallelization can cluster large volumes of ionospheric total electronic content data quickly. Potential applications for cluster detection include the visualization, tracing, and examination of traveling ionospheric disturbances or other propagating phenomena. Acknowledgments. We acknowledge support from NSF ACI-1442997 (PI V. Pankratius).
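The DBSCAN algorithm described above can be sketched in a few dozen lines (a didactic serial version, not the parallel Xeon Phi implementation the abstract discusses; it takes exactly the two parameters mentioned, a distance threshold eps and a minimum point count min_pts):

```python
def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id >= 0, or -1 for noise."""
    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1          # provisionally noise; may later become a border point
            continue
        labels[i] = cid             # i is a core point: start a new cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid     # former noise point absorbed as a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = region_query(points, j, eps)
            if len(jn) >= min_pts:  # j is also a core point: expand through it
                seeds.extend(jn)
        cid += 1
    return labels
```

The parallel batched version in the abstract keeps this same density definition but distributes the neighbourhood queries across manycore hardware.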
A BIPARTITE GRAPH PARTITION AND LINK BASED APPROACH FOR SOLVING CATEGORICAL DATA CLUSTERING
E. Dhivyalakshmi; Ramakrishnan, R; K. Umapathy
2013-01-01
Clustering ensembles have emerged as a powerful method for improving both the robustness and the stability of unsupervised classification solutions. The project presents an analysis that suggests this problem degrades the quality of the clustering result, and it presents a new link-based approach, which improves the conventional matrix by discovering unknown entries through similarity between clusters in an ensemble. A critical problem in cluster ensemble research is how to combine multiple c...
Statistical methods in cancer research: Investigating localized clusters of disease
International Nuclear Information System (INIS)
The search for 'clusters' of cancer cases has been a major feature of epidemiological research, and has come to the attention of the public following the publicity given to some childhood leukemia cases in villages surrounding the Sellafield reprocessing plant in England. Clustering is a poorly defined concept. Modern methodological developments have involved two distinct methods, quadrat counts and distance methods. Although statistical methods can indicate (with some error) whether clustering is present, they provide little information as to the real cause. Putative causes will always require further, more specific, studies before the results become meaningful. 5 refs
New Density-Based Clustering Technique: GMDBSCAN-UR
Mohammed A. Alhanjouri; Rwand D. Ahmed
2012-01-01
Density Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most popular algorithms for cluster analysis. It can discover clusters of arbitrary shape and separate out noise. However, the algorithm cannot choose its parameters according to the distribution of the dataset. It simply uses a global minimum number of points (MinPts) parameter, so the clustering result on multi-density databases is inaccurate. In addition, when it is used to cluster large databases, it will cost too much ...
Cluster structure of nuclei based on AMD
International Nuclear Information System (INIS)
The technique of cooling the energy of the system under examination by molecular dynamics is utilized for multi-dimensional variational calculations in the fields of condensed matter physics and chemistry. By simulating the cooling of a finite nucleon system, the ground state of atomic nuclei can be studied. Using antisymmetrized molecular dynamics (AMD), the cluster structure of ordinary nuclei with proton number Z=2n and neutron number N=2n is examined. Further, nuclei with excess neutrons, to which attention has been paid recently, are examined, and the features of systems with different Z and N are noted. Regarding the AMD method, the wave function, the ground state and the extension of the wave function are explained. AMD was applied to the even-even nuclei with A=4n, and the resulting density distributions are shown. It is known that most 4n nuclei are built up from alpha clusters as the basic unit. Atomic nuclei with 4 nucleons outside a closed shell have a well-developed cluster structure. Various internal deformations corresponding to the number of nucleons were observed. In nuclei with excess neutrons (Z < N), because the shell structures of protons and neutrons differ, the overall structure is determined by their respective effects. The dependence of nuclear structure on the number of neutrons is reported. (K.I.)
Clustering Methods Application for Customer Segmentation to Manage Advertisement Campaign
Directory of Open Access Journals (Sweden)
Maciej Kutera
2010-10-01
Full Text Available Clustering methods have recently become such well-elaborated algorithms for the analysis of large data collections that they are now counted among data mining methods. They form an ever larger group of methods, evolving quickly and finding more and more applications. In this article, we present our research on the usefulness of clustering methods for customer segmentation to manage an advertisement campaign. We introduce results obtained with four selected methods, chosen because their characteristics suggested applicability to our purposes. One of the analyzed methods, k-means clustering with randomly selected initial cluster seeds, gave very good results in customer segmentation for advertisement campaign management, and these results are presented in detail in the article. In contrast, one of the methods (hierarchical average linkage) was found useless for customer segmentation. Further investigation of the benefits of clustering methods in customer segmentation for advertisement campaign management is worth continuing, particularly since solutions in this field can yield measurable profits for marketing activity.
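The k-means variant the study found most useful, Lloyd's algorithm with randomly selected initial cluster seeds, can be sketched as follows (an illustrative sketch, not the authors' code; the feature choice and seed handling are our assumptions):

```python
import random

def kmeans(points, k, iters=30, seed=0):
    """Lloyd's k-means with randomly selected initial cluster seeds.
    Returns (assignments, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial seeds drawn from the data
    assign = []
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        assign = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # update step: each center moves to the centroid of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign, centers
```

For customer segmentation, each point would be a customer feature vector (e.g. hypothetical recency/frequency/monetary coordinates), and the returned assignments are the segments targeted by the campaign.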
International Nuclear Information System (INIS)
An atomistic Monte Carlo model parameterised on electronic structure calculation data has been used to study the formation and evolution under irradiation of solute clusters in Fe–MnNi ternary and Fe–CuMnNi quaternary alloys. Two populations of solute-rich clusters have been observed, which can be discriminated by whether or not the solute atoms are associated with self-interstitial clusters. Mn–Ni-rich clusters are observed at a very early stage of the irradiation in both modelled alloys, whereas the quaternary alloy also contains Cu-containing clusters. Mn–Ni-rich clusters nucleate very early via a self-interstitial-driven mechanism, earlier than Cu-rich clusters; the latter, however, which are likely to form via a vacancy-driven mechanism, grow in number much faster than the former, helped by the thermodynamic driving force for Cu precipitation in Fe, thereby becoming dominant in the low dose regime. The kinetics of the number density increase of the two populations is thus significantly different. Finally, the main conclusion suggested by this work is that the so-called late blooming phases might well be neither late, nor phases.
An Effective Method of Producing Small Neutral Carbon Clusters
Institute of Scientific and Technical Information of China (English)
XIA Zhu-Hong; CHEN Cheng-Chu; HSU Yen-Chu
2007-01-01
An effective method of producing small neutral carbon clusters Cn (n = 1-6) is described. The small carbon clusters (positively or negatively charged, or neutral) are formed in a plasma produced by a high-power 532 nm pulsed laser ablating the surface of a metal Mn rod, which reacts with small hydrocarbons supplied by a pulsed valve; the neutral carbon clusters are then extracted and photo-ionized by another laser (266 nm or 355 nm) in the ionization region of a linear time-of-flight mass spectrometer. The distributions of the initial neutral carbon clusters are analysed from the ionic species appearing in the mass spectra. It is observed that the yield of small carbon clusters with the present method is about 10 times that of the traditional, widely used technique of laser vaporization of graphite.
Owen, R. C.; Honrath, R. E.; Merrill, J.
2003-12-01
The use of cluster analysis to group atmospheric trajectories according to similar flow paths has become a common tool in atmospheric studies. Many methods are available to conduct a cluster analysis. However, the dependence of the resulting clusters upon the specific clustering method chosen has not been fully characterized. Specifically, the use of hierarchical versus non-hierarchical clustering algorithms has received little focus. This study presents the results of two cluster analyses: one using the hierarchical clustering algorithm average linkage, and one using the non-hierarchical clustering algorithm k-means. These results demonstrate the sensitivity of this cluster analysis to the use of a hierarchical method versus a non-hierarchical method. In addition, this study analyzes methods for dealing with the vertical component of trajectories during the clustering process. The analyses were performed using a 40-year set of trajectories to the PICO-NARE station, located atop Pico Mountain in the Azores Islands in the central North Atlantic.
Directory of Open Access Journals (Sweden)
D. A. Viattchenin
2009-01-01
Full Text Available A method for constructing a subset of labeled objects, to be used in a heuristic algorithm of possibilistic clustering with partial supervision, is proposed in the paper. The method is based on data preprocessing by the heuristic possibilistic clustering algorithm using the transitive closure of a fuzzy tolerance relation. The method's efficiency is demonstrated by way of an illustrative example.
Mathematical structures of loopy belief propagation and cluster variation method
International Nuclear Information System (INIS)
The mathematical structures of loopy belief propagation are reviewed for graphical models in probabilistic information processing from the standpoint of the cluster variation method. An extension of adaptive TAP approaches is given by introducing a generalized scheme of the cluster variation method. Moreover, the practical message update rules in loopy belief propagation are also summarized for quantum systems. It is suggested that loopy belief propagation can be reformulated for quantum electron systems by using density matrices of the ideal quantum lattice gas system.
Relativistic extended coupled cluster method for magnetic hyperfine structure constant
Sasmal, Sudip; Nayak, Malaya K; Vaval, Nayana; Pal, Sourav
2015-01-01
This article deals with the general implementation of the four-component spinor relativistic extended coupled cluster (ECC) method to calculate first-order properties of atoms and molecules in their open-shell ground-state configurations. The implemented relativistic ECC is employed to calculate the hyperfine structure (HFS) constants of alkali metal atoms (Li, Na, K, Rb and Cs), singly charged alkaline earth metal ions (Be+, Mg+, Ca+ and Sr+) and molecules (BeH, MgF and CaH). We compare our ECC results with calculations based on the restricted active space configuration interaction (RAS-CI) method. Our results are in better agreement with the available experimental values than the RAS-CI values.
Gettler Summa, Mireille; Palumbo, Francesco; Tortora, Cristina
2012-01-01
Factorial clustering methods have been developed in recent years thanks to improvements in computational power. These methods perform a linear transformation of the data and a clustering of the transformed data, optimizing a common criterion. Factorial PD-clustering is based on probabilistic distance clustering (PD-clustering), an iterative, distribution-free, probabilistic clustering method. Factorial PD-clustering makes a linear transformation of the original variables into a reduced numb...
Clustering-based redshift estimation: application to VIPERS/CFHTLS
Scottez, V; Granett, B R; Moutard, T; Kilbinger, M; Scodeggio, M; Garilli, B; Bolzonella, M; de la Torre, S; Guzzo, L; Abbas, U; Adami, C; Arnouts, S; Bottini, D; Branchini, E; Cappi, A; Cucciati, O; Davidzon, I; Fritz, A; Franzetti, P; Iovino, A; Krywult, J; Brun, V Le; Fèvre, O Le; Maccagni, D; Małek, K; Marulli, F; Polletta, M; Pollo, A; Tasca, L A M; Tojeiro, R; Vergani, D; Zanichelli, A; Bel, J; Coupon, J; De Lucia, G; Ilbert, O; McCracken, H J; Moscardini, L
2016-01-01
We explore the accuracy of the clustering-based redshift estimation proposed by M\'enard et al. (2013) when applied to VIPERS and CFHTLS real data. This method enables us to reconstruct redshift distributions from measurements of the angular clustering of objects, using a set of secure spectroscopic redshifts. We use state-of-the-art spectroscopic measurements with iAB 0.5, which allows us to test the accuracy of the clustering-based redshift distributions. We show that this method enables us to reproduce the true mean color-redshift relation when both populations have the same magnitude limit. We also show that this technique allows the inference of redshift distributions for a population fainter than the reference one, and we give an estimate of the color-redshift mapping in this case. This last point is of great interest for future large redshift surveys, which suffer from the need for a complete faint spectroscopic sample.
Multiway Spectral Clustering: A Margin-Based Perspective
Zhang, Zhihua; Jordan, Michael I.
2008-01-01
Spectral clustering is a broad class of clustering procedures in which an intractable combinatorial optimization formulation of clustering is "relaxed" into a tractable eigenvector problem, and in which the relaxed solution is subsequently "rounded" into an approximate discrete solution to the original problem. In this paper we present a novel margin-based perspective on multiway spectral clustering. We show that the margin-based perspective illuminates both the relaxation and rounding aspect...
International Nuclear Information System (INIS)
We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO2 at a stand-off distance of 7 m at 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, are based on grouping similar spectra and selecting appropriate training spectra for the partial least squares (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra, and generation of PLS2 models based on the training samples within each class; (5) use of Bayesian information criteria (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by ∼ 3 wt.%. The statistical significance of these improvements was ∼ 85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and specifically
Cluster variation method in the atomic ordering theory
International Nuclear Information System (INIS)
A brief review is presented of the history of the origin, generalization, and application of one of the modern methods for the examination of cooperative phenomena in the theory of atomic ordering. The method is named the ''cluster variation method''. Using computers, the mathematical difficulties have been overcome, and interest in the cluster variation method has considerably increased. The results obtained by this method for binary alloys with a face-centered cubic lattice or a body-centered one are discussed. The theory of atomic ordering in ternary alloys with binary-type superstructures, L12 and L10, is also considered. The cluster variation method is applicable to a new model of the alloy as well, allowing the range of problems solved in the statistical theory of atomic ordering to be expanded.
A Study of Sequence Clustering on Protein’s Primary Structure using a Statistical Method
Directory of Open Access Journals (Sweden)
Alina Bogan-Marta
2006-07-01
Full Text Available The clustering of biological sequences into biologically meaningful classes denotes two computationally complex challenges: the choice of a biologically pertinent and computable criterion to evaluate the clusters' homogeneity, and the optimal exploration of the solution space. Here we analyse the clustering potential of a new method of sequence similarity based on statistical sequence content evaluation. Applying the popular CLUSTAL W method for sequence similarity to the same data, we contrasted the results. The analysis, computational efficiency, and high accuracy of the results from the new method are encouraging for further development that could make it an appealing alternative to the existing methods.
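One common family of statistical sequence-content measures summarizes each sequence by its k-mer frequency vector and clusters those vectors. The sketch below illustrates that generic idea only; the paper's exact statistical similarity measure is not specified in the abstract, and the sequences, k = 2, and cluster count are all illustrative assumptions.

```python
# Minimal sketch of statistical sequence-content clustering: 2-mer
# frequency profiles plus average-linkage hierarchical clustering.
from collections import Counter
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def kmer_profile(seq, k=2, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Relative k-mer frequencies of a protein sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = [a + b for a in alphabet for b in alphabet]
    v = np.array([counts[m] for m in kmers], dtype=float)
    return v / max(v.sum(), 1.0)

seqs = ["MKKLLLAAAG", "MKKLLIAAAG",   # two similar sequences
        "WWYYCCHHRR", "WWYYCCHHRK"]   # two similar, disjoint from the first pair
X = np.array([kmer_profile(s) for s in seqs])
Z = linkage(X, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Because the profile is alignment-free, this scales better than alignment-based comparisons such as CLUSTAL W, at the cost of ignoring positional information.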
Time series clustering based on nonparametric multidimensional forecast densities
Vilar, José A.; Vilar, Juan M.
2013-01-01
A new time series clustering method based on comparing forecast densities for a sequence of $k>1$ consecutive horizons is proposed. The unknown $k$-dimensional forecast densities can be non-parametrically approximated by using bootstrap procedures that mimic the generating processes without parametric restrictions. However, the difficulty of constructing accurate kernel estimators of multivariate densities is well known. To circumvent the high dimensionality problem, the bootstrap prediction ...
Institute of Scientific and Technical Information of China (English)
张辉; 裘乐淼; 张树有; 胡星星
2013-01-01
Aiming at the problems of enterprise process data and knowledge mining, a method of extracting typical product process routes based on intelligent clustering analysis is presented. A similarity factor between two process routes is established, and a multi-level comprehensive measurement method for calculating the similarity between two process routes is proposed. Based on the similarity calculation, a process-route design structure matrix is constructed and noise-reduction processing is applied to the matrix data. To reduce the difficulty and complexity of clustering division, particle swarm optimization is used to realize the intelligent clustering division of the process-route design structure matrix, and the typical process routes are then extracted from the resulting clusters. A mechanical press enterprise is taken as an example to extract typical process routes from its process data, verifying the effectiveness of the proposed method.
The properties of small Ag clusters bound to DNA bases
Soto-Verdugo, Víctor; Metiu, Horia; Gwinn, Elisabeth
2010-05-01
We study the binding of neutral silver clusters, Agn (n=1-6), to the DNA bases adenine (A), cytosine (C), guanine (G), and thymine (T), and the absorption spectra of the silver cluster-base complexes. Using density functional theory (DFT), we find that the clusters prefer to bind to the doubly bonded ring nitrogens and that binding to T is generally much weaker than to C, G, and A. Ag3 and Ag4 make the strongest bonds. Bader charge analysis indicates a mild electron transfer from the base to the clusters for all bases except T. The donor bases (C, G, and A) bind to the sites on the cluster where the lowest unoccupied molecular orbital has a pronounced protrusion. The site where the cluster binds to the base is controlled by the shape of the higher occupied states of the base. Time-dependent DFT calculations show that different base-cluster isomers may have very different absorption spectra. In particular, we find new excitations in base-cluster molecules, at energies well below those of the isolated components, and with strengths that depend strongly on the orientations of planar clusters with respect to the base planes. Our results suggest that geometric constraints on binding, imposed by designed DNA structures, may be a feasible route to engineering the selection of specific cluster-base assemblies.
Web Search Result Clustering based on Cuckoo Search and Consensus Clustering
Alam, Mansaf; Sadaf, Kishwar
2015-01-01
Clustering of web search result documents has emerged as a promising tool for improving the retrieval performance of an Information Retrieval (IR) system. Search results are often plagued by problems such as synonymy, polysemy, and high volume. Besides resolving these problems, clustering also makes it easier for the user to locate the desired information. In this paper, a method called WSRDC-CSCC is introduced to cluster web search results using the cuckoo search meta-heuristic method and Consensus...
Graph Clustering Based on Mixing Time of Random Walks
Avrachenkov, Konstantin; El Chamie, Mahmoud; Neglia, Giovanni
2014-01-01
Clustering of a graph is the task of grouping its nodes in such a way that the nodes within the same cluster are well connected, but they are less connected to nodes in different clusters. In this paper we propose a clustering metric based on the random walks' properties to evaluate the quality of a graph clustering. We also propose a randomized algorithm that identifies a locally optimal clustering of the graph according to the metric defined. The algorithm is intrinsically distributed and a...
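The general idea of a random-walk-based clustering metric can be sketched concretely: a partition is good if, after a few steps, a random walk started inside a cluster tends to remain inside it more than the stationary baseline would predict. The sketch below is a modularity-like illustration of that idea under our own simplifying assumptions; it does not reproduce the paper's exact metric or its randomized algorithm.

```python
# Hedged sketch: score a graph partition by t-step random-walk mass
# retained inside each cluster, minus the stationary baseline.
import numpy as np

# Adjacency of a small graph with two obvious communities {0,1,2}, {3,4,5}.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
P = A / deg[:, None]               # row-stochastic transition matrix
pi = deg / deg.sum()               # stationary distribution

def walk_score(clusters, t=3):
    """Excess within-cluster t-step walk probability over the baseline."""
    Pt = np.linalg.matrix_power(P, t)
    score = 0.0
    for C in clusters:
        C = list(C)
        inside = sum(pi[i] * Pt[i, j] for i in C for j in C)
        baseline = sum(pi[i] * pi[j] for i in C for j in C)
        score += inside - baseline
    return score

good = walk_score([{0, 1, 2}, {3, 4, 5}])
bad = walk_score([{0, 3}, {1, 4}, {2, 5}])
```

A partition aligned with the communities scores higher than one that cuts across them, which is the property a locally optimizing algorithm can exploit.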
Sensitivity evaluation of dynamic speckle activity measurements using clustering methods
International Nuclear Information System (INIS)
We evaluate and compare the use of competitive neural networks, self-organizing maps, the expectation-maximization algorithm, K-means, and fuzzy C-means techniques as partitional clustering methods when the sensitivity of the activity measurement of dynamic speckle images needs to be improved. The temporal history of the acquired intensity generated by each pixel is analyzed in a wavelet decomposition framework, and it is shown that the mean energy of its corresponding wavelet coefficients provides a suitable feature space for clustering purposes. The sensitivity obtained by using the evaluated clustering techniques is also compared with the well-known methods of Konishi-Fujii, weighted generalized differences, and wavelet entropy. The performance of the partitional clustering approach is evaluated using simulated dynamic speckle patterns and also experimental data.
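The feature-space construction described above can be sketched in a few lines: decompose each pixel's intensity time history with a wavelet, use the mean energy of the detail coefficients as the feature, and cluster. The sketch below hand-rolls a first-level Haar decomposition and uses synthetic "active" vs "inactive" pixel histories as stand-ins for real speckle data; both are assumptions for illustration.

```python
# Sketch: per-pixel Haar detail energy as a clustering feature.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

def haar_detail_energy(x):
    """Mean energy of first-level Haar detail coefficients of a signal."""
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return float(np.mean(d ** 2))

T = 64
active = rng.normal(scale=2.0, size=(50, T))     # high temporal activity
inactive = rng.normal(scale=0.2, size=(50, T))   # low temporal activity
feats = np.array([[haar_detail_energy(x)]
                  for x in np.vstack([active, inactive])])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feats)
```

Any of the partitional methods compared in the abstract (SOM, EM, fuzzy C-means, ...) can be swapped in for K-means on the same feature space.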
A Clustering Method for Weak Signals to Support Anticipative Intelligence
Directory of Open Access Journals (Sweden)
Antonio Leonardo Martins Moreira
2015-01-01
Full Text Available Organizations need appropriate anticipative information to support their decision-making process. In contrast to some strategic information analyses that help managers establish patterns using past information, anticipative intelligence is intended to help managers act based on the analysis of pieces of information that indicate some sort of trend that may become true in the future. One example of this kind of information is known as a weak signal, which is a short text related to a specific domain. In this work, pairs of weak signals, written in Portuguese, are compared to each other so that similarities can be identified and correlated weak signals can be clustered together. The idea is that the analysis of the resulting similar groups may lead to the formulation of a hypothesis that can support the decision-making process. The proposed technique consists of two main steps: preprocessing the set of weak signals and clustering. The proposed method was evaluated on a database of bio-energy weak signals. The main innovations of this work are: (i) the application of a computational methodology from the literature for analyzing anticipative information; and (ii) the adaptation of data mining techniques to implement this methodology in a software product.
Method and apparatus for the production of cluster ions
Friedman, L.; Beuhler, R.J.
A method and apparatus for the production of cluster ions, and preferably isotopic hydrogen cluster ions is disclosed. A gas, preferably comprising a carrier gas and a substrate gas, is cooled to about its boiling point and expanded through a supersonic nozzle into a region maintained at a low pressure. Means are provided for the generation of a plasma in the gas before or just as it enters the nozzle.
Brain Tumor Extraction from T1- Weighted MRI using Co-clustering and Level Set Methods
Satheesh, S.; Dr.K.V.S.V.R Prasad; Dr.K.Jitender Reddy
2013-01-01
The aim of this paper is to propose an effective technique for tumor extraction from T1-weighted magnetic resonance brain images by combining co-clustering and level set methods. Co-clustering is an effective region-based segmentation technique for brain tumor extraction but has a drawback at tumor boundaries, while the level set method without re-initialization is a good edge-based segmentation technique but has drawbacks in providing the initial contour. Therefore, in thi...
An Improved Fuzzy c-Means Clustering Algorithm Based on Shadowed Sets and PSO
Directory of Open Access Journals (Sweden)
Jian Zhang
2014-01-01
Full Text Available To organize the wide variety of data sets automatically and acquire accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM based on particle swarm optimization (PSO and shadowed sets to perform feature clustering. SP-FCM introduces the global search property of PSO to deal with the problem of premature convergence of conventional fuzzy clustering, utilizes vagueness balance property of shadowed sets to handle overlapping among clusters, and models uncertainty in class boundaries. This new method uses Xie-Beni index as cluster validity and automatically finds the optimal cluster number within a specific range with cluster partitions that provide compact and well-separated clusters. Experiments show that the proposed approach significantly improves the clustering effect.
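The two standard ingredients SP-FCM builds on, plain fuzzy c-means and the Xie-Beni validity index, can be sketched on their own. The sketch below implements baseline FCM on synthetic 2-D blobs and uses Xie-Beni to compare cluster counts; the PSO search and the shadowed-set handling of the actual algorithm are omitted.

```python
# Baseline fuzzy c-means plus the Xie-Beni validity index (plain FCM only;
# the SP-FCM additions are not reproduced here).
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, scale=0.3, size=(50, 2)),
               rng.normal(loc=3.0, scale=0.3, size=(50, 2))])

def fcm(X, c, m=2.0, iters=100):
    U = rng.dirichlet(np.ones(c), size=len(X))   # random fuzzy memberships
    for _ in range(iters):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]            # centers
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        # u_ki = 1 / sum_j (d_ki / d_kj)^(2/(m-1))
        U = 1.0 / np.sum((D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0)),
                         axis=2)
    return U, V

def xie_beni(X, U, V, m=2.0):
    """Compactness over separation: smaller is better."""
    D2 = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) ** 2
    compact = np.sum((U ** m) * D2)
    sep = min(np.sum((V[i] - V[j]) ** 2)
              for i in range(len(V)) for j in range(len(V)) if i != j)
    return compact / (len(X) * sep)

U2, V2 = fcm(X, 2)
U3, V3 = fcm(X, 3)
xb2, xb3 = xie_beni(X, U2, V2), xie_beni(X, U3, V3)
```

On these two well-separated blobs the index correctly prefers c = 2, which is exactly how the abstract's method selects the cluster number within a range.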
Modularity-Based Clustering for Network-Constrained Trajectories
EL MAHRSI, Mohamed Khalil; Rossi, Fabrice
2012-01-01
We present a novel clustering approach for moving object trajectories that are constrained by an underlying road network. The approach builds a similarity graph based on these trajectories, then uses modularity-optimization hierarchical graph clustering to regroup trajectories with similar profiles. Our experimental study shows the superiority of the proposed approach over classic hierarchical clustering and gives a brief insight into visualization of the clustering results.
Document clustering using graph based document representation with constraints
Rafi, Muhammad; Amin, Farnaz; Shaikh, Mohammad Shahid
2014-01-01
Document clustering is an unsupervised approach in which a large collection of documents (corpus) is subdivided into smaller, meaningful, identifiable, and verifiable sub-groups (clusters). Meaningful representation of documents and implicitly identifying the patterns, on which this separation is performed, is the challenging part of document clustering. We have proposed a document clustering technique using graph based document representation with constraints. A graph data structure can easi...
Performance Improvement of Cache Management In Cluster Based MANET
Directory of Open Access Journals (Sweden)
Abdulaziz Zam
2013-08-01
Full Text Available Caching is one of the most effective techniques used to improve data access performance in wireless networks. Accessing data from a remote server imposes high latency and power consumption through the forwarding nodes that guide the requests to the server and send data back to the clients. In addition, accessing data may be unreliable or even impossible due to erroneous wireless links and frequent disconnections. Due to the nature of MANETs and their highly frequent topology changes, and also the small cache size and constrained power supply of mobile nodes, management of the cache is a challenge. To maintain the MANET's stability and scalability, clustering is considered an effective approach. In this paper an efficient cache management method is proposed for the Cluster-Based Mobile Ad-hoc NETwork (C-B-MANET). The performance of the method is evaluated in terms of packet delivery ratio, latency, and overhead metrics.
An optical imaging method for studying the spatial distribution of argon clusters
International Nuclear Information System (INIS)
An optical imaging method based on Rayleigh scattering is introduced to study the spatial distribution of atomic argon clusters produced in a gas jet. The radial distribution and evolution of the clusters are captured directly by a high speed camera, resulting in greatly increased precision and accuracy. It is found that the radial distribution of the clusters follows a Gaussian curve rather than the double-humped curve observed in a previous experiment. The normalized radial and axial distributions of the clusters are not influenced by the stagnation pressure and may be strictly determined by the nozzle structure. The average cluster sizes decrease slightly at far axial distances. A method of estimating the half-angle of the nozzle is also presented
Method for detecting clusters of possible uranium deposits
International Nuclear Information System (INIS)
When a two-dimensional map contains points that appear to be scattered somewhat at random, a question that often arises is whether groups of points that appear to cluster are merely exhibiting ordinary behavior, which one can expect with any random distribution of points, or whether the clusters are too pronounced to be attributable to chance alone. A method for detecting clusters along a straight line is applied to the two-dimensional map of 214Bi anomalies observed as part of the National Uranium Resource Evaluation Program in the Lubbock, Texas, region. Some exact probabilities associated with this method are computed and compared with two approximate methods. The two methods for approximating probabilities work well in the cases examined and can be used when it is not feasible to obtain the exact probabilities
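The one-dimensional version of this question can be sketched with a scan statistic and Monte Carlo simulation: under pure randomness, how often does some window of width w along a line contain at least k of the n points? The numbers below (n = 20, w = 0.05, k = 5) are illustrative assumptions; the report's exact probabilities and its two approximation methods are not reproduced.

```python
# Monte Carlo sketch of the 1-D scan-statistic idea behind cluster detection.
import numpy as np

rng = np.random.default_rng(0)

def max_points_in_window(pts, w):
    """Largest number of points falling in any window [p, p + w]."""
    pts = np.sort(pts)
    return int(max(np.searchsorted(pts, p + w, side="right") - i
                   for i, p in enumerate(pts)))

def cluster_p_value(n, w, k, sims=2000):
    """P(some width-w window holds >= k of n uniform points), simulated."""
    hits = sum(max_points_in_window(rng.random(n), w) >= k
               for _ in range(sims))
    return hits / sims

# Example: how surprising is 5 of 20 anomalies inside a window of width 0.05?
p = cluster_p_value(n=20, w=0.05, k=5)
```

A small p suggests the observed cluster is too pronounced to be attributable to chance alone, which is the decision the exact and approximate probabilities in the report support analytically.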
Geometry optimization of bimetallic clusters using an efficient heuristic method
Lai, Xiangjing; Xu, Ruchu; Huang, Wenqi
2011-10-01
In this paper, an efficient heuristic algorithm for geometry optimization of bimetallic clusters is proposed. The algorithm is mainly composed of three ingredients: the monotonic basin-hopping method with guided perturbation (MBH-GP), surface optimization method, and iterated local search (ILS) method, where MBH-GP and surface optimization method are used to optimize the geometric structure of a cluster, and the ILS method is used to search the optimal homotop for a fixed geometric structure. The proposed method is applied to Cu38-nAun (0 ≤ n ≤ 38), Ag55-nAun (0 ≤ n ≤ 55), and Cu55-nAun (0 ≤ n ≤ 55) clusters modeled by the many-body Gupta potential. Comparison with the results reported in the literature indicates that the present method is highly efficient and a number of new putative global minima missed in the previous papers are found. The present method should be a promising tool for the theoretical determination of ground-state structure of bimetallic clusters. Additionally, some key elements and properties of the present method are also analyzed.
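The basin-hopping ingredient (MBH) can be illustrated in isolation with scipy's generic implementation. The sketch below optimizes a 3-atom Lennard-Jones cluster rather than the many-body Gupta potential used in the paper, and omits the guided perturbation, surface-optimization, and ILS ingredients; the potential and cluster size are stand-in assumptions.

```python
# Hedged sketch of basin-hopping geometry optimization on an LJ trimer.
import numpy as np
from scipy.optimize import basinhopping

def lj_energy(flat):
    """Total Lennard-Jones energy of atoms given as a flat (3N,) array."""
    pos = flat.reshape(-1, 3)
    e = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r = np.linalg.norm(pos[i] - pos[j])
            e += 4.0 * (r ** -12 - r ** -6)
    return e

# A reasonable (non-overlapping) starting geometry for 3 atoms.
x0 = np.array([0.0, 0.0, 0.0,
               1.2, 0.0, 0.0,
               0.0, 1.2, 0.0])
res = basinhopping(lj_energy, x0, niter=25)
# The LJ trimer's global minimum is an equilateral triangle with E = -3.
```

Basin-hopping alternates random perturbations with local minimizations and keeps the best minimum found, which is why it copes well with the many local minima of cluster potential-energy surfaces.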
Directory of Open Access Journals (Sweden)
Nasrin Azizi
2012-01-01
Full Text Available In wireless sensor networks, the energy constraint is one of the most important restrictions. Considering this issue, energy balancing is essential for prolonging the network lifetime; hence, this problem has been considered a main challenge by the research community. In recent papers, many clustering-based routing algorithms have been proposed to prolong the network lifetime in wireless sensor networks, but many of them do not consider energy balancing among nodes. In this work we propose a new clustering-based routing protocol, HCTE, in which cluster head selection is done in two separate stages, so there are two cluster heads in each cluster. The routing algorithm used in the proposed protocol is multi-hop. Simulation results show that HCTE prolongs the network lifetime by about 35% compared to LEACH.
Dynamic access clustering selecting mechanism based on Markov decision process for MANET
Institute of Scientific and Technical Information of China (English)
WANG Dao-yuan; TIAN Hui
2007-01-01
Clustering is an important method in mobile Ad-hoc networks (MANETs). Because of node mobility, cluster selection is inevitable for mobile nodes roaming between different clusters. In this study, based on an analysis of the cluster-selection problem in an environment containing multiple overlapping and intersecting clusters, a novel dynamic selection mechanism is proposed to optimize cluster selection during roaming between different clusters in a MANET. The selection mechanism takes into account the stability of the communication system, the communication bandwidth, and the effect of cluster selection on communication, and is formulated in accordance with a Markov decision model.
Brain Tumor Extraction from T1- Weighted MRI using Co-clustering and Level Set Methods
Directory of Open Access Journals (Sweden)
S.Satheesh
2013-04-01
Full Text Available The aim of this paper is to propose an effective technique for tumor extraction from T1-weighted magnetic resonance brain images by combining co-clustering and level set methods. Co-clustering is an effective region-based segmentation technique for brain tumor extraction but has a drawback at tumor boundaries, while the level set method without re-initialization is a good edge-based segmentation technique but has drawbacks in providing the initial contour. Therefore, in this paper the region-based co-clustering and edge-based level set methods are combined: the tumor is first extracted using co-clustering, which then provides the initial contour to the level set method, cancelling the drawbacks of each. A data set of five patients, with one slice selected from each, is used to analyze the performance of the proposed method. Quality-metric analysis shows that the proposed method performs much better than the level set method without re-initialization.
A New Method For Galaxy Cluster Detection; 1, The Algorithm
Gladders, M D; Gladders, Michael D.
2000-01-01
Numerous methods for finding clusters at moderate to high redshifts have been proposed in recent years, at wavelengths ranging from radio to X-rays. In this paper we describe a new method for detecting clusters in two-band optical/near-IR imaging data. The method relies upon the observation that all rich clusters, at all redshifts observed so far, appear to have a red sequence of early-type galaxies. The emerging picture is that all rich clusters contain a core population of passively evolving elliptical galaxies which are coeval and formed at high redshifts. The proposed search method exploits this strong empirical fact by using the red sequence as a direct indicator of overdensity. The fundamental advantage of this approach is that with appropriate filters, cluster elliptical galaxies at a given redshift are redder than all normal galaxies at lower redshifts. A simple color cut thus virtually eliminates all foreground contamination, even at significant redshifts. In this paper, one of a series of two, we de...
Galimberti, Giuliano; Manisi, Annamaria; Soffritti, Gabriele
2015-01-01
A general framework for dealing with both linear regression and clustering problems is described. It includes Gaussian clusterwise linear regression analysis with random covariates and cluster analysis via Gaussian mixture models with variable selection. It also admits a novel approach for detecting multiple clusterings from possibly correlated sub-vectors of variables, based on a model defined as the product of conditionally independent Gaussian mixture models. A necessary condition for the ...
Genetic association mapping via evolution-based clustering of haplotypes.
Directory of Open Access Journals (Sweden)
Ioanna Tachmazidou
2007-07-01
Full Text Available Multilocus analysis of single nucleotide polymorphism haplotypes is a promising approach to dissecting the genetic basis of complex diseases. We propose a coalescent-based model for association mapping that potentially increases the power to detect disease-susceptibility variants in genetic association studies. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions with densely spaced markers and model chromosomal segments in high linkage disequilibrium therein assuming a perfect phylogeny. To make this assumption more realistic, we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium. The haplotype space is then partitioned into disjoint clusters, within which the phenotype-haplotype association is assumed to be the same. For example, in case-control studies, we expect chromosomal segments bearing the causal variant on a common ancestral background to be more frequent among cases than controls, giving rise to two separate haplotype clusters. The novelty of our approach arises from the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common ancestor. Our approach is fully Bayesian and we develop a Markov Chain Monte Carlo algorithm to sample efficiently over the space of possible partitions. We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. Also, the method is computationally quicker than other multi-marker approaches. We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location
Polygon cluster pattern recognition based on new visual distance
Shuai, Yun; Shuai, Haiyan; Ni, Lin
2007-06-01
The pattern recognition of polygon clusters is a problem attracting much attention in spatial data mining. This paper investigates the problem based on spatial cognition principles and the Gestalt principles of visual recognition, combined with spatial clustering methods, and makes two innovations. First, it substantially improves the concept of "visual distance": the definition comprehensively accounts not only for Euclidean distance, orientation difference, and dimension discrepancy, but also, crucially, for the similarity of object shapes, and the distance calculation model is built on a Delaunay triangulation structure. Second, the research adopts spatial clustering analysis based on a minimum spanning tree (MST); the pruning algorithm introduces an automatic data-layering mechanism and a simulated annealing optimization algorithm. This study provides a new research thread for GIS development: GIS is an interdisciplinary field whose research methods should be open and diverse, and mature techniques from related disciplines can be introduced into GIS studies, provided they are adapted to the principles of GIS as a science of spatial cognition. Only then can GIS develop on a higher and stronger plane.
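The MST-clustering backbone of such approaches can be sketched compactly: build a minimum spanning tree over pairwise distances, prune the longest edges, and read clusters off the connected components. The sketch below uses plain Euclidean distance between points as a stand-in for the shape-aware "visual distance", and a single-edge prune in place of the simulated-annealing pruning; both substitutions are assumptions for illustration.

```python
# Sketch of MST-based spatial clustering: prune the longest tree edge,
# then take connected components as clusters.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

pts = np.array([[0, 0], [0, 1], [1, 0],          # group 1
                [10, 10], [10, 11], [11, 10]])   # group 2
D = squareform(pdist(pts))
mst = minimum_spanning_tree(D).toarray()

# Prune the single longest MST edge to split the tree into two clusters.
i, j = np.unravel_index(np.argmax(mst), mst.shape)
mst[i, j] = 0.0
n_comp, labels = connected_components(mst, directed=False)
```

Pruning k - 1 of the longest edges yields k clusters; the paper's contribution lies in the distance model and the smarter choice of which edges to prune.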
Covariance analysis of differential drag-based satellite cluster flight
Ben-Yaacov, Ohad; Ivantsov, Anatoly; Gurfil, Pini
2016-06-01
One possibility for satellite cluster flight is to control relative distances using differential drag. The idea is to increase or decrease the drag acceleration on each satellite by changing its attitude, and use the resulting small differential acceleration as a controller. The most significant advantage of the differential drag concept is that it enables cluster flight without consuming fuel. However, any drag-based control algorithm must cope with significant aerodynamical and mechanical uncertainties. The goal of the current paper is to develop a method for examination of the differential drag-based cluster flight performance in the presence of noise and uncertainties. In particular, the differential drag control law is examined under measurement noise, drag uncertainties, and initial condition-related uncertainties. The method used for uncertainty quantification is the Linear Covariance Analysis, which enables us to propagate the augmented state and filter covariance without propagating the state itself. Validation using a Monte-Carlo simulation is provided. The results show that all uncertainties have relatively small effect on the inter-satellite distance, even in the long term, which validates the robustness of the used differential drag controller.
Green Clustering Implementation Based on DPS-MOPSO
Directory of Open Access Journals (Sweden)
Yang Lu
2014-01-01
Full Text Available A green clustering implementation is proposed as the first method in the framework of an energy-efficient strategy for centralized enterprise high-density WLANs. Traditionally, to maintain network coverage, all of the APs within the WLAN have to be powered on. With the new algorithm, however, a large proportion of APs can be powered off while coverage is maintained as in the always-on counterpart. The proposed algorithm is composed of two parallel and concurrent procedures: a faster procedure based on K-means and a more accurate procedure based on Dynamic Population Size Multiple Objective Particle Swarm Optimization (DPS-MOPSO). To implement green clustering efficiently and accurately, a dynamic population size and mutational operators are introduced as complements to the classical MOPSO. In addition to AP selection, the new green clustering algorithm also serves as a reference and guide for AP deployment. This paper also presents simulations in scenarios modeled with the ray-tracing method and the FDTD technique; the results show that about 67% up to 90% of energy consumption can be saved while the original network coverage is maintained during periods when few users are online or the traffic load is low.
Improved method for the feature extraction of laser scanner using genetic clustering
Institute of Scientific and Technical Information of China (English)
Yu Jinxia; Cai Zixing; Duan Zhuohua
2008-01-01
Feature extraction from range images provided by a ranging sensor is a key issue in pattern recognition. To automatically extract the environmental features sensed by a 2D laser scanner, an improved method based on genetic clustering, VGA-clustering, is presented. By integrating the spatial neighbouring information of the range data into the fuzzy clustering algorithm, a weighted fuzzy clustering algorithm (WFCA) is introduced, instead of the standard clustering algorithm, to realize feature extraction for the laser scanner. Because the number of clusters is unknown in advance, several validation index functions are used to estimate the validity of different clustering algorithms, and one validation index is selected as the fitness function of the genetic algorithm so as to determine the correct number of clusters automatically. At the same time, an improved genetic algorithm, IVGA, is proposed on the basis of VGA to escape the local optima of the clustering algorithm; it is implemented by increasing population diversity and improving the elitist genetic operators so as to enhance the local search capacity and quicken convergence. Comparison with other algorithms demonstrates the effectiveness of the introduced algorithm.
Stenning, D. C.; Wagner-Kaiser, R.; Robinson, E.; van Dyk, D. A.; von Hippel, T.; Sarajedini, A.; Stein, N.
2016-07-01
We develop a Bayesian model for globular clusters composed of multiple stellar populations, extending earlier statistical models for open clusters composed of simple (single) stellar populations. Specifically, we model globular clusters with two populations that differ in helium abundance. Our model assumes a hierarchical structuring of the parameters in which physical properties—age, metallicity, helium abundance, distance, absorption, and initial mass—are common to (i) the cluster as a whole or to (ii) individual populations within a cluster, or are unique to (iii) individual stars. An adaptive Markov chain Monte Carlo (MCMC) algorithm is devised for model fitting that greatly improves convergence relative to its precursor non-adaptive MCMC algorithm. Our model and computational tools are incorporated into an open-source software suite known as BASE-9. We use numerical studies to demonstrate that our method can recover parameters of two-population clusters, and also show how model misspecification can potentially be identified. As a proof of concept, we analyze the two stellar populations of globular cluster NGC 5272 using our model and methods. (BASE-9 is available from GitHub: https://github.com/argiopetech/base/releases).
Accurate method of modeling cluster scaling relations in modified gravity
He, Jian-hua; Li, Baojiu
2016-06-01
We propose a new method to model cluster scaling relations in modified gravity. Using a suite of nonradiative hydrodynamical simulations, we show that the scaling relations of accumulated gas quantities, such as the Sunyaev-Zel'dovich effect (Compton-y parameter) and the x-ray Compton-y parameter, can be accurately predicted using the known results in the ΛCDM model with a precision of ~3%. This method provides a reliable way to analyze the gas physics in modified gravity using the less demanding and much more efficient pure cold dark matter simulations. Our results therefore have important theoretical and practical implications in constraining gravity using cluster surveys.
Report of a Workshop on Parallelization of Coupled Cluster Methods
Energy Technology Data Exchange (ETDEWEB)
Rodney J. Bartlett Erik Deumens
2008-05-08
The benchmark ab initio quantum mechanical methods for molecular structure and spectra are now recognized to be coupled-cluster theory. To benefit from the transition to tera- and petascale computers, such coupled-cluster methods must be created to run in a scalable fashion. This Workshop, held as part of the 48th annual Sanibel meeting at St. Simons Island, GA, addressed that issue. Representatives of all the principal scientific groups addressing this topic were in attendance, to exchange information about the problem and to identify what needs to be done in the future. This report summarizes the conclusions of the workshop.
A two-stage method for microcalcification cluster segmentation in mammography by deformable models
Energy Technology Data Exchange (ETDEWEB)
Arikidis, N.; Kazantzi, A.; Skiadopoulos, S.; Karahaliou, A.; Costaridou, L., E-mail: costarid@upatras.gr [Department of Medical Physics, School of Medicine, University of Patras, Patras 26504 (Greece); Vassiou, K. [Department of Anatomy, School of Medicine, University of Thessaly, Larissa 41500 (Greece)
2015-10-15
Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is a prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method for MC clusters is investigated. The first stage targets accurate and time-efficient segmentation of the majority of the particles of an MC cluster, by means of a level set method. The second stage targets shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. Segmentation reliability of the proposed method in terms of inter- and intraobserver agreement was evaluated on a case sample of 80 MC clusters originating from the digital database for screening mammography, corresponding to 4 morphology types (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24) of MC clusters, assessing radiologists' segmentations quantitatively by two distance metrics, the Hausdorff distance (HDIST{sub cluster}) and the average of minimum distances (AMINDIST{sub cluster}), and the area overlap measure (AOM{sub cluster}). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated on a case sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted, and a correlation-based feature selection method yielded a feature subset to feed into a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under the receiver operating characteristic curve (Az ± Standard Error) utilizing
Clustering Analysis on E-commerce Transaction Based on K-means Clustering
Directory of Open Access Journals (Sweden)
Xuan HUANG
2014-02-01
Full Text Available Clustering algorithms based on density, increments, grids, etc., typically exhibit shortcomings such as poor scalability, weak handling of high-dimensional data, sensitivity to the order in which data arrive, poor parameter independence and weak noise handling when facing large volumes of high-dimensional transaction data. From experiments on sampled data for 300 mobile phones on Taobao, the following conclusions can be drawn: compared with the Single-pass clustering algorithm, the K-means clustering algorithm achieves high intra-class similarity and inter-class dissimilarity when analyzing e-commerce transactions. In addition, the K-means clustering algorithm is highly efficient and scales well when dealing with large numbers of data items. However, its clustering results are affected by the chosen number of clusters and the initial positions of the cluster centers, so the results easily fall into local optima. How to determine the number of clusters and the initial center positions for this algorithm therefore remains an important topic for future research.
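The initialisation sensitivity noted above is easy to reproduce with a minimal k-means sketch in pure Python; the `seed` parameter controlling the random start is our illustrative addition, not part of the study:

```python
import math
import random

def kmeans(points, k, seed=0, iters=100):
    """Plain k-means (Lloyd's algorithm). Different `seed` values pick
    different initial centers and can land in different local optima,
    which is the initialisation sensitivity discussed above."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initial centers
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[j].append(p)
        # update step: centers move to the group means
        new = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:  # converged
            break
        centers = new
    return sorted(centers)
```

On two well-separated blobs the algorithm recovers the blob means; on harder data, rerunning with several seeds and keeping the best distortion is the usual workaround.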
Research and Implementation of Unsupervised Clustering-Based Intrusion Detection
Institute of Scientific and Technical Information of China (English)
Luo Min; Zhang Huan-guo; Wang Li-na
2003-01-01
An unsupervised clustering-based intrusion detection algorithm is discussed in this paper. The basic idea of the algorithm is to produce clusters by comparing the distances between unlabeled training data instances. With the data instances classified, anomalous clusters can be identified by the normal-cluster ratio, and the identified clusters can then be used in real data detection. The benefit of the algorithm is that it doesn't need labeled training data sets. Experiments show that this approach can detect unknown intrusions efficiently in real network connections, using the KDD99 data sets.
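A minimal sketch of the idea, assuming a simple width-based clustering step and an illustrative normal-cluster ratio threshold of 10% (both choices are ours, not the paper's):

```python
import math

def cluster_by_distance(points, width):
    """Greedy width-based clustering: assign each point to the first
    cluster whose centroid lies within `width`, else start a new
    cluster. A simplified stand-in for the distance-comparison step."""
    clusters = []  # each entry: [centroid, members]
    for p in points:
        for c in clusters:
            if math.dist(p, c[0]) <= width:
                c[1].append(p)
                n = len(c[1])
                # recompute centroid over the cluster's members
                c[0] = tuple(sum(q[k] for q in c[1]) / n
                             for k in range(len(p)))
                break
        else:
            clusters.append([p, [p]])
    return clusters

def label_clusters(clusters, total, normal_ratio=0.1):
    """Clusters holding less than `normal_ratio` of the data are flagged
    anomalous -- the normal-cluster-ratio idea; the threshold is an
    assumption for illustration."""
    return [("anomaly" if len(c[1]) / total < normal_ratio else "normal",
             c[1]) for c in clusters]
```

With mostly "normal" traffic plus a couple of outlying connection records, the small outlier cluster is flagged as anomalous without any labels being supplied.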
Clustering in mobile ad hoc network based on neural network
Institute of Scientific and Technical Information of China (English)
CHEN Ai-bin; CAI Zi-xing; HU De-wen
2006-01-01
An on-demand distributed clustering algorithm based on a neural network was proposed. The system parameters and the combined weight for each node were computed, cluster-heads were chosen using the weighted clustering algorithm, and then a training set was created and a neural network was trained. The algorithm takes several system parameters into account, such as the ideal node degree, the transmission power, the mobility and the battery power of the nodes. It can be used directly to test whether a node is a cluster-head or not. Moreover, cluster re-creation is sped up.
Exemplar-Based Clustering via Simulated Annealing
Brusco, Michael J.; Kohn, Hans-Friedrich
2009-01-01
Several authors have touted the p-median model as a plausible alternative to within-cluster sums of squares (i.e., K-means) partitioning. Purported advantages of the p-median model include the provision of "exemplars" as cluster centers, robustness with respect to outliers, and the accommodation of a diverse range of similarity data. We developed…
A cluster-based simulation of facet-based search
Urruty, T.; Hopfgartner, F.; Villa, R.; Gildea, N.; Jose, J.M.
2008-01-01
The recent increase of online video has challenged research in the field of video information retrieval. Video search engines are becoming more and more interactive, helping the user to easily find what he or she is looking for. In this poster, we present a new approach that uses an iterative clustering algorithm on text and visual features to simulate users creating new facets in a facet-based interface. Our experimental results demonstrate the usefulness of such an approach.
Relational visual cluster validity
Ding, Y.; Harrison, R F
2007-01-01
The assessment of cluster validity plays a very important role in cluster analysis. Most commonly used cluster validity methods are based on statistical hypothesis testing or finding the best clustering scheme by computing a number of different cluster validity indices. A number of visual methods of cluster validity have been produced to display directly the validity of clusters by mapping data into two- or three-dimensional space. However, these methods may lose too much information to corre...
A COMPARATIVE STUDY TO FIND A SUITABLE METHOD FOR TEXT DOCUMENT CLUSTERING
Directory of Open Access Journals (Sweden)
Dr.M.Punithavalli
2012-01-01
Full Text Available Text mining is used in various text-related tasks such as information extraction, concept/entity extraction, document summarization, entity relation modeling (i.e., learning relations between named entities), categorization/classification and clustering. This paper focuses on document clustering, a field of text mining, which groups a set of documents into a list of meaningful categories. The main focus of this paper is to present a performance analysis of various techniques available for document clustering. The results of this comparative study can be used to improve existing text data mining frameworks and improve the way of knowledge discovery. This paper considers six clustering techniques for document clustering. The techniques are grouped into three groups, namely Group 1 - K-means and its variants (traditional K-means and K*-means algorithms), Group 2 - Expectation Maximization and its variants (traditional EM, Spherical Gaussian EM algorithm and Linear Partitioning and Reallocation clustering (LPR) using EM algorithms), and Group 3 - Semantic-based techniques (Hybrid method and Feature-based algorithms). A total of seven algorithms are considered and were selected based on their popularity in the text mining field. Several experiments were conducted to analyze the performance of the algorithms and to select the winner in terms of cluster purity, clustering accuracy and speed of clustering.
An Efficient Density based Improved K- Medoids Clustering algorithm
Directory of Open Access Journals (Sweden)
Masthan Mohammed
2012-12-01
Full Text Available Mining knowledge from large amounts of spatial data is known as spatial data mining. It has become a highly demanding field because huge amounts of spatial data have been collected in various applications, ranging from geo-spatial data to bio-medical knowledge. The database can be clustered in many ways depending on the clustering algorithm employed, parameter settings used, and other factors. Multiple clusterings can be combined so that the final partitioning of the data provides better clustering. In this paper, an efficient density-based k-medoids clustering algorithm is proposed to overcome the drawbacks of the DBSCAN and k-medoids clustering algorithms. Clustering is the process of classifying objects into different groups by partitioning a data set into a series of subsets called clusters. Clustering has its roots in algorithms like k-means and k-medoids. However, the conventional k-medoids clustering algorithm suffers from several limitations. Firstly, it needs prior knowledge of the cluster-number parameter k. Secondly, it initially makes a random selection of k representative objects, and if these initial k medoids are not selected properly then the natural clusters may not be obtained. Thirdly, it is also sensitive to the order of the input data set.
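The random-initialisation drawback listed above suggests one natural density-based fix: seed the medoids from the densest points instead of a random draw. The following sketch is a hypothetical illustration of that seeding step only (the `eps` radius and the spacing rule are our assumptions, not the paper's algorithm):

```python
import math

def density_seeded_medoids(points, k, eps):
    """Pick k initial medoids as the points with the most eps-neighbours,
    keeping seeds at least eps apart so one dense blob cannot supply
    all of them. Illustrative seeding only; the subsequent medoid-swap
    refinement of k-medoids is omitted."""
    density = [sum(1 for q in points if math.dist(p, q) <= eps)
               for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    medoids = []
    for i in order:
        if all(math.dist(points[i], points[m]) > eps for m in medoids):
            medoids.append(i)
        if len(medoids) == k:
            break
    return [points[m] for m in medoids]
```

Seeding from dense, mutually distant points makes the starting medoids land in distinct natural clusters, which is exactly what a random draw cannot guarantee.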
Directory of Open Access Journals (Sweden)
Asfandyar Khan
2010-01-01
Full Text Available The objective of this paper is to develop a mechanism to increase the lifetime of homogeneous wireless sensor networks (WSNs) by minimizing long-range communication, delivering data efficiently and balancing energy use. Energy efficiency is a very important issue for sensor nodes, as it affects the lifetime of sensor networks. To achieve energy balancing and maximize network lifetime, we divide the whole network into clusters. In cluster-based architectures, the role of the aggregator node is crucial because of the extra processing and long-range communication it performs; once the aggregator node becomes nonfunctional, it affects the whole cluster. We introduce a candidate cluster-head node chosen on the basis of node density, and propose a modified cluster-based WSN architecture with a resource-rich server node (SN). This server node takes over the responsibility of transmitting data from the cluster heads to the base station over longer distances. We also propose a cluster-head selection algorithm based on residual energy, distance, reliability and degree of mobility. The proposed method can reduce overall energy consumption, extend the lifetime of the sensor network, and remain robust against both even and uneven node deployment.
A Hierarchical Clustering Based Approach in Aspect Mining
Gabriela Czibula; Grigoreta Sofia Cojocar
2012-01-01
Clustering is a division of data into groups of similar objects. Aspect mining is a process that tries to identify crosscutting concerns in existing software systems. The goal is to refactor the existing systems to use aspect-oriented programming, in order to make them easier to maintain and to evolve. The aim of this paper is to present a new hierarchical clustering based approach in aspect mining. For this purpose we propose HAC algo...
Dynamic Clustering of Histogram Data Based on Adaptive Squared Wasserstein Distances
Irpino, Antonio; De Carvalho, Francisco de AT
2011-01-01
This paper deals with clustering methods based on adaptive distances for histogram data using a dynamic clustering algorithm. Histogram data describe individuals in terms of empirical distributions. These kinds of data can be considered complex descriptions of phenomena observed on complex objects: images, groups of individuals, spatially or temporally varying data, results of queries, environmental data, and so on. The Wasserstein distance is used to compare two histograms. The Wasserstein distance between histograms consists of two components: the first based on the means, and the second on the internal dispersions (standard deviation, skewness, kurtosis, and so on) of the histograms. To cluster sets of histogram data, we propose to use a Dynamic Clustering Algorithm (based on adaptive squared Wasserstein distances), a k-means-like algorithm for clustering a set of individuals into $K$ classes that are fixed a priori. The main aim of this research is to provide a tool for clustering histograms, empha...
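For one-dimensional data, the squared L2 Wasserstein distance reduces to matching empirical quantile functions (sorted values), which makes the mean/dispersion decomposition mentioned above easy to check numerically. A minimal sketch on equal-size samples (the paper works with histograms; this sample-based form is a simplifying assumption):

```python
def wasserstein2_sq(xs, ys):
    """Squared L2 Wasserstein distance between two equal-size 1D
    samples, computed by matching sorted values, i.e. the empirical
    quantile functions."""
    xs, ys = sorted(xs), sorted(ys)
    assert len(xs) == len(ys)
    return sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)

def mean_component(xs, ys):
    """The 'means' part of the decomposition: squared difference of the
    sample means. The remainder of wasserstein2_sq is due to internal
    dispersion and shape, never negative."""
    return (sum(xs) / len(xs) - sum(ys) / len(ys)) ** 2
```

A pure location shift puts all of the distance into the mean component; stretching one sample leaves a positive dispersion remainder.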
Blind signal separation of underdetermined mixtures based on clustering algorithms on planes
Institute of Scientific and Technical Information of China (English)
Xie Shengli; Tan Beihai; Fu Yuli
2007-01-01
Based on a clustering method on planes, blind signal separation (BSS) of underdetermined mixtures with three observed signals is discussed. When the plane clustering method is used, the condition of sufficient sparsity of the source signals is not necessary; in other words, it is not required that only one source signal dominates at any given time. The proposed method first clusters the normal lines of the planes; the mixing matrix can then be identified by determining the intersection lines of the planes. This method is an effective implementation of the new theory presented by Georgiev. Simulations illustrate the accuracy and restoration capability of the method in estimating the mixing matrix.
Multi-face detection based on downsampling and modified subtractive clustering for color images
Institute of Scientific and Technical Information of China (English)
KONG Wan-zeng; ZHU Shan-an
2007-01-01
This paper presents a multi-face detection method for color images. The method is based on the assumption that faces are well separated from the background by skin color detection. These faces can be located by the proposed method which modifies the subtractive clustering. The modified clustering algorithm proposes a new definition of distance for multi-face detection, and its key parameters can be predetermined adaptively by statistical information of face objects in the image. Downsampling is employed to reduce the computation of clustering and speed up the process of the proposed method. The effectiveness of the proposed method is illustrated by three experiments.
A Novel Density based improved k-means Clustering Algorithm – Dbkmeans
Directory of Open Access Journals (Sweden)
K. Mumtaz
2010-03-01
Full Text Available Mining knowledge from large amounts of spatial data is known as spatial data mining. It becomes a highly demanding field because huge amounts of spatial data have been collected in various applications ranging from geo-spatial data to bio-medical knowledge. The amount of spatial data being collected is increasing exponentially, so it has far exceeded human's ability to analyze. Recently, clustering has been recognized as a primary data mining method for knowledge discovery in spatial databases. The database can be clustered in many ways depending on the clustering algorithm employed, parameter settings used, and other factors. Multiple clusterings can be combined so that the final partitioning of data provides better clustering. In this paper, a novel density-based k-means clustering algorithm has been proposed to overcome the drawbacks of the DBSCAN and k-means clustering algorithms. The result will be an improved version of the k-means clustering algorithm. This algorithm will perform better than DBSCAN while handling clusters of circularly distributed data points and slightly overlapped clusters.
Girardi, M; Boschin, W; Ellingson, E
2008-01-01
The connection of cluster mergers with the presence of extended, diffuse radio sources in galaxy clusters is still debated. An interesting case is the rich, merging cluster Abell 520, containing a radio halo. A recent gravitational analysis has shown in this cluster the presence of a massive dark core, suggested to be a possible problem for the current cold dark matter paradigm. We aim to obtain new insights into the internal dynamics of Abell 520 by analyzing velocities and positions of member galaxies. Our analysis is based on redshift data for 293 galaxies in the cluster field, obtained by combining new redshift data for 86 galaxies acquired at the TNG with data obtained by the CNOC team and a few other data from the literature. We also use new photometric data obtained at the INT telescope. We combine galaxy velocities and positions to select 167 cluster members around z~0.201. We analyze the cluster structure using the weighted gap analysis, the KMM method, the Dressler-Shectman statistics and the analysis of the velo...
Membership determination of open cluster NGC 188 based on the DBSCAN clustering algorithm
Gao, Xin-Hua
2014-02-01
High-precision proper motions and radial velocities of 1046 stars are used to determine member stars using three-dimensional (3D) kinematics for open cluster NGC 188 based on the density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm. By implementing this algorithm, 472 member stars in the cluster are obtained with 3D kinematics. The color-magnitude diagram (CMD) of the 472 member stars using 3D kinematics shows a well-defined main sequence and a red giant branch, which indicate that the DBSCAN clustering algorithm is very effective for membership determination. The DBSCAN clustering algorithm can effectively select probable member stars in 3D kinematic space without any assumption about the distribution of the cluster or field stars. Analysis results show that the CMD of member stars is significantly clearer than the one based on 2D kinematics, which allows us to better constrain the cluster members and estimate their physical parameters. Using the 472 member stars, the average absolute proper motion and radial velocity are determined to be (PMα, PMδ) = (-2.58 ± 0.22, +0.17 ± 0.18) mas yr-1 and Vr = -42.35 ± 0.05 km s-1, respectively. Our values are in good agreement with values derived by other authors.
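The DBSCAN step itself is compact enough to sketch in pure Python. In the membership setting above, the `-1` (noise) label plays the role of field stars; the `eps` and `min_pts` values in the test are illustrative, not the paper's:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns labels[i] = cluster id, or -1 for
    noise (field stars, in the membership-determination setting)."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        # brute-force eps-neighbourhood; fine for a sketch
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= eps]

    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        labels[i] = cid             # i is a core point
        seeds = list(nb)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid     # noise reached from a core: border
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb2 = neighbours(j)
            if len(nb2) >= min_pts:
                seeds.extend(nb2)   # j is also core: keep expanding
        cid += 1
    return labels
```

Note that no cluster count or distributional assumption is supplied, which is exactly the property the abstract highlights for separating members from field stars.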
Membership determination of open cluster NGC 188 based on the DBSCAN clustering algorithm
International Nuclear Information System (INIS)
High-precision proper motions and radial velocities of 1046 stars are used to determine member stars using three-dimensional (3D) kinematics for open cluster NGC 188 based on the density-based spatial clustering of applications with noise (DBSCAN) clustering algorithm. By implementing this algorithm, 472 member stars in the cluster are obtained with 3D kinematics. The color-magnitude diagram (CMD) of the 472 member stars using 3D kinematics shows a well-defined main sequence and a red giant branch, which indicate that the DBSCAN clustering algorithm is very effective for membership determination. The DBSCAN clustering algorithm can effectively select probable member stars in 3D kinematic space without any assumption about the distribution of the cluster or field stars. Analysis results show that the CMD of member stars is significantly clearer than the one based on 2D kinematics, which allows us to better constrain the cluster members and estimate their physical parameters. Using the 472 member stars, the average absolute proper motion and radial velocity are determined to be (PMα, PMδ) = (−2.58 ± 0.22, +0.17 ± 0.18) mas yr−1 and Vr = −42.35 ± 0.05 km s−1, respectively. Our values are in good agreement with values derived by other authors
Clustering analysis of ancient celadon based on SOM neural network
Institute of Scientific and Technical Information of China (English)
2008-01-01
In this study, the chemical compositions of 48 fragments of ancient ceramics excavated at 4 archaeological kiln sites located in 3 cities (Hangzhou, Cixi and Longquan in Zhejiang Province, China) were examined by the energy-dispersive X-ray fluorescence (EDXRF) technique. The SOM method was then introduced into the clustering analysis based on the major and minor element compositions of the bodies; the results showed that the 48 samples could be correctly assigned to the 3 locations, Hangzhou, Cixi and Longquan. Because the major and minor element compositions of the two Royal Kilns were similar to each other, the classification accuracy over them was merely 76.92%. In view of this, the authors performed the SOM clustering analysis again based on the trace element compositions of the bodies, and the classification accuracy rose to 84.61%. These results indicate that discrepancies in the trace element compositions of the bodies of the ancient ceramics excavated at the two Royal Kiln sites are more distinct than those in the major and minor element compositions, which is in accordance with the facts. We argue that SOM can be employed in the clustering analysis of ancient ceramics.
Henry, David; Dymnicki, Allison B; Mohatt, Nathaniel; Allen, James; Kelly, James G
2015-10-01
Qualitative methods potentially add depth to prevention research but can produce large amounts of complex data even with small samples. Studies conducted with culturally distinct samples often produce voluminous qualitative data but may lack sufficient sample sizes for sophisticated quantitative analysis. Currently lacking in mixed-methods research are methods allowing for more fully integrating qualitative and quantitative analysis techniques. Cluster analysis can be applied to coded qualitative data to clarify the findings of prevention studies by aiding efforts to reveal such things as the motives of participants for their actions and the reasons behind counterintuitive findings. By clustering groups of participants with similar profiles of codes in a quantitative analysis, cluster analysis can serve as a key component in mixed-methods research. This article reports two studies. In the first study, we conduct simulations to test the accuracy of cluster assignment using three different clustering methods with binary data as produced when coding qualitative interviews. Results indicated that hierarchical clustering, K-means clustering, and latent class analysis produced similar levels of accuracy with binary data and that the accuracy of these methods did not decrease with samples as small as 50. Whereas the first study explores the feasibility of using common clustering methods with binary data, the second study provides a "real-world" example using data from a qualitative study of community leadership connected with a drug abuse prevention project. We discuss the implications of this approach for conducting prevention research, especially with small samples and culturally distinct communities. PMID:25946969
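One concrete way to cluster binary code profiles like those described above is agglomerative clustering under a Jaccard distance. This sketch uses single linkage for brevity; the study itself compared hierarchical clustering, K-means and latent class analysis, and the distance choice here is our assumption:

```python
def jaccard(a, b):
    """Jaccard distance between two binary code profiles: 1 minus the
    ratio of shared codes to codes present in either profile."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return (1.0 - both / either) if either else 0.0

def single_link(profiles, k):
    """Naive agglomerative clustering down to k clusters, merging the
    pair of clusters with the smallest single-linkage distance."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(jaccard(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return clusters
```

Even with only a handful of participants, profiles that share most of their codes end up in the same cluster, mirroring how clustering groups participants with similar qualitative-code patterns.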
Alexandroni, Guy; Zimmerman Moreno, Gali; Sochen, Nir; Greenspan, Hayit
2016-03-01
Recent advances in Diffusion Weighted Magnetic Resonance Imaging (DW-MRI) of white matter, in conjunction with improved tractography, produce impressive reconstructions of White Matter (WM) pathways. These pathways (fiber sets) often contain hundreds of thousands of fibers, or more. In order to make fiber-based analysis more practical, the fiber set needs to be preprocessed to eliminate redundancies and to keep only essential representative fibers. In this paper we demonstrate and compare two distinctive frameworks for selecting this reduced set of fibers. The first framework entails pre-clustering the fibers using k-means, followed by hierarchical clustering and replacing each cluster with one representative. For the second clustering stage, seven distance metrics were evaluated. The second framework is based on an efficient geometric approximation paradigm named coresets. Coresets present a new approach to optimization and have seen huge success, especially in tasks requiring large computation time and/or memory. We propose a modified version of the coresets algorithm, Density Coreset. It is used for extracting the main fibers from dense datasets, leaving a small set that represents the main structures and connectivity of the brain. A novel approach, based on a 3D indicator structure, is used for comparing the frameworks. This comparison was applied to High Angular Resolution Diffusion Imaging (HARDI) scans of 4 healthy individuals. We show that, among the clustering-based methods, the cosine distance gives the best performance. In comparing the clustering schemes with coresets, the Density Coreset method achieves the best performance.
TB-LMTO method for an embedded cluster
Czech Academy of Sciences Publication Activity Database
Drchal, Václav; Kudrnovský, Josef
2008-01-01
Roč. 88, č. 18 (2008), s. 2777-2786. ISSN 1478-6435 R&D Projects: GA AV ČR IAA100100616 Institutional research plan: CEZ:AV0Z10100520 Keywords: embedded cluster * random alloy * linear muffin-tin orbital method Subject RIV: BM - Solid Matter Physics; Magnetism Impact factor: 1.384, year: 2008
Quantum Monte Carlo methods and lithium cluster properties
Energy Technology Data Exchange (ETDEWEB)
Owen, R.K.
1990-12-01
Properties of small lithium clusters with sizes ranging from n = 1 to 5 atoms were investigated using quantum Monte Carlo (QMC) methods. Cluster geometries were found from complete active space self-consistent field (CASSCF) calculations. A detailed development of the QMC method leading to the variational QMC (V-QMC) and diffusion QMC (D-QMC) methods is shown. The many-body aspect of electron correlation is introduced into the QMC importance-sampling electron-electron correlation functions by using density-dependent parameters, which are shown to increase the amount of correlation energy obtained in V-QMC calculations. A detailed analysis of the D-QMC time-step bias is made, and the bias is found to be at least linear with respect to the time step. The D-QMC calculations determined the lithium cluster ionization potentials to be 0.1982(14) [0.1981], 0.1895(9) [0.1874(4)], 0.1530(34) [0.1599(73)], 0.1664(37) [0.1724(110)], 0.1613(43) [0.1675(110)] Hartrees for lithium clusters n = 1 through 5, respectively, in good agreement with the experimental results shown in brackets. Also, the binding energies per atom were computed to be 0.0177(8) [0.0203(12)], 0.0188(10) [0.0220(21)], 0.0247(8) [0.0310(12)], 0.0253(8) [0.0351(8)] Hartrees for lithium clusters n = 2 through 5, respectively. The lithium cluster one-electron density is shown to have charge concentrations corresponding to nonnuclear attractors. The overall shape of the electronic charge density also bears a remarkable similarity to the anisotropic harmonic oscillator model shape for the given number of valence electrons.
Master equation based steady-state cluster perturbation theory
Nuss, Martin; Dorn, Gerhard; Dorda, Antonius; von der Linden, Wolfgang; Arrigoni, Enrico
2015-09-01
A simple and efficient approximation scheme to study electronic transport characteristics of strongly correlated nanodevices, molecular junctions, or heterostructures out of equilibrium is provided by steady-state cluster perturbation theory. In this work, we improve the starting point of this perturbative, nonequilibrium Green's function based method. Specifically, we employ an improved unperturbed (so-called reference) state ρ̂S, constructed as the steady state of a quantum master equation within the Born-Markov approximation. This resulting hybrid method inherits beneficial aspects of both the quantum master equation as well as the nonequilibrium Green's function technique. We benchmark this scheme on two experimentally relevant systems in the single-electron transistor regime: an electron-electron interaction based quantum diode and a triple quantum dot ring junction, which both feature negative differential conductance. The results of this method improve significantly with respect to the plain quantum master equation treatment at modest additional computational cost.
Remote sensing clustering analysis based on object-based interval modeling
He, Hui; Liang, Tianheng; Hu, Dan; Yu, Xianchuan
2016-09-01
In object-based clustering, image data are segmented into objects (groups of pixels) and then clustered based on the objects' features. This method can be used to automatically classify high-resolution, remote sensing images, but requires accurate descriptions of object features. In this paper, we ascertain that the interval-valued data model is appropriate for describing clustering prototype features. With this in mind, we developed an object-based interval modeling method for high-resolution, multiband, remote sensing data. We also designed an adaptive interval-valued fuzzy clustering method. We ran experiments utilizing images from the SPOT-5 satellite sensor, for the Pearl River Delta region and Beijing. The results indicate that the proposed algorithm considers both the anisotropy of the remote sensing data and the ambiguity of objects. Additionally, we present a new dissimilarity measure for interval vectors, which better separates the interval vectors generated by features of the segmentation units (objects). This approach effectively limits classification errors caused by spectral mixing between classes. Compared with the object-based unsupervised classification method proposed earlier, the proposed algorithm improves the classification accuracy without increasing computational complexity.
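The paper's own dissimilarity measure for interval vectors is not reproduced in the abstract; as a hedged stand-in, one common choice compares the lower and upper bounds band by band:

```python
def interval_dist(u, v):
    """A common dissimilarity for interval-valued feature vectors (a
    stand-in, not the paper's measure): Euclidean distance over the
    concatenated lower and upper bounds of each band's interval."""
    return sum((ul - vl) ** 2 + (uh - vh) ** 2
               for (ul, uh), (vl, vh) in zip(u, v)) ** 0.5
```

Here each object is a list of `(low, high)` intervals, one per spectral band, so both the position and the width of each band's reflectance range contribute to the distance.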
Functionalization of atomic cobalt clusters obtained by electrochemical methods
Energy Technology Data Exchange (ETDEWEB)
Rodriguez Cobo, Eldara [Laboratorio de Magnetismo y Tecnologia, Instituto Tecnoloxico, Pabillon de Servicios, Campus Sur, 15782 Santiago de Compostela (Spain); Departamento de Quimica Organica y Unidad Asociada al CSIC, Universidad de Santiago de Compostela, 15782 Santiago de Compostela (Spain); Rivas Rey, Jose; Blanco Varela, M. Carmen; Lopez Quintela, M. Arturo [Laboratorio de Magnetismo y Tecnologia, Instituto Tecnoloxico, Pabillon de Servicios, Campus Sur, 15782 Santiago de Compostela (Spain); Mourino Mosquera, Antonio; Torneiro Abuin, Mercedes [Departamento de Quimica Organica y Unidad Asociada al CSIC, Universidad de Santiago de Compostela, 15782 Santiago de Compostela (Spain)
2006-05-15
Functionalization of magnetic nanoparticles with appropriate organic molecules is very important for many applications. In the present study, cobalt nanoparticles with an average diameter of 2 nm, corresponding to Co{sub 309} clusters, were synthesised by an electrochemical method and then coated with ADCB (4-(9-decenyloxy)benzoic acid) in order to protect the clusters against oxidation and to obtain a final nanostructure that can later be attached to many different materials, such as drugs, proteins or other biological molecules. (copyright 2006 WILEY-VCH Verlag GmbH and Co. KGaA, Weinheim) (orig.)
A Photometric Method for estimating CNO Abundances in Globular Clusters
Peat, David; Butler, Raymond
2002-01-01
Stromgren indices v and b are combined with broad-band index I, and a new index p, the short wavelength half of the v band, to estimate CN 4215A molecular absorption in a sample of stars in M22. The results have been used to estimate carbon and nitrogen abundances and suggest groups of stars within this cluster, each with a characteristic nitrogen abundance, but with a range of carbon abundances. The results suggest the possibility of stars consisting of material which has undergone CNO recycling two or three times. The method can be subsequently used for other globular clusters.
Multiple ellipse fitting by center-based clustering
Tomislav Marošević; Rudolf Scitovski
2015-01-01
This paper deals with the multiple ellipse fitting problem based on a given set of data points in a plane. The presumption is that all data points are derived from k ellipses that should be fitted. The problem is solved by means of center-based clustering, where cluster centers are ellipses. If the Mahalanobis distance-like function is introduced in each cluster, then the cluster center is represented by the corresponding Mahalanobis circle-center. The distance from a point a∈R^2 to the Mahal...
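A Mahalanobis distance-like function of the kind the abstract above refers to can be sketched as follows; the function name and the explicit 2x2 matrix layout are illustrative assumptions, not the paper's actual formulation.

```python
def mahalanobis2(point, center, cov):
    """Squared Mahalanobis distance in R^2 for a 2x2 covariance matrix.

    Illustrative sketch only: the paper's distance-like function for
    Mahalanobis circle-centers may differ in detail.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    # inverse of the 2x2 covariance matrix
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = point[0] - center[0], point[1] - center[1]
    return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
```

With the identity covariance this reduces to the squared Euclidean distance; an anisotropic covariance stretches the unit ball into an ellipse, which is what lets an ellipse act as a cluster center.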
Automatic Clustering Using Teaching Learning Based Optimization
M. Ramakrishna Murty; Anima Naik; J.V.R.Murthy; Prasad Reddy, P. V. G. D.; Suresh C. Satapathy; K. Parvathi
2014-01-01
Finding the optimal number of clusters remains a challenging problem in the data mining research community. Several approaches have been suggested to address this issue, including evolutionary computation techniques such as genetic algorithms, particle swarm optimization and differential evolution. Many hybrid variants of these approaches have also been tried by researchers. However, the number of optimal clusters and the computational effic...
WEIGHING GALAXY CLUSTERS WITH GAS. I. ON THE METHODS OF COMPUTING HYDROSTATIC MASS BIAS
International Nuclear Information System (INIS)
Mass estimates of galaxy clusters from X-ray and Sunyaev-Zel'dovich observations assume the intracluster gas is in hydrostatic equilibrium with the cluster's gravitational potential. However, since galaxy clusters are dynamically active objects whose dynamical states can deviate significantly from the equilibrium configuration, the departure from the hydrostatic equilibrium assumption is one of the largest sources of systematic uncertainty in cluster cosmology. In the literature there have been two methods for computing the hydrostatic mass bias, based on the Euler and the modified Jeans equations, respectively, and there has been some confusion about the validity of these two methods. The word 'Jeans' was a misnomer, which incorrectly implies that the gas is collisionless. To avoid further confusion, we instead refer to these methods as the 'summation' and 'averaging' methods, respectively. In this work, we show that these two methods for computing the hydrostatic mass bias are equivalent by demonstrating that the equation used in the second method can be derived by taking spatial averages of the Euler equation. Specifically, we identify the correspondences between individual terms in these two methods mathematically and show that these correspondences are valid to within a few percent using hydrodynamical simulations of galaxy cluster formation. In addition, we compute the mass bias associated with the acceleration of gas and show that its contribution is small in the virialized regions in the interior of galaxy clusters, but becomes non-negligible in the outskirts of massive galaxy clusters. We discuss future prospects for understanding and characterizing biases in the mass estimates of galaxy clusters using both hydrodynamical simulations and observations, and their implications for cluster cosmology.
Data relationship degree-based clustering data aggregation for VANET
Kumar, Rakesh; Dave, Mayank
2016-03-01
Data aggregation is one of the major needs of vehicular ad hoc networks (VANETs) due to resource constraints. Data aggregation in VANETs can reduce data redundancy in the process of data gathering and thus conserve bandwidth. In realistic applications, it is always important to construct an effective routing strategy that optimises not only the communication cost but also the aggregation cost. Data aggregation at the cluster head by individual vehicles causes flooding of the data, which results in maximum latency and bandwidth consumption. Another approach to data aggregation in VANETs is sending local representative data based on the spatial correlation of sampled data. In this article, we emphasise the problem that recent spatial correlation data models of vehicles in VANETs are not appropriate for measuring the correlation in a complex and composite environment. Moreover, the data represented by these models are generally inaccurate when compared to the real data. To minimise this problem, we propose a group-based data aggregation method that uses a data relationship degree (DRD). In the proposed approach, the DRD is a spatial relationship measurement parameter that measures the correlation between a vehicle's data and its neighbouring vehicles' data. The DRD clustering method, in which vehicles' data are grouped based on the available data and their correlation, is presented in detail. Results prove that the representative data from the proposed approach have low distortion and provide an improvement in packet delivery ratio and throughput (up to 10.84% and 24.82%, respectively) as compared to other state-of-the-art solutions such as Cluster-Based Accurate Syntactic Compression of Aggregated Data in VANETs.
Method for discovering relationships in data by dynamic quantum clustering
Weinstein, Marvin; Horn, David
2014-10-28
Data clustering is provided according to a dynamical framework based on quantum mechanical time evolution of states corresponding to data points. To expedite computations, we can approximate the time-dependent Hamiltonian formalism by a truncated calculation within a set of Gaussian wave-functions (coherent states) centered around the original points. This allows for analytic evaluation of the time evolution of all such states, opening up the possibility of exploration of relationships among data-points through observation of varying dynamical-distances among points and convergence of points into clusters. This formalism may be further supplemented by preprocessing, such as dimensional reduction through singular value decomposition and/or feature filtering.
A Comparative Analysis of Density Based Clustering Techniques for Outlier Mining
R.Prabahari*,; Dr.V.Thiagarasu
2014-01-01
Density-based clustering algorithms such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Ordering Points To Identify the Clustering Structure (OPTICS) and DENsity-based CLUstEring (DENCLUE) are designed to discover clusters of arbitrary shape. DBSCAN grows clusters according to a density-based connectivity analysis. OPTICS, an extension of DBSCAN, produces a cluster ordering obtained over a range of parameter settings. DENCLUE clusters object ...
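For readers unfamiliar with the density-based connectivity analysis mentioned above, a minimal DBSCAN sketch in plain Python looks like this (real implementations accelerate the neighbor query with a spatial index; this brute-force version is only meant to show the core-point expansion logic):

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: returns labels[i] = cluster id, or -1 for noise."""
    def neighbors(i):
        # brute-force eps-neighborhood query (includes the point itself)
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        labels[i] = cid                 # i is a core point: start a cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid         # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:      # j is also a core point: expand
                seeds.extend(jn)
        cid += 1
    return labels
```

The density-connectivity idea is visible in the expansion loop: only core points (those with at least `min_pts` neighbors within `eps`) propagate the cluster, which is what lets clusters take arbitrary shapes while isolated points stay labeled as noise.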
Automatic Clustering Approaches Based On Initial Seed Points
Directory of Open Access Journals (Sweden)
G.V.S.N.R.V.Prasad
2011-12-01
Full Text Available Since clustering is applied in many fields, a number of clustering techniques and algorithms have been proposed and are available in the literature. This paper proposes a novel approach to address the major problems of any partitional clustering algorithm: choosing an appropriate K-value and selecting the K initial seed points. The performance of any partitional clustering algorithm depends on the initial seed points, which are random in all existing partitional clustering algorithms. To overcome this problem, a novel algorithm called Weighted Interior Clustering (WIC) is proposed to find approximate initial seed points, the number of clusters and the data points in the clusters. This paper also proposes another novel approach combining the newly proposed WIC algorithm with K-means, named Weighted Interior K-means Clustering (WIKC). The novelty of WIKC is that it improves the quality and performance of the K-means clustering algorithm with reduced complexity. The experimental results on various datasets with various instances clearly indicate the efficacy of the proposed methods over the other methods.
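The WIC algorithm itself is not reproduced in the abstract. As a point of comparison, a generic non-random seeding heuristic, greedy farthest-point seeding, is a common remedy for the random-initial-seed problem described above; the sketch below is illustrative and is not the WIC algorithm:

```python
import random

def farthest_point_seeds(points, k, seed=0):
    """Greedy farthest-point seeding: each new seed is the point farthest
    from its nearest existing seed. Generic heuristic, NOT the paper's WIC."""
    rng = random.Random(seed)
    seeds = [points[rng.randrange(len(points))]]   # first seed: random point
    while len(seeds) < k:
        def d2_to_nearest_seed(p):
            return min(sum((a - b) ** 2 for a, b in zip(p, s)) for s in seeds)
        # pick the point whose nearest seed is farthest away
        seeds.append(max(points, key=d2_to_nearest_seed))
    return seeds
```

Regardless of which point is drawn first, subsequent seeds land in the regions least covered so far, so well-separated groups each receive a seed, which is the property random initialization lacks.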
Safner, T.; Miller, M.P.; McRae, B.H.; Fortin, M.-J.; Manel, S.
2011-01-01
Recently, techniques available for identifying clusters of individuals or boundaries between clusters using genetic data from natural populations have expanded rapidly. Consequently, there is a need to evaluate these different techniques. We used spatially-explicit simulation models to compare three spatial Bayesian clustering programs and two edge detection methods. Spatially-structured populations were simulated where a continuous population was subdivided by barriers. We evaluated the ability of each method to correctly identify boundary locations while varying: (i) time after divergence, (ii) strength of isolation by distance, (iii) level of genetic diversity, and (iv) amount of gene flow across barriers. To further evaluate the methods' effectiveness at detecting genetic clusters in natural populations, we used previously published data on North American pumas and a European shrub. Our results show that with simulated and empirical data, the Bayesian spatial clustering algorithms outperformed direct edge detection methods. All methods incorrectly detected boundaries in the presence of strong patterns of isolation by distance. Based on this finding, we support the application of Bayesian spatial clustering algorithms for boundary detection in empirical datasets, with necessary tests for the influence of isolation by distance. © 2011 by the authors; licensee MDPI, Basel, Switzerland.
Multiple imputation methods for bivariate outcomes in cluster randomised trials.
DiazOrdaz, K; Kenward, M G; Gomes, M; Grieve, R
2016-09-10
Missing observations are common in cluster randomised trials. The problem is exacerbated when modelling bivariate outcomes jointly, as the proportion of complete cases is often considerably smaller than the proportion having either of the outcomes fully observed. Approaches taken to handling such missing data include the following: complete case analysis, single-level multiple imputation that ignores the clustering, multiple imputation with a fixed effect for each cluster and multilevel multiple imputation. We contrasted the alternative approaches to handling missing data in a cost-effectiveness analysis that uses data from a cluster randomised trial to evaluate an exercise intervention for care home residents. We then conducted a simulation study to assess the performance of these approaches on bivariate continuous outcomes, in terms of confidence interval coverage and empirical bias in the estimated treatment effects. Missing-at-random clustered data scenarios were simulated following a full-factorial design. Across all the missing data mechanisms considered, the multiple imputation methods provided estimators with negligible bias, while complete case analysis resulted in biased treatment effect estimates in scenarios where the randomised treatment arm was associated with missingness. Confidence interval coverage was generally in excess of nominal levels (up to 99.8%) following fixed-effects multiple imputation and too low following single-level multiple imputation. Multilevel multiple imputation led to coverage levels of approximately 95% throughout. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. PMID:26990655
Miyamoto, S.; Nakayama, K.
1983-01-01
A method of two-stage clustering of literature based on citation frequency is applied to 5,065 articles from 57 journals in environmental and civil engineering. Results of related methods of citation analysis (hierarchical graph, clustering of journals, multidimensional scaling) applied to same set of articles are compared. Ten references are…
A Systematic Analysis of Caustic Methods for Galaxy Cluster Masses
Gifford, Daniel; Kern, Nicholas
2013-01-01
We quantify the expected observed statistical and systematic uncertainties of the escape velocity as a measure of the gravitational potential and total mass of galaxy clusters. We focus our attention on low redshift (z < 0.15) clusters. For N_gal > 25, the scatter in the escape velocity mass is dominated by projections along the line-of-sight. Algorithmic uncertainties from the determination of the projected escape velocity profile are negligible. We quantify how target selection based on magnitude, color, and projected radial separation can induce small additional biases into the escape velocity masses. Using N_gal = 150 (25), the caustic technique has a per cluster scatter in ln(M|M_200) of 0.3 (0.5) and bias 1+/-3% (16+/-5%) for clusters with masses > 10^14 M_solar at z < 0.15.
Graph-based k-means clustering: A comparison of the set versus the generalized median graph
Ferrer, Miquel; Valveny, Ernest; Serratosa, Francesc; Bardaji, I.; Bunke, Horst
2009-01-01
In this paper we propose the application of the generalized median graph in a graph-based k-means clustering algorithm. In the graph-based k-means algorithm, the centers of the clusters have traditionally been represented using the set median graph. We propose an approximate method for computing the generalized median graph that allows it to be used to represent the centers of the clusters. Experiments on three databases show that using the generalized median graph as the clusters representativ...
LBSN user movement trajectory clustering mining method based on road network
Institute of Scientific and Technical Information of China (English)
邹永贵; 万建斌; 夏英
2013-01-01
Data in an LBSN (location-based social network) have both geographical and social attributes. Combining users' trajectories with their friendship relations helps improve the efficiency of clustering mining over uncertain trajectories. This paper presents a ranking function based on friendship-relation features to rank users by influence and find the active users. It also introduces an accuracy check for road-network sub-trajectory matching into the network-matching process, building on data reduction, and stores the road segments that active users matched correctly in order to reduce the time complexity. Finally, it mines hot routes within the city by combining an R*-tree spatial index mechanism with the DBSCAN clustering algorithm. Theoretical analysis and experimental results show that, compared with existing methods, the proposed method has better scalability and obtains clustering results more accurately and efficiently in the LBSN environment.
Density-based algorithms for active and anytime clustering
Mai, Son
2014-01-01
Data intensive applications like biology, medicine, and neuroscience require effective and efficient data mining technologies. Advanced data acquisition methods produce a constantly increasing volume and complexity. As a consequence, the need of new data mining technologies to deal with complex data has emerged during the last decades. In this thesis, we focus on the data mining task of clustering in which objects are separated in different groups (clusters) such that objects inside a cluster...
An adaptive spatial clustering method for automatic brain MR image segmentation
Institute of Scientific and Technical Information of China (English)
Jingdan Zhang; Daoqing Dai
2009-01-01
In this paper, an adaptive spatial clustering method is presented for automatic brain MR image segmentation, based on a competitive learning algorithm, the self-organizing map (SOM). We use a pattern recognition approach in terms of feature generation and classifier design. Firstly, a multi-dimensional feature vector is constructed using local spatial information. Then, an adaptive spatial growing hierarchical SOM (ASGHSOM) is proposed as the classifier; it is an extension of the SOM, fusing multi-scale segmentation with the competitive learning clustering algorithm to overcome the problem of overlapping grey-scale intensities in boundary regions. Furthermore, an adaptive spatial distance is integrated with ASGHSOM, in which local spatial information is considered in the clustering process to reduce the noise effect and the classification ambiguity. Our proposed method is validated by extensive experiments using both simulated and real MR data with varying noise levels, and is compared with state-of-the-art algorithms.
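A plain SOM, the competitive-learning base that ASGHSOM extends, can be sketched as follows. The growing-hierarchy and adaptive spatial-distance components of ASGHSOM are not reproduced here, and all parameter values (learning rate, neighborhood width, step count) are illustrative assumptions:

```python
import math
import random

def train_som(data, rows, cols, steps=300, lr0=0.5, sigma0=1.0, seed=0):
    """Minimal SOM training sketch on a rows x cols grid of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # random initial weights in [0, 1)
    w = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
         for _ in range(rows)]
    for t in range(steps):
        x = data[rng.randrange(len(data))]          # random training sample
        # best-matching unit (competitive step)
        br, bc = min(((r, c) for r in range(rows) for c in range(cols)),
                     key=lambda rc: sum((w[rc[0]][rc[1]][d] - x[d]) ** 2
                                        for d in range(dim)))
        lr = lr0 * (1 - t / steps)                  # decaying learning rate
        sigma = sigma0 * (1 - t / steps) + 1e-9     # decaying neighborhood
        for r in range(rows):
            for c in range(cols):
                # Gaussian neighborhood pull toward the sample
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for d in range(dim):
                    w[r][c][d] += lr * h * (x[d] - w[r][c][d])
    return w
```

Training pulls the grid's weight vectors toward the data, so the mean distance from each sample to its nearest unit (the quantization error) shrinks relative to the random initialization.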
Institute of Scientific and Technical Information of China (English)
宋智玲; 贾小珠
2011-01-01
Automatically discovering communities is of great significance for studying the structure, function and behavior of complex networks. Building on clustering techniques, a new method of identifying similar nodes is provided, based on an ant colony algorithm that optimizes the computational performance of the nodes.
Diet-related chronic disease in the northeastern United States: a model-based clustering approach
Flynt, Abby; Daepp, Madeleine I. G.
2015-01-01
Background Obesity and diabetes are global public health concerns. Studies indicate a relationship between socioeconomic, demographic and environmental variables and the spatial patterns of diet-related chronic disease. In this paper, we propose a methodology using model-based clustering and variable selection to predict rates of obesity and diabetes. We test this method through an application in the northeastern United States. Methods We use model-based clustering, an unsupervised learning a...
Superoperator coupled cluster method for nonequilibrium density matrix
Dzhioev, Alan A.; Kosov, D. S.
2014-01-01
We develop a superoperator coupled cluster method for nonequilibrium open many-body quantum systems described by the Lindblad master equation. The method is universal and applicable to systems of interacting fermions, bosons or their mixtures. We present a general theory and consider its application to the problem of quantum transport through the system with electron-phonon correlations. The results are assessed against the perturbation theory and nonequilibrium configuration interaction theo...
Clustering based on Random Graph Model embedding Vertex Features
Zanghi, Hugo; Volant, Stevenn; Ambroise, Christophe
2009-01-01
Large datasets with interactions between objects are common to numerous scientific fields (e.g. social science, the internet, biology). The interactions naturally define a graph, and a common way to explore or summarize such a dataset is graph clustering. Most techniques for clustering graph vertices use only the topology of connections, ignoring information in the vertex features. In this paper, we provide a clustering algorithm exploiting both types of data, based on a statistical model with l...
A Cluster Based Approach for Classification of Web Results
Apeksha Khabia; M. B. Chandak
2014-01-01
Nowadays a significant amount of information from the web is present in the form of text, e.g., reviews, forum postings, blogs, news articles, email messages, web pages. It becomes difficult to classify documents into predefined categories as the number of documents grows. Clustering is the classification of data into clusters, so that the data in each cluster share some common trait, often proximity according to some defined measure. The underlying distribution of a data set can somewhat be depicted base...
Cluster detection of diseases in heterogeneous populations: an alternative to scan methods
Directory of Open Access Journals (Sweden)
Rebeca Ramis
2014-05-01
Full Text Available Cluster detection has become an important part of the agenda of epidemiologists and public health authorities; the identification of high- and low-risk areas is fundamental to the definition of public health strategies and the suggestion of potential risk factors. Currently, there are different cluster detection techniques available, the most popular being those using windows to scan the areas within the studied region. However, when these areas are heterogeneous in population size, scan window methods can lead to inaccurate conclusions. In order to perform cluster detection over heterogeneously populated areas, we developed a method based not on scanning windows but on standard mortality ratios (SMR) using irregular spatial aggregation (ISA). Its extension, irregular spatial aggregation with covariates (ISAC), includes covariates via residuals from Poisson regression. We compared the performance of the method with the flexible shaped spatial scan statistic (FlexScan) using mortality data for stomach and bladder cancer for 8,098 Spanish towns. The results show a collection of clusters for stomach and bladder cancer similar to those detected by ISA and FlexScan. However, in general, clusters detected by FlexScan were bigger and included towns whose SMRs were not statistically significant. For bladder cancer, clusters detected by ISAC differed from those detected by ISA and FlexScan in shape and location. The ISA and ISAC methods could be an alternative to the traditional scan window methods for cluster detection over aggregated data when the areas under study are heterogeneous in terms of population. The simplicity and flexibility of the methods make them more attractive than methods based on more complicated algorithms.
Cluster detection of diseases in heterogeneous populations: an alternative to scan methods.
Ramis, Rebeca; Gómez-Barroso, Diana; López-Abente, Gonzalo
2014-05-01
Cluster detection has become an important part of the agenda of epidemiologists and public health authorities; the identification of high- and low-risk areas is fundamental to the definition of public health strategies and the suggestion of potential risk factors. Currently, there are different cluster detection techniques available, the most popular being those using windows to scan the areas within the studied region. However, when these areas are heterogeneous in population size, scan window methods can lead to inaccurate conclusions. In order to perform cluster detection over heterogeneously populated areas, we developed a method based not on scanning windows but on standard mortality ratios (SMR) using irregular spatial aggregation (ISA). Its extension, irregular spatial aggregation with covariates (ISAC), includes covariates via residuals from Poisson regression. We compared the performance of the method with the flexible shaped spatial scan statistic (FlexScan) using mortality data for stomach and bladder cancer for 8,098 Spanish towns. The results show a collection of clusters for stomach and bladder cancer similar to those detected by ISA and FlexScan. However, in general, clusters detected by FlexScan were bigger and included towns whose SMRs were not statistically significant. For bladder cancer, clusters detected by ISAC differed from those detected by ISA and FlexScan in shape and location. The ISA and ISAC methods could be an alternative to the traditional scan window methods for cluster detection over aggregated data when the areas under study are heterogeneous in terms of population. The simplicity and flexibility of the methods make them more attractive than methods based on more complicated algorithms. PMID:24893029
Cloud Computing Application for Hotspot Clustering Using Recursive Density Based Clustering (RDBC)
Santoso, Aries; Khiyarin Nisa, Karlina
2016-01-01
Indonesia has vast areas of tropical forest, but these are often burned, causing extensive damage to property and human life. Monitoring hotspots can be one part of forest fire management. Each hotspot is recorded in a dataset so that it can be processed and analyzed. This research aims to build a cloud computing application that visualizes hotspot clustering. The application uses the R programming language with the Shiny web framework and implements the Recursive Density Based Clustering (RDBC) algorithm. Clustering is done on hotspot datasets of Kalimantan Island and South Sumatra Province to find the spread pattern of hotspots. The clustering results are evaluated using the Silhouette Coefficient (SC), which yields a best value of 0.3220798 for the Kalimantan dataset. Clustering patterns are displayed in the form of web pages so that they can be widely accessed and become a reference for fire occurrence prediction.
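The Silhouette Coefficient used for evaluation above compares, for each point, its mean intra-cluster distance a with its mean distance b to the nearest other cluster, scoring (b - a)/max(a, b) and averaging over all points. A pure-Python sketch (in practice a clustering library's implementation would be used; the singleton-cluster score of 0 is a common convention, not taken from the paper):

```python
def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    # group point indices by cluster label
    idx = {}
    for i, l in enumerate(labels):
        idx.setdefault(l, []).append(i)

    scores = []
    for i, l in enumerate(labels):
        own = idx[l]
        if len(own) == 1:
            scores.append(0.0)          # convention for singleton clusters
            continue
        # a: mean distance to other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(sum(dist(points[i], points[j]) for j in others) / len(others)
                for l2, others in idx.items() if l2 != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Scores lie in [-1, 1]: values near 1 mean tight, well-separated clusters, while a value like the paper's 0.32 indicates moderately overlapping clusters.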
Institute of Scientific and Technical Information of China (English)
李昌玺; 郭戈; 汪毅; 赵龙华; 张晨
2015-01-01
Aiming at the low sensor-resource utilization and time-consuming identification of single ballistic target recognition, this paper adopts group-target theory and puts forward a ballistic target-group clustering recognition method based on feature sensitivity. The method first clusters the target group into several small groups, then selects target feature combinations to calculate the sensitivity for each small group and establishes each group's threat degree. At the same time, by comparing the advantages and disadvantages of the various features at a particular stage, it provides guidance for feature optimization and combination, optimizing the target feature combination and improving the efficiency of identification. Finally, the feasibility of the method is verified by simulation results.
A fast density-based clustering algorithm for real-time Internet of Things stream.
Amini, Amineh; Saboohi, Hadi; Wah, Teh Ying; Herawan, Tutut
2014-01-01
Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of algorithms for clustering data streams. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets. PMID:25110753
A Fast Density-Based Clustering Algorithm for Real-Time Internet of Things Stream
Directory of Open Access Journals (Sweden)
Amineh Amini
2014-01-01
Full Text Available Data streams are continuously generated over time from Internet of Things (IoT) devices. The faster all of this data is analyzed, its hidden trends and patterns discovered, and new strategies created, the faster action can be taken, creating greater value for organizations. Density-based methods are a prominent class of algorithms for clustering data streams. They can detect clusters of arbitrary shape, handle outliers, and do not need the number of clusters in advance. Therefore, a density-based clustering algorithm is a proper choice for clustering IoT streams. Recently, several density-based algorithms have been proposed for clustering data streams. However, density-based clustering in limited time is still a challenging issue. In this paper, we propose a density-based clustering algorithm for IoT streams. The method has a fast processing time, making it applicable in real-time IoT applications. Experimental results show that the proposed approach obtains high-quality results with low computation time on real and synthetic datasets.
A method of clustering observers with different visual characteristics
International Nuclear Information System (INIS)
Evaluation of observers' image perception in medical images is important, yet has not been performed because it is difficult to quantify visual characteristics. In the present study, we investigated observers' image perception by clustering a group of 20 observers. Images of a contrast-detail (C-D) phantom, which had cylinders in 10 rows and 10 columns with different diameters and lengths, were acquired with an X-ray screen-film system under fixed exposure conditions. A group of 10 films was prepared for visual evaluation. Sixteen radiological technicians, three radiologists and one medical physicist participated in the observation test. All observers read the phantom radiographs on a transillumination image viewer with the room lights off. The detectability was defined as the shortest length of the cylinders whose border the observers could recognize from the background, and was recorded using the number of columns. The detectability was calculated as the average of 10 readings for each observer, and plotted against phantom diameter. The unweighted pair-group method using arithmetic averages (UPGMA) was adopted for clustering. The observers were clustered into two groups: one group selected objects with a clear demarcation from the vicinity, and the other group searched for the objects with their eyes constrained. This study showed the usefulness of the clustering method for selecting personnel with similar perceptual predispositions when a C-D phantom is used in image quality control.
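UPGMA, the linkage rule used above, repeatedly merges the two clusters with the smallest unweighted average pairwise distance. A small sketch follows; the `dist` callback stands in for a distance between observers' detectability profiles, which the abstract does not specify, and `stop_k` halts at a chosen number of clusters rather than building the full dendrogram:

```python
def upgma(dist, n, stop_k):
    """Agglomerative UPGMA clustering of items 0..n-1 down to stop_k clusters.

    dist(i, j) is a caller-supplied distance between items i and j.
    """
    clusters = [[i] for i in range(n)]

    def avg_link(c1, c2):
        # unweighted average of all pairwise distances between two clusters
        return sum(dist(i, j) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > stop_k:
        # find the closest pair of clusters under average linkage
        a, b = min(((x, y) for x in range(len(clusters))
                    for y in range(x + 1, len(clusters))),
                   key=lambda xy: avg_link(clusters[xy[0]], clusters[xy[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

This is the naive O(n^3) formulation, which is adequate for the 20 observers of the study; hierarchical-clustering libraries use faster update schemes for larger inputs.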
An Efficient Semantic Model For Concept Based Clustering And Classification
Directory of Open Access Journals (Sweden)
SaiSindhu Bandaru
2012-03-01
Full Text Available Usually in text mining techniques, basic measures like the term frequency of a term (word or phrase) are computed to determine the importance of the term in the document. But with statistical analysis alone, the original semantics of the term may not carry its exact meaning. To overcome this problem, a new framework has been introduced which relies on a concept-based model and a synonym-based approach. The proposed model can efficiently find significant matching and related concepts between documents according to concept-based and synonym-based approaches. Large sets of experiments using the proposed model on different data sets in clustering and classification were conducted. Experimental results demonstrate the substantial enhancement of the clustering quality using sentence-based, document-based, corpus-based and combined-approach concept analysis. A new similarity measure has been proposed to find the similarity between a document and the existing clusters, which can be used in classification of the document with respect to the existing clusters.
APPECT: An Approximate Backbone-Based Clustering Algorithm for Tags
DEFF Research Database (Denmark)
Zong, Yu; Xu, Guandong; Jin, Pin;
2011-01-01
resulting from the severe ambiguity, redundancy and weak semantics of tags. Clustering is a useful tool to address these difficulties. Most research on tag clustering directly applies traditional clustering algorithms such as K-means or hierarchical...... algorithm for Tags (APPECT). The main steps of APPECT are: (1) we execute the K-means algorithm on a tag similarity matrix M times and collect a set of tag clustering results Z={C1,C2,…,Cm}; (2) we form the approximate backbone of Z by executing a greedy search; (3) we fix the approximate backbone as......
Recognition of Marrow Cell Images Based on Fuzzy Clustering
Directory of Open Access Journals (Sweden)
Xitao Zheng
2012-02-01
Full Text Available In order to explore the leukocyte distribution of human beings and predict recurrent leukemia, mouse marrow cells are investigated for possible indications of recurrence. This paper uses the fuzzy C-means clustering recognition method to identify cells in sliced mouse marrow images. In our image processing, red cells, leukocytes, megakaryocytes, and cytoplasm cannot be separated by staining color alone; RGB combinations are therefore used to classify the image into 8 sectors so that the search area can be matched with these sectors. The gray-value distribution and the texture patterns are used to construct the membership function. Previous work on this project, involving recognition based on pixel distribution and probability, lays the groundwork for the data processing and preprocessing. Constraints based on size, pixel distribution, and grayscale pattern are used for the successful counting of individual cells. Tests show that this shape-, pattern- and color-based method can reach satisfactory counting under similar illumination conditions.
Research and Implementation of Unsupervised Clustering-Based Intrusion Detection
Institute of Scientific and Technical Information of China (English)
LuoMin; ZhangHuan-guo; WangLi-na
2003-01-01
An unsupervised clustering-based intrusion detection algorithm is discussed in this paper. The basic idea of the algorithm is to produce clusters by comparing the distances of unlabeled training data. With the classified data instances, anomalous clusters can be easily identified by the normal-cluster ratio, and the identified clusters can then be used in real data detection. The benefit of the algorithm is that it does not need labeled training data sets. The experiments conclude that this approach can detect unknown intrusions efficiently in real network connections using the KDD99 data sets.
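The idea above, clustering unlabeled records by distance and flagging small clusters as anomalous, can be sketched as follows. The cluster width, the normal-cluster ratio, and the example points are illustrative assumptions, not the paper's parameters.

```python
# Sketch: single-pass distance clustering of unlabeled records, then
# flagging clusters that are small relative to the largest one as anomalous.

def distance_cluster(points, width):
    """Assign each point to the first cluster whose centroid lies within `width`."""
    clusters = []  # each cluster is [centroid, members]
    for p in points:
        for c in clusters:
            centroid, members = c
            d = sum((a - b) ** 2 for a, b in zip(p, centroid)) ** 0.5
            if d <= width:
                members.append(p)
                # update the centroid as the running mean of the members
                dims = len(p)
                c[0] = [sum(m[i] for m in members) / len(members)
                        for i in range(dims)]
                break
        else:
            clusters.append([list(p), [p]])
    return clusters

def label_anomalies(clusters, normal_ratio=0.5):
    """Clusters holding fewer than normal_ratio * (largest cluster) points
    are labeled anomalous."""
    biggest = max(len(members) for _, members in clusters)
    return [len(members) < normal_ratio * biggest for _, members in clusters]

# Four "normal" records near the origin and one outlying record
records = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.2, 0.3), (10.0, 10.0)]
clusters = distance_cluster(records, width=2.0)
flags = label_anomalies(clusters)
```

No labels are needed: the outlier ends up alone in a small cluster and is flagged.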
Cluster Based Topology Control in Dynamic Mobile Ad Hoc Networks
Directory of Open Access Journals (Sweden)
T. Parameswaran
2014-05-01
Full Text Available In Mobile Ad hoc NETworks (MANETs), mobility of nodes, resource constraints and selfish behavior of nodes are important factors which may degrade performance. Clustering is an effective scheme to improve MANET features such as scalability, reliability, and stability. Each cluster member (CM) is associated with only one cluster head (CH) and can communicate with the CH by single-hop communication. Mobility information is used by many existing clustering schemes, such as the weighted clustering algorithm (WCA), the link expiration time prediction scheme, and k-hop compound-metric-based clustering. In scheme 1, CH election is based on a weighted sum of four parameters (node status, neighbor distribution, mobility, and remaining energy), which brings flexibility, but choosing the weight factor for each parameter is difficult. In scheme 2, the lifetime of a wireless link between a node pair is predicted from GPS location information. In scheme 3, the predicted mobility parameter is combined with connectivity to create a new compound metric for CH election. Despite various efforts in mobility-aware clustering, not much work has been done specifically for high-mobility nodes. Our proposed solution provides secure CH election and incentives to encourage nodes to participate honestly in the election process. Mobility strategies are used to handle the various problems caused by node movements, such as association losses to current CHs and CH role changes, to extend connection lifetime and provide more stable clusters. The simulation results show that the proposed approach outperforms the existing clustering schemes.
Finite mixture models and model-based clustering
Directory of Open Access Journals (Sweden)
Volodymyr Melnykov
2010-01-01
Full Text Available Finite mixture models have a long history in statistics, having been used to model population heterogeneity, generalize distributional assumptions, and, lately, to provide a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends as well as open problems in the area are also discussed.
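In model-based clustering, each cluster corresponds to one mixture component, and memberships are the posterior responsibilities computed by the EM algorithm. The tiny 1-D two-component sketch below is a generic illustration of that mechanism, not any specific method from the review; the data are synthetic.

```python
import math

def em_two_gaussians(data, iters=50):
    """Tiny EM for a two-component 1-D Gaussian mixture (model-based clustering)."""
    mu = [min(data), max(data)]          # crude but deterministic initialisation
    var = [1.0, 1.0]
    pi = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            dens = [pi[k] / math.sqrt(2 * math.pi * var[k])
                    * math.exp(-(x - mu[k]) ** 2 / (2 * var[k])) for k in (0, 1)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: update weights, means and variances from the responsibilities
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)   # guard against variance collapse
    return mu, var, pi

# Two well-separated synthetic groups around 0 and 5
data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
mu, var, pi = em_two_gaussians(data)
```

Each point's cluster is then the component with the larger responsibility, which is what makes mixture models a formal clustering framework.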
Likelihood-based inference for clustered line transect data
DEFF Research Database (Denmark)
Waagepetersen, Rasmus; Schweder, Tore
2006-01-01
The uncertainty in estimation of spatial animal density from line transect surveys depends on the degree of spatial clustering in the animal population. To quantify the clustering we model line transect data as independent thinnings of spatial shot-noise Cox processes. Likelihood-based inference...
Correlation Preserved Indexing Based Approach For Document Clustering
Directory of Open Access Journals (Sweden)
Meena.S.U, P.Parthasarathi
2013-02-01
Full Text Available Document clustering is the act of collecting similar documents into clusters, where similarity is some function on a document. A document clustering method should achieve (1) high accuracy for documents, (2) computable document frequency, and (3) term weights calculated from the term frequency vector. Document clustering is closely related to the concept of data clustering. Document clustering is a more specific technique for unsupervised document organization, automatic topic extraction and fast information retrieval or filtering. Clustering methods can be used to automatically group the retrieved documents into a list of meaningful categories. The correlation preserving indexing method is performed to find the correlation between documents. The Term Frequency-Inverse Document Frequency (TF-IDF) method is used to find the frequency of occurrence of words in each document. The disadvantage of this method is its computational complexity. In this paper a Significant Score Calculation method is introduced, in which similarity between words is calculated using the WordNet tool so that related words are identified. An accuracy of 98% is achieved with significant score calculation for correlation preserving indexing.
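The TF-IDF weighting mentioned above can be sketched directly. The toy corpus is an illustrative assumption; the point is that documents sharing terms get a positive cosine similarity while unrelated documents do not.

```python
import math

def tfidf(docs):
    """TF-IDF vectors (as word -> weight dicts) for a toy corpus."""
    vocab = sorted({w for d in docs for w in d.split()})
    n = len(docs)
    # document frequency: in how many documents each word appears
    df = {w: sum(1 for d in docs if w in d.split()) for w in vocab}
    vectors = []
    for d in docs:
        words = d.split()
        vec = {}
        for w in vocab:
            tf = words.count(w) / len(words)        # term frequency
            idf = math.log(n / df[w])               # inverse document frequency
            vec[w] = tf * idf
        vectors.append(vec)
    return vectors

def cosine(u, v):
    """Cosine similarity between two TF-IDF dict vectors."""
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["apple fruit sweet", "apple orange fruit", "car engine wheel"]
vecs = tfidf(docs)
```

Clustering then groups documents by this similarity; the paper's significant-score refinement would additionally credit WordNet synonyms, which plain TF-IDF cannot see.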
Perceptual Object Extraction Based on Saliency and Clustering
Directory of Open Access Journals (Sweden)
Qiaorong Zhang
2010-08-01
Full Text Available Object-based visual attention has received increasing interest in recent years. The perceptual object is the basic attention unit of object-based visual attention, and its definition and extraction is one of the key technologies in object-based visual attention computation models. A novel perceptual object definition and extraction method is proposed in this paper. Based on Gestalt theory and visual feature integration theory, a perceptual object is defined using homogeneity regions, salient regions and edges. An improved saliency map generating algorithm is employed first. Based on the saliency map, salient edges are extracted. Then a graph-based clustering algorithm is introduced to obtain homogeneity regions in the image. Finally an integration strategy is adopted to combine salient edges and homogeneity regions to extract perceptual objects. The proposed perceptual object extraction method has been tested on many natural images. Experimental results and analysis are also presented, and they show that the proposed method is reasonable and valid.
A Density Based Dynamic Data Clustering Algorithm based on Incremental Dataset
Directory of Open Access Journals (Sweden)
K. R.S. Kumar
2012-01-01
Full Text Available Problem statement: Clustering and visualizing high-dimensional dynamic data is a challenging problem. Most existing clustering algorithms are based on the static statistical relationships among data. Dynamic clustering is a mechanism to adapt to and discover clusters in real-time environments. Many applications, such as incremental data mining in data warehousing and sensor networks, rely on dynamic data clustering algorithms. Approach: In this work, we present a density-based dynamic data clustering algorithm for clustering an incremental dataset and compare its performance with full runs of normal DBSCAN and Chameleon on the dynamic dataset. Most clustering algorithms perform well and give ideal performance when accuracy is measured against the original class labels; however, measuring performance with a cluster validation metric can give a different kind of result. Results: This study addresses the problem of clustering a dynamic dataset whose size increases over time as more data are added. To evaluate the performance of the algorithms, we used the Generalized Dunn Index (GDI) and the Davies-Bouldin index (DB) as cluster validation metrics, as well as the time taken for clustering. Conclusion: In this study, we have successfully implemented and evaluated the proposed density-based dynamic clustering algorithm. Its performance was compared with the Chameleon and DBSCAN clustering algorithms, and the proposed algorithm performed significantly well in terms of clustering accuracy as well as speed.
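The Davies-Bouldin validation metric used above has a compact definition: for each cluster, take the worst ratio of summed within-cluster scatters to between-centroid distance, and average. A minimal sketch on synthetic points (not the study's data):

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index: lower values indicate compact, well-separated clusters."""
    ks = sorted(set(labels))
    centroids, scatters = {}, {}
    for k in ks:
        members = [p for p, l in zip(points, labels) if l == k]
        dims = len(members[0])
        c = [sum(m[i] for m in members) / len(members) for i in range(dims)]
        centroids[k] = c
        # scatter: mean distance of members to their centroid
        scatters[k] = sum(math.dist(m, c) for m in members) / len(members)
    total = 0.0
    for i in ks:
        # worst-case similarity of cluster i to any other cluster
        total += max((scatters[i] + scatters[j]) / math.dist(centroids[i], centroids[j])
                     for j in ks if j != i)
    return total / len(ks)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
db_good = davies_bouldin(points, [0, 0, 1, 1])  # natural grouping
db_bad = davies_bouldin(points, [0, 1, 0, 1])   # clusters straddle both blobs
```

The natural partition scores far lower than the straddling one, which is why DB works as a label-free validation metric for dynamic clustering.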
Zhang, Zhaoyang; Fang, Hua; Wang, Honggang
2016-06-01
Web-delivered trials are an important component of eHealth services. These trials, mostly behavior-based, generate big heterogeneous data that are longitudinal and high dimensional, with missing values. Unsupervised learning methods have been widely applied in this area; however, validating the optimal number of clusters has been challenging. Building upon our multiple imputation (MI) based fuzzy clustering, MIfuzzy, we proposed a new multiple imputation based validation (MIV) framework and corresponding MIV algorithms for clustering big longitudinal eHealth data with missing values, and more generally for fuzzy-logic based clustering methods. Specifically, we detect the optimal number of clusters by auto-searching and -synthesizing a suite of MI-based validation methods and indices, including conventional (bootstrap or cross-validation based) and emerging (modularity-based) validation indices for general clustering methods, as well as the specific Xie-Beni index for fuzzy clustering. The MIV performance was demonstrated on a big longitudinal dataset from a real web-delivered trial and using simulation. The results indicate that the MI-based Xie-Beni index for fuzzy clustering is more appropriate for detecting the optimal number of clusters for such complex data. The MIV concept and algorithms could be easily adapted to different types of clustering that could process big incomplete longitudinal trial data in eHealth services. PMID:27126063
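The Xie-Beni index singled out above balances fuzzy compactness against the separation of the closest pair of prototypes. The sketch below shows the standard index on synthetic points with illustrative crisp memberships; it is not the MI-based variant from the paper.

```python
def xie_beni(points, centers, memberships, m=2.0):
    """Xie-Beni validity index for a fuzzy partition; lower is better.
    memberships[i][k] is the degree of point i in cluster k."""
    def sq(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    n, c = len(points), len(centers)
    # compactness: fuzzily weighted squared distances to the prototypes
    compact = sum(memberships[i][k] ** m * sq(points[i], centers[k])
                  for i in range(n) for k in range(c))
    # separation: squared distance between the two closest prototypes
    sep = min(sq(centers[j], centers[k])
              for j in range(c) for k in range(c) if j != k)
    return compact / (n * sep)

points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5)]
memberships = [[1, 0], [1, 0], [0, 1], [0, 1]]       # illustrative, near-crisp
xb_good = xie_beni(points, [(0.05, 0), (5.05, 5)], memberships)
xb_bad = xie_beni(points, [(1.0, 1.0), (1.5, 1.5)], memberships)
```

Scanning the index over candidate numbers of clusters and picking the minimum is the usual way it selects the optimal cluster count.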
International Nuclear Information System (INIS)
The scalar-product cluster variation method (SP-CVM) for calculation of antiphase boundary (APB) energies has been extended for application to {001} APBs in the body-centered cubic lattice using the irregular tetrahedron cluster approximation. In order to do so, the proof of the SP-CVM relation has been updated to include the cases where the domain interface consists of more than one plane of atoms (i.e., it consists of a layer of atoms). The algorithm has been developed for the cases of thermal APBs with APB vectors a0A2 and a0A2/2. It is then applied, as an illustration, to the determination of APB energies in isothermal calculations in the Fe-Al system
Directory of Open Access Journals (Sweden)
Ramachandra Rao Kurada
2013-10-01
Full Text Available The present survey provides the state of the art of research copiously devoted to evolutionary approaches (EAs) for clustering, exemplified with a diversity of evolutionary computations. The survey provides a nomenclature that highlights some aspects that are very important in the context of evolutionary data clustering. The paper surveys the clustering trade-offs branched out with wide-ranging Multi-Objective Evolutionary Approach (MOEA) methods. Finally, this study addresses the potential challenges of MOEA design and data clustering, along with conclusions and recommendations for novices and researchers, by positioning the most promising paths of future research. MOEAs have had substantial success across a variety of MOP applications, from pedagogical multifunction optimization to real-world engineering design. The survey organizes the developments witnessed in the past three decades for EA-based metaheuristics that solve multiobjective optimization problems (MOPs) and derive significant progress in finding high-quality solutions in a single run. Data clustering is an exigent task, whose intricacy is caused by a lack of a unique and precise definition of a cluster. The discrete optimization problem uses the cluster space to derive a solution for multiobjective data clustering. Discovery of a majority or all of the clusters (of irregular shapes) present in the data is a long-standing goal of unsupervised predictive learning and exploratory pattern analysis.
An Ontology-based Knowledge Management System for Industry Clusters
Sureephong, Pradorn; Ouzrout, Yacine; Bouras, Abdelaziz
2008-01-01
A knowledge-based economy forces companies in a nation to group together as a cluster in order to maintain their competitiveness in the world market. Cluster development relies on two key success factors: knowledge sharing and collaboration between the actors in the cluster. Thus, our study proposes a knowledge management system to support knowledge management activities within the cluster. To achieve the objectives of this study, ontology plays a very important role in the knowledge management process in various ways, such as building reusable and faster knowledge bases and representing knowledge explicitly. However, creating and representing an ontology is difficult for an organization because sources of knowledge are ambiguous and unstructured. Therefore, the objective of this paper is to propose a methodology to create and represent an ontology for organizational development using a knowledge engineering approach. The handicraft cluster in Thailand is used as a case stu...
Fuzzy clustering-based segmented attenuation correction in whole-body PET
Zaidi, H; Boudraa, A; Slosman, DO
2001-01-01
Segmentation-based attenuation correction is now a widely accepted technique to reduce the noise contribution of measured attenuation correction. In this paper, we present a new method for segmenting transmission images in positron emission tomography. It reduces the noise in the correction maps while still correcting for the differing attenuation coefficients of specific tissues. Based on the Fuzzy C-Means (FCM) algorithm, the method segments the PET transmission images into a given number of clusters to extract specific areas of differing attenuation, such as air, the lungs and soft tissue, preceded by a median filtering procedure. The reconstructed transmission image voxels are thereby segmented into populations of uniform attenuation based on human anatomy. The clustering procedure starts with an over-specified number of clusters, followed by a merging process to group clusters with similar properties and remove some undesired substructures using anatomical knowledge. The method is unsupervised, adaptive and a...
CACONET: Ant Colony Optimization (ACO) Based Clustering Algorithm for VANET.
Aadil, Farhan; Bajwa, Khalid Bashir; Khan, Salabat; Chaudary, Nadeem Majeed; Akram, Adeel
2016-01-01
A vehicular ad hoc network (VANET) is a wirelessly connected network of vehicular nodes. A number of techniques, such as message ferrying, data aggregation, and vehicular node clustering, aim to improve communication efficiency in VANETs. Cluster heads (CHs), selected in the process of clustering, manage inter-cluster and intra-cluster communication. The lifetime of clusters and the number of CHs determine the efficiency of the network. In this paper a Clustering algorithm based on Ant Colony Optimization (ACO) for VANETs (CACONET) is proposed. CACONET forms optimized clusters for robust communication. CACONET is compared empirically with state-of-the-art baseline techniques like Multi-Objective Particle Swarm Optimization (MOPSO) and Comprehensive Learning Particle Swarm Optimization (CLPSO). Experiments varying the grid size of the network, the transmission range of nodes, and the number of nodes in the network were performed to evaluate the comparative effectiveness of these algorithms. For optimized clustering, the parameters considered are the transmission range, direction and speed of the nodes. The results indicate that CACONET significantly outperforms MOPSO and CLPSO. PMID:27149517
Directory of Open Access Journals (Sweden)
Xiaoyong Zhang
2013-01-01
Full Text Available The presence of microcalcification clusters (MCs) in a mammogram is a major indicator of breast cancer, and detection of MCs is one of the key issues for breast cancer control. In this paper, we present a highly accurate method based on morphological image processing and a wavelet transform technique to detect MCs in mammograms. The microcalcifications are first enhanced by using multistructure-element morphological processing. Then, microcalcification candidates are refined by a multilevel wavelet reconstruction approach. Finally, MCs are detected based on their distribution features. Experiments are performed on 138 clinical mammograms. The proposed method is capable of detecting 92.9% of true microcalcification clusters with an average of 0.08 false microcalcification clusters detected per image.
Gaussian Kernel Based Fuzzy C-Means Clustering Algorithm for Image Segmentation
Directory of Open Access Journals (Sweden)
Rehna Kalam
2016-04-01
Full Text Available Image processing is an important research area in computer vision. Clustering is an unsupervised method of study and can also be used for image segmentation. Many methods exist for image segmentation, which plays an important role in image analysis; it is one of the first and most important tasks in image analysis and computer vision. This proposed system presents a variation of the fuzzy c-means algorithm that provides image clustering. The kernel fuzzy c-means clustering algorithm (KFCM) is derived from the fuzzy c-means clustering algorithm (FCM). The KFCM algorithm provides image clustering and improves accuracy significantly compared with the classical fuzzy c-means algorithm. The new algorithm is called the Gaussian kernel-based fuzzy c-means clustering algorithm (GKFCM). The major characteristic of GKFCM is the use of a fuzzy clustering approach aiming to guarantee noise insensitivity and image detail preservation. The objective of the work is to cluster the low-intensity inhomogeneity areas from noisy images using the clustering method and to segment that portion separately using a content-level set approach. The purpose of designing this system is to produce better segmentation results for images corrupted by noise, so that it can be useful in various fields like medical image analysis, such as tumor detection, study of anatomical structure, and treatment planning.
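A kernel fuzzy c-means iteration can be sketched as below: distances are taken in the feature space induced by a Gaussian kernel, so the squared distance becomes d2 = 2(1 - K(x, v)), and prototypes are updated as kernel-weighted means. This is a generic KFCM sketch under assumed parameters (sigma, initial centers, synthetic blobs), not the paper's GKFCM implementation.

```python
import math

def gaussian_kernel(a, b, sigma):
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))

def kernel_fcm(points, init_centers, m=2.0, iters=30, sigma=2.0):
    """Kernel fuzzy c-means sketch with a Gaussian kernel."""
    centers = [list(c) for c in init_centers]
    c, n, dims = len(centers), len(points), len(points[0])
    u = []
    for _ in range(iters):
        # membership update from kernel-induced squared distances
        u = []
        for p in points:
            d2 = [max(2 * (1 - gaussian_kernel(p, v, sigma)), 1e-12)
                  for v in centers]
            u.append([1 / sum((d2[k] / d2[j]) ** (1 / (m - 1)) for j in range(c))
                      for k in range(c)])
        # kernel-weighted prototype update
        for k in range(c):
            w = [u[i][k] ** m * gaussian_kernel(points[i], centers[k], sigma)
                 for i in range(n)]
            s = sum(w)
            centers[k] = [sum(w[i] * points[i][d] for i in range(n)) / s
                          for d in range(dims)]
    return centers, u

# Two synthetic "intensity" blobs standing in for image regions
blob_a = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2)]
blob_b = [(4.0, 4.0), (4.2, 4.0), (4.0, 4.2)]
points = blob_a + blob_b
centers, u = kernel_fcm(points, init_centers=[points[0], points[-1]])
labels = [row.index(max(row)) for row in u]
```

For image segmentation the "points" would be per-pixel feature vectors; the kernel distance is what buys the noise insensitivity claimed for GKFCM.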
Directory of Open Access Journals (Sweden)
S.Balaji
2015-06-01
Full Text Available A mobile ad hoc network is an instantaneous wireless network that is dynamic in nature. It supports single-hop and multihop communication. In this infrastructure-less network, clustering is a significant model for maintaining the topology of the network. The clustering process includes phases such as cluster formation, cluster head selection, and cluster maintenance. Choosing the cluster head is important, as the stability of the network depends on a well-organized and resourceful cluster head. When a node has a large number of neighbors it can act as a link between them, which further reduces the number of hops in multihop communication. A node with many neighbors should, however, also have enough energy available to provide stability in the network, so both aspects demand attention. In weight-based cluster head selection, closeness and the average minimum power required are considered for purging ineligible nodes. The optimal set of nodes selected after purging competes to become cluster head, and the node with maximum weight is selected. A mathematical formulation is developed to show that the proposed method provides an optimal result. It is also suggested that the weight factors in calculating the node weight should give appropriate importance to energy and node stability.
Clustering of Waveforms Based on FPCA Direction
Adelfio, Giada; Chiodi, Marcello; D'Alessandro, Antonino; Istituto Nazionale di Geofisica e Vulcanologia, Sezione CNT, Roma, Italia; Luzio, Dario
2010-01-01
Abstract. Looking for curve similarity can be a complex issue characterized by subjective choices related to continuous transformations of observed discrete data (Chiodi, 1989). Waveform correlation techniques have been introduced to characterize the degree of seismic event similarity (Menke, 1999) and to facilitate more accurate relative locations within similar event clusters by providing more precise timing of seismic wave (P and S) arrivals (Phillips, 1997). In t...
Sarro, L. M.; Bouy, H.; Berihuete, A.; Bertin, E.; Moraux, E.; Bouvier, J.; Cuillandre, J.-C.; Barrado, D.; Solano, E.
2014-03-01
Context. With the advent of deep wide surveys, large photometric and astrometric catalogues of literally all nearby clusters and associations have been produced. The unprecedented accuracy and sensitivity of these data sets and their broad spatial, temporal and wavelength coverage make obsolete the classical membership selection methods that were based on a handful of colours and luminosities. We present a new technique designed to take full advantage of the high dimensionality (photometric, astrometric, temporal) of such a survey to derive self-consistent and robust membership probabilities of the Pleiades cluster. Aims: We aim at developing a methodology to infer membership probabilities to the Pleiades cluster from the DANCe multidimensional astro-photometric data set in a consistent way throughout the entire derivation. The determination of the membership probabilities has to be applicable to censored data and must incorporate the measurement uncertainties into the inference procedure. Methods: We use Bayes' theorem and a curvilinear forward model for the likelihood of the measurements of cluster members in the colour-magnitude space, to infer posterior membership probabilities. The distribution of the cluster members' proper motions and the distribution of contaminants in the full multidimensional astro-photometric space is modelled with a mixture-of-Gaussians likelihood. Results: We analyse several representation spaces composed of the proper motions plus a subset of the available magnitudes and colour indices. We select two prominent representation spaces composed of variables selected using feature relevance determination techniques based on Random Forests, and analyse the resulting samples of high probability candidates. We consistently find lists of high probability (p > 0.9975) candidates with ≈1000 sources, 4 to 5 times more than obtained in the most recent astro-photometric studies of the cluster. Conclusions: Multidimensional data sets require
Risk Assessment for Bridges Safety Management during Operation Based on Fuzzy Clustering Algorithm
Xia Hanyu; Zhang Lijing; Tao Gang; Tong Bing; Zhang Haiou
2016-01-01
In recent years, as large-span and large sea-crossing bridges have been built, bridge accidents caused by improper operational management have occurred frequently. In order to explore better methods for risk assessment by bridge operation departments, a method based on the fuzzy clustering algorithm is selected. The implementation steps of the fuzzy clustering algorithm are described, a risk evaluation system is built, and Taizhou Bridge is selected as an example; the quantitation of risk factors ...
Vladimir A. Fedorov; Aleksandr V. Stepanov; Tatyana M. Stepanova
2015-01-01
The aim of the investigation is to justify the urgency and efficiency of the collective work of a chair's teachers on the formation of integrated special professional competences of trainees. Methods. Teamwork is considered the leading method for realising this aim; it is suggested that this method be put into practice on the basis of a situational technique of intra-chair structure formation, composite staff-clusters, whose activity is based on the synergetic interaction of educational process participants. Results...
An Empirical Evaluation of Density-Based Clustering Techniques
Glory H. Shah; C K Bhensdadia; Amit P. Ganatra
2012-01-01
Emergence of modern techniques for scientific data collection has resulted in large-scale accumulation of data pertaining to diverse fields. Conventional database querying methods are inadequate to extract useful information from huge data banks. Cluster analysis is one of the major data analysis methods; it is the art of detecting groups of similar objects in large data sets without having specified groups by means of explicit features. The problem of detecting clusters of points is challeng...
Richness-based masses of rich and famous galaxy clusters
Andreon, S.
2016-03-01
We present a catalog of galaxy cluster masses derived by exploiting the tight correlation between mass and richness, i.e., a properly computed number of bright cluster galaxies. The richness definition adopted in this work is properly calibrated, shows a small scatter with mass, and has a known evolution, which means that we can estimate accurate (0.16 dex) masses more precisely than by adopting any other richness estimates or X-ray or SZ-based proxies based on survey data. We measured a few hundred galaxy clusters at 0.05 … URL: http://www.brera.mi.astro.it/~andreon/famous.html
MBA-LF: A NEW DATA CLUSTERING METHOD USING MODIFIED BAT ALGORITHM AND LEVY FLIGHT
Directory of Open Access Journals (Sweden)
R. Jensi
2015-10-01
Full Text Available Data clustering plays an important role in partitioning a large set of data objects into a known or unknown number of groups, or clusters, so that the objects in each cluster have a high degree of similarity while objects in different clusters are dissimilar to each other. Recently, a number of data clustering methods have been explored using traditional methods as well as nature-inspired swarm intelligence algorithms. In this paper, a new data clustering method using a modified bat algorithm is presented. The experimental results show that the proposed algorithm is suitable for data clustering in an efficient and robust way.
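Swarm-based clustering methods such as the bat algorithm typically encode a candidate solution as a set of centroids and minimize a fitness function, commonly the total distance of each point to its nearest centroid. The sketch below shows only that fitness function on synthetic data; it is an assumed, generic objective, not the paper's exact formulation.

```python
def clustering_fitness(points, centroids):
    """Objective minimised by swarm-based clustering (bat, PSO, ...):
    total Euclidean distance of each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5
                     for c in centroids)
    return total

# Two synthetic blobs; a "good" candidate places centroids at the blob centres
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
fit_good = clustering_fitness(points, [(0, 0.5), (5, 5.5)])
fit_bad = clustering_fitness(points, [(2, 2), (3, 3)])
```

The metaheuristic (bat movements, Levy flights, etc.) is then just a search strategy over centroid placements that drives this value down.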
ENERGY EFFICIENT HIERARCHICAL CLUSTER-BASED ROUTING FOR WIRELESS SENSOR NETWORKS
Shideh Sadat Shirazi; Aboulfazl Torqi Haqiqat
2015-01-01
In this paper we propose an energy-efficient routing algorithm based on hierarchical clustering in wireless sensor networks (WSNs). This algorithm decreases the energy consumption of nodes and helps increase the lifetime of sensor networks. To achieve this goal, the network is divided into 4 segments, which leads to uniform energy consumption among sensor nodes. We also propose a multi-step clustering method to send and receive data from nodes to the base station. The s...
Cluster-based spectrum sensing for cognitive radios with imperfect channel to cluster-head
Ben Ghorbel, Mahdi
2012-04-01
Spectrum sensing is considered as the first and main step for cognitive radio systems to achieve an efficient use of spectrum. Cooperation and clustering among cognitive radio users are two techniques that can be employed with spectrum sensing in order to improve the sensing performance by reducing miss-detection and false alarm. In this paper, within the framework of a clustering-based cooperative spectrum sensing scheme, we study the effect of errors in transmitting the local decisions from the secondary users to the cluster heads (or the fusion center), while considering non-identical channel conditions between the secondary users. Closed-form expressions for the global probabilities of detection and false alarm at the cluster head are derived. © 2012 IEEE.
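The paper derives closed-form global probabilities under imperfect reporting. As an illustrative stand-in (not the paper's exact expressions), a common model treats each reported local decision as passing through a binary symmetric channel with flip probability e, and fuses the received decisions at the cluster head with an OR rule:

```python
def global_or_fusion(probs, errors):
    """Global detection (or false-alarm) probability at the cluster head under
    an OR fusion rule, when each secondary user's reported decision flips
    with its own probability `errors[i]` (imperfect reporting channel)."""
    # probability that each decision arrives as "1" after the noisy channel
    received = [p * (1 - e) + (1 - p) * e for p, e in zip(probs, errors)]
    # OR rule: global decision is "1" unless every received decision is "0"
    miss_all = 1.0
    for p in received:
        miss_all *= (1 - p)
    return 1 - miss_all

# Two users with perfect reporting, then one user over a noisy report channel
pd_clean = global_or_fusion([0.5, 0.5], [0.0, 0.0])   # 1 - 0.5*0.5 = 0.75
pd_noisy = global_or_fusion([0.9], [0.1])             # 0.9*0.9 + 0.1*0.1 = 0.82
```

The same formula gives the global false-alarm probability when `probs` holds local false-alarm probabilities, which is how reporting errors degrade both sides of the sensing trade-off.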
Adapted G-mode Clustering Method applied to Asteroid Taxonomy
Hasselmann, Pedro H.; Carvano, Jorge M.; Lazzaro, D.
2013-11-01
The original G-mode is a clustering method developed by A. I. Gavrishin in the late 1960s for geochemical classification of rocks, but it has also been applied to asteroid photometry, cosmic rays, lunar samples and planetary-science spectroscopy data. In this work, we used an adapted version to classify asteroid photometry from the SDSS Moving Objects Catalog. The method works by identifying normal distributions in a multidimensional space of variables. The identification starts by locating a set of points with the smallest mutual distance in the sample, which is a problem when the data are not planar. Here we present a modified version of the G-mode algorithm, previously written in FORTRAN 77, reimplemented in Python 2.7 using the NumPy, SciPy and Matplotlib packages. NumPy was used for array and matrix manipulation and Matplotlib for plot control. SciPy played an important role in speeding up G-mode: scipy.spatial.distance.mahalanobis was chosen as the distance estimator and numpy.histogramdd was applied to find the initial seeds from which clusters evolve. SciPy was also used to quickly produce dendrograms showing the distances among clusters. Finally, results for asteroid taxonomy and tests for different sample sizes and implementations are presented.
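The two library calls the abstract names can be sketched together. The following is an illustrative toy (synthetic data, arbitrary bin count and distance cutoff; not the authors' code): numpy.histogramdd locates the densest region as an initial seed, and scipy.spatial.distance.mahalanobis ranks points around it.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
# toy 3-band "photometric" sample: two gaussian clumps
data = np.vstack([rng.normal(0.0, 0.2, (200, 3)),
                  rng.normal(2.0, 0.2, (200, 3))])

# 1) coarse density estimate: the fullest histogram bin marks the seed region
hist, edges = np.histogramdd(data, bins=8)
seed_bin = np.unravel_index(np.argmax(hist), hist.shape)
seed = np.array([(edges[d][i] + edges[d][i + 1]) / 2
                 for d, i in enumerate(seed_bin)])

# 2) rank points by Mahalanobis distance to the seed; the closest points
#    form the initial set from which a G-mode cluster would grow
VI = np.linalg.inv(np.cov(data.T))
dists = np.array([mahalanobis(x, seed, VI) for x in data])
members = data[dists < 2.0]   # cutoff chosen for illustration only
```

The real algorithm then iterates, re-estimating the cluster's mean and covariance from `members` until the assignment stabilizes.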
Fuzzy Based Anomaly Intrusion Detection System for Clustered WSN
Sumathy Murugan; Sundara Rajan, M.
2015-01-01
In Wireless Sensor Networks (WSN), intrusion detection techniques may result in increased computational cost, packet loss, performance degradation and so on. In order to overcome these issues, in this study we propose a fuzzy-based anomaly intrusion detection system for clustered WSN. Initially the cluster heads are selected based on parameters such as link quality, residual energy and coverage. Then anomaly intrusions are detected using a fuzzy logic technique. This technique conside...
Analysis of protein profiles using fuzzy clustering methods
DEFF Research Database (Denmark)
Karemore, Gopal Raghunath; Ukendt, Sujatha; Rai, Lavanya; Kartha, V.B; C, Santhosh
The tissue protein profiles of healthy volunteers and volunteers with cervical cancer were recorded using High Performance Liquid Chromatography combined with the Laser Induced Fluorescence technique (HPLC-LIF) developed in our lab. We analyzed the protein profile data using different clustering methods for their classification, followed by various validation measures. The clustering algorithms used for the study were K-means, K-medoid, Fuzzy C-means, Gustafson-Kessel, and Gath-Geva. The results presented in this study conclude that the protein profiles of tissue...
A novel spatial clustering algorithm based on Delaunay triangulation
Yang, Xiankun; Cui, Weihong
2008-12-01
Exploratory data analysis is increasingly necessary as ever larger spatial datasets are managed in electronic media. Spatial clustering is one of the most important spatial data mining techniques, and many spatial clustering algorithms have been proposed so far. In this paper we propose a robust spatial clustering algorithm named SCABDT (Spatial Clustering Algorithm Based on Delaunay Triangulation). SCABDT demonstrates important advantages over previous work. First, it discovers clusters of arbitrary shape. Second, SCABDT requires no prior knowledge of the data distribution. Third, like DBSCAN, it does not require much CPU processing time, as our experiments show. Finally, it handles outliers efficiently.
Relation Based Mining Model for Enhancing Web Document Clustering
Directory of Open Access Journals (Sweden)
M.Reka
2014-05-01
Full Text Available The design of web information management systems becomes ever more complex, with greater time complexity. Information retrieval is a difficult task due to the huge volume of web documents, and clustering makes retrieval easier and less time consuming. This paper introduces a web document clustering approach that uses the semantic relations between documents to reduce time complexity. It identifies the relations and concepts in a document and computes the relation score between documents. The algorithm extracts the key concepts from the web documents by preprocessing, stemming, and stop-word removal. The identified concepts are used, together with a domain ontology, to compute the document relation score and the cluster relation score, on the basis of which the web document clusters are identified. The algorithm is evaluated on 200,000 web documents, with 60 percent used as the training set and 40 percent as the testing set.
Communication: Improved pair approximations in local coupled-cluster methods
International Nuclear Information System (INIS)
In local coupled-cluster treatments the electron pairs can be classified, according to the magnitude of their energy contributions or their distances, into strong, close, weak, and distant pairs. Different approximations are introduced for the latter three classes. In this communication, an improved simplified treatment of close and weak pairs is proposed, based on long-range cancellations of individually slowly decaying contributions in the amplitude equations. Benchmark calculations for correlation, reaction, and activation energies demonstrate that these approximations work extremely well, while pair approximations based on local second-order Møller-Plesset theory can lead to errors that are 1-2 orders of magnitude larger.
Institute of Scientific and Technical Information of China (English)
WANG Sheng-jun; LU Zuo-mei; WAN Jian-min
2006-01-01
The genetic diversity of 41 parental lines popularized in commercial hybrid rice production in China was studied using cluster analysis of morphological traits and simple sequence repeat (SSR) markers. The 41 entries were assigned into two clusters (i.e., an early or medium-maturing cluster and a medium or late-maturing cluster) and further into six sub-clusters based on the morphological trait cluster analysis. The early or medium-maturing cluster was composed of 15 maintainer lines, four early-maturing restorer lines and two thermo-sensitive genic male sterile lines, and the medium or late-maturing cluster included 16 restorer lines and 4 medium or late-maturing maintainer lines. Moreover, the SSR cluster analysis classified the 41 entries into two clusters (i.e., a maintainer line cluster and a restorer line cluster) and seven sub-clusters. The maintainer line cluster consisted of all 19 maintainer lines and the two thermo-sensitive genic male sterile lines, while the restorer line cluster was composed of all 20 restorer lines. The SSR analysis fitted better with the pedigree information. From the viewpoint of hybrid rice breeding, the results suggest that SSR analysis may be a better method to study the diversity of parental lines in indica hybrid rice.
Institute of Scientific and Technical Information of China (English)
康春花; 任平; 曾平飞
2015-01-01
Based on the ideas of attribute sum-scoring and cluster analysis, a cluster analysis method applicable to graded (polytomous) responses is proposed, and the effects of attribute hierarchy, sample size and slip rate on its classification accuracy are examined. The study found that: (1) the method achieves high pattern-level and marginal classification accuracy under all experimental conditions; (2) the accuracy does not depend on sample size, so the method is applicable to small-scale testing and classroom assessment; (3) the accuracy is only slightly affected by the tightness of the attribute hierarchy; (4) the method shows good internal and external validity in practical settings. Examinations help students learn more efficiently by filling their learning gaps. To achieve this goal, we have to differentiate students who have mastered a set of attributes measured by a test from those who have not, through cognitive diagnostic assessment. K-means cluster analysis, being a nonparametric cognitive diagnosis method, requires only the Q-matrix, which reflects the relationship between attributes and items. It does not require parameter estimation, so it is independent of sample size, simple to operate, and easy to understand. Previous research used sum-score vectors or capability-score vectors as the clustering objects; these methods are only suitable for dichotomous data. Structured-response items are, however, a main item type in examinations, particularly as required by recent reforms. Building on previous research, this paper puts forward a method that calculates a capability matrix reflecting the mastery level of skills and is applicable to graded response items. Our study included four parts. First, we introduced the K-means cluster diagnosis method adapted for dichotomous data. Second, we extended the K-means cluster diagnosis method to graded response data (GRCDM). Third, we investigated the performance of the proposed method in a simulation study. Fourth, we investigated its performance in an empirical study. The simulation study focused on three factors. First, the sample size was
Cluster-based DBMS Management Tool with High-Availability
Directory of Open Access Journals (Sweden)
Jae-Woo Chang
2005-02-01
Full Text Available Management tools for monitoring and managing cluster-based DBMSs have been little studied. We therefore design and implement a cluster-based DBMS management tool with high availability that monitors the status of the nodes in a cluster system as well as the status of the DBMS instances on each node. The tool enables users to see a single virtual system image and provides them with the status of all the nodes and resources in the system through a graphical user interface (GUI). By using a load balancer, our management tool can increase the performance of a cluster-based DBMS and overcome the limitations of existing parallel DBMSs.
Developing a Cluster Based Parallel Ray Tracer
Lovell, C
2007-01-01
Parallel ray tracing is an important concept in computer graphics, as it allows high-quality ray-traced images to be produced at a faster rate than is possible with a single processor. Presented in this paper is the conversion of a previously created single-processor ray tracer into a parallel ray tracer capable of running on a cluster of computers. This paper presents the technical aspects of designing a parallel ray tracer by looking at the theory of transforming ray traci...
Evidence-Based Clustering of Reads and Taxonomic Analysis of Metagenomic Data
Folino, Gianluigi; Gori, Fabio; Jetten, Mike S. M.; Marchiori, Elena
The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. In this paper we focus on clustering methods and their application to taxonomic analysis of metagenomic data. Clustering analysis for metagenomics amounts to grouping similar partial sequences, such as raw sequence reads, into clusters in order to discover information about the internal structure of the considered dataset, or the relative abundance of protein families. Different methods for clustering analysis of metagenomic datasets have been proposed. Here we focus on evidence-based methods for clustering that employ knowledge extracted from proteins identified by a BLASTx search (proxygenes). We consider two clustering algorithms introduced in previous work and a new one. We discuss advantages and drawbacks of the algorithms, and use them to perform taxonomic analysis of metagenomic data. To this aim, we use three real-life benchmark datasets from previous work on metagenomic data analysis. Comparison of the results indicates satisfactory coherence of the taxonomies output by the three algorithms, with respect to phylogenetic content at the class level and taxonomic distribution at the phylum level. In general, the experimental comparative analysis substantiates the effectiveness of evidence-based clustering methods for taxonomic analysis of metagenomic data.
Energy Band Based Clustering Protocol for Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Prabhat Kumar
2012-07-01
Full Text Available Clustering is one of the most widely used techniques to prolong the lifetime of wireless sensor networks in environments where battery replacement of individual sensor nodes is not an option after deployment. However, clustering overheads such as cluster formation, cluster size and cluster-head selection rotation directly affect the lifetime of a WSN. This paper introduces and analyzes a new Single-Hop Energy Band Based Clustering Protocol (EBBCP) that tries to minimize these overheads, resulting in a prolonged life for the WSN. EBBCP works on static clusters formed on the basis of energy bands in the setup phase. The protocol reduces the per-round overhead of cluster formation, as proved by simulation results in MATLAB. The paper contains an in-depth analysis of the results obtained during simulation and compares EBBCP with LEACH. Unlike LEACH, EBBCP achieves evenly distributed cluster heads throughout the target area and also produces evenly distributed dead nodes. EBBCP beats LEACH in total data packets received and produces better network lifetime. EBBCP uses the concept of grid nodes to eliminate the need for a positioning system such as GPS by estimating transmission signal strength.
A two-stage method for microcalcification cluster segmentation in mammography by deformable models
International Nuclear Information System (INIS)
Purpose: Segmentation of microcalcification (MC) clusters in x-ray mammography is a difficult task for radiologists. Accurate segmentation is a prerequisite for quantitative image analysis of MC clusters and subsequent feature extraction and classification in computer-aided diagnosis schemes. Methods: In this study, a two-stage semiautomated segmentation method for MC clusters is investigated. The first stage targets accurate and time-efficient segmentation of the majority of the particles of a MC cluster, by means of a level set method. The second stage targets shape refinement of selected individual MCs, by means of an active contour model. Both methods are applied in the framework of a rich scale-space representation, provided by the wavelet transform at integer scales. The segmentation reliability of the proposed method, in terms of inter- and intraobserver agreement, was evaluated on a sample of 80 MC clusters originating from the Digital Database for Screening Mammography, corresponding to 4 morphology types of MC clusters (punctate: 22, fine linear branching: 16, pleomorphic: 18, and amorphous: 24), assessing radiologists' segmentations quantitatively by two distance metrics (Hausdorff distance, HDISTcluster, and average minimum distance, AMINDISTcluster) and the area overlap measure (AOMcluster). The effect of the proposed segmentation method on MC cluster characterization accuracy was evaluated on a sample of 162 pleomorphic MC clusters (72 malignant and 90 benign). Ten MC cluster features, targeted to capture morphologic properties of individual MCs in a cluster (area, major length, perimeter, compactness, and spread), were extracted, and a correlation-based feature selection method yielded a feature subset to feed into a support vector machine classifier. Classification performance of the MC cluster features was estimated by means of the area under the receiver operating characteristic curve (Az ± standard error) utilizing tenfold cross
A robust approach based on Weibull distribution for clustering gene expression data
Directory of Open Access Journals (Sweden)
Gong Binsheng
2011-05-01
Full Text Available Abstract Background Clustering is a widely used technique for the analysis of gene expression data. Most clustering methods group genes based on distances, while few methods group genes according to similarities between the distributions of the gene expression levels. Furthermore, as biological annotation resources have accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating the performance of clustering methods in terms of the functional consistency of the resulting clusters is of great interest. Results In this paper, we propose the WDCM (Weibull Distribution-based Clustering Method), a robust approach for clustering gene expression data, in which the expression levels of individual genes are considered as random variables following unique Weibull distributions. The WDCM is based on the concept that genes with similar expression profiles have similar distribution parameters, and thus the genes are clustered via the Weibull distribution parameters. We used the WDCM to cluster three cancer gene expression data sets from lung cancer, B-cell follicular lymphoma and bladder carcinoma and obtained well-clustered results. We compared the performance of WDCM with k-means and Self Organizing Map (SOM) using functional annotation information given by the Gene Ontology (GO). The results showed that the functional annotation ratios of WDCM are higher than those of the other methods. We also utilized the external measure Adjusted Rand Index to validate the performance of the WDCM. The comparative results demonstrate that the WDCM provides better clustering performance than the k-means and SOM algorithms. The merit of the proposed WDCM is that it can be applied to cluster incomplete gene expression data without imputing the missing values. Moreover, the robustness of WDCM is also evaluated on the incomplete data sets. Conclusions The results demonstrate that our WDCM produces clusters
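The core concept, clustering genes via fitted Weibull parameters rather than raw expression distances, can be sketched in a few lines. This is an illustrative toy with synthetic "genes" (it is not the authors' WDCM implementation, which has its own estimation details): fit a Weibull shape and scale per gene, then cluster in parameter space.

```python
import numpy as np
from scipy.stats import weibull_min
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(1)
# toy data: 40 "genes" x 30 samples; two groups with distinct Weibull shapes
genes = np.vstack([weibull_min.rvs(1.0, size=(20, 30), random_state=rng),
                   weibull_min.rvs(5.0, size=(20, 30), random_state=rng)])

# fit (shape c, scale) per gene, with location fixed at 0;
# fit() returns (c, loc, scale), so [::2] keeps c and scale
params = np.array([weibull_min.fit(g, floc=0)[::2] for g in genes])

# cluster the genes by their fitted distribution parameters
centroids, labels = kmeans2(params, 2, seed=2, minit='++')
```

Because each gene is summarized by its own fitted distribution, samples with missing values could simply be dropped from that gene's fit, which matches the abstract's point about handling incomplete data without imputation.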
Efficient Cluster Head Selection Methods for Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
Jong-Shin Chen
2010-08-01
Full Text Available The past few years have witnessed an increase in the potential uses of wireless sensor networks (WSNs), such as disaster management, combat field reconnaissance, border protection and security surveillance. Sensors in these applications are expected to be remotely deployed in large numbers and to operate autonomously in unattended environments. Since a WSN is composed of nodes with a non-replenishable energy resource, elongating the network lifetime is the main concern. To support scalability, nodes are often grouped into disjoint clusters, each with a leader, often referred to as the cluster head (CH). A CH is responsible not only for general requests but also for assisting the general nodes in routing the sensed data to the target nodes. The power consumption of a CH is higher than that of a general (non-CH) node, so CH selection affects the lifetime of a WSN. However, the application scenario context of a WSN determines the definition of lifetime, and hence how the objective of elongating lifetime can be achieved. In this study, we classify lifetime into different types and give a corresponding CH selection method for each type to achieve the lifetime-extension objective. Simulation results demonstrate that our methods can extend the lifetime for the different requests of the sensor networks.
Clustering in Very Large Databases Based on Distance and Density
Institute of Scientific and Technical Information of China (English)
QIAN WeiNing(钱卫宁); GONG XueQing(宫学庆); ZHOU AoYing(周傲英)
2003-01-01
Clustering in very large databases or data warehouses, with many applications in areas such as spatial computation, web information collection, pattern recognition and economic analysis, is a huge task that challenges data mining research. Current clustering methods suffer from several problems: 1) scanning the whole database leads to high I/O cost and expensive maintenance (e.g., of an R*-tree); 2) the uncertain parameter k must be pre-specified, so clustering can only be refined by repeated trial and test; 3) they lack efficiency in treating arbitrary shapes in very large data sets. In this paper, we present a new hybrid clustering algorithm to solve these problems. The new algorithm, which combines both distance and density strategies, can handle clusters of arbitrary shape effectively. It makes full use of statistical information during mining to reduce the time complexity greatly while keeping good clustering quality. Furthermore, the algorithm can easily eliminate noise and identify outliers. An experimental evaluation on a spatial database compares this method with other popular clustering algorithms (CURE and DBSCAN). The results show that our algorithm outperforms them in terms of efficiency and cost, and gains even more speedup as the data size scales up.
Improving Energy Efficient Clustering Method for Wireless Sensor Network
Directory of Open Access Journals (Sweden)
Md. Imran Hossain
2013-08-01
Full Text Available Wireless sensor networks have recently emerged as an important computing platform. The sensors are power-limited and have limited computing resources, so sensor energy has to be managed wisely in order to maximize the lifetime of the network. Simply speaking, LEACH requires knowledge of the energy of every node in the network topology, and the threshold with which it selects the cluster head is fixed, so the protocol does not adapt to the network topology environment. We propose the IELP algorithm, which selects cluster heads using varying thresholds; the new cluster-head selection probability incorporates the initial energy and the number of neighbor nodes. On a rotation basis, a head-set member receives data from the neighboring nodes and transmits the aggregated results to the distant base station. For a given number of data-collecting sensor nodes, the number of control and management nodes can be systematically adjusted to reduce the energy consumption, which increases the network lifetime. The simulation results show that IELP improves on LEACH by 39% and on SEP by 20% in a 100 m x 100 m area for m = 0.1 and α = 2, where m is the fraction of advanced nodes and α is the additional energy factor between advanced and normal nodes.
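For reference, the fixed threshold the abstract criticizes is the standard LEACH cluster-head election rule: each round, a node that has not recently served as head becomes one with probability T(n) = p / (1 - p (r mod 1/p)), where p is the desired head fraction and r the round number. (IELP's modified threshold, which also weighs initial energy and neighbor count, is not reproduced here.)

```python
import random

def leach_threshold(p, r):
    """Standard LEACH T(n) for a node not elected head in the last 1/p rounds."""
    return p / (1 - p * (r % int(round(1 / p))))

def elects_as_head(p, r, rng=random):
    # each eligible node draws a uniform number per round
    return rng.random() < leach_threshold(p, r)

# with p = 0.1 the threshold rises over each 10-round cycle and reaches 1 in
# the cycle's last round, guaranteeing every node serves as head once
print([round(leach_threshold(0.1, r), 3) for r in (0, 5, 9)])  # → [0.1, 0.2, 1.0]
```

The rising threshold is what rotates the energy-hungry head role evenly; fixed p is also exactly why plain LEACH ignores residual energy, the gap IELP targets.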
Exploitation of semantic methods to cluster pharmacovigilance terms.
Dupuch, Marie; Dupuch, Laëtitia; Hamon, Thierry; Grabar, Natalia
2014-01-01
Pharmacovigilance is the activity related to the collection, analysis and prevention of adverse drug reactions (ADRs) induced by drugs. This activity is usually performed within dedicated databases (national, European, international...), in which the ADRs declared for patients are usually coded with a specific controlled terminology, MedDRA (Medical Dictionary for Regulatory Activities). Traditionally, the detection of adverse drug reactions is performed with data mining algorithms, while more recently groupings of close ADR terms have also been exploited. The Standardized MedDRA Queries (SMQs) have become a standard in pharmacovigilance. They are created manually by international boards of experts with the objective of grouping together the MedDRA terms related to a given safety topic. Within MedDRA version 13, 84 SMQs exist, although several important safety topics are not yet covered. The objective of our work is to propose an automatic method for assisting the creation of SMQs by clustering semantically close MedDRA terms. The method relies on semantic approaches: semantic distance and similarity algorithms, terminology structuring methods and term clustering. The obtained results indicate that the proposed unsupervised methods are complementary for this task; they can generate subsets of the existing SMQs and make the process systematic and less time consuming. PMID:24739596
Directory of Open Access Journals (Sweden)
Farnaz Pakdeland
2016-08-01
Full Text Available A wireless sensor network is comprised of several sensor nodes, and several factors influence network operation. In a clustered structure, cluster-head failure can cause loss of information. The aim of this paper is to increase fault tolerance at the cluster-head node. First, balancing cluster density postpones the death of the cluster-head node and reduces the collisions caused by energy imbalance among clusters. The innovation at this stage lies in using two fuzzy logic systems: one in the phase of evaluating the cluster-head chance, and the other in the phase of producing balance, migrating nodes to qualified clusters to increase balance. The focus then turns to detecting and repairing cluster-head faults.
Comparing Methods for segmentation of Microcalcification Clusters in Digitized Mammograms
Moradmand, Hajar; Targhi, Hossein Khazaei
2012-01-01
The appearance of microcalcifications in mammograms is one of the early signs of breast cancer, so early detection of microcalcification clusters (MCCs) in mammograms can be helpful for cancer diagnosis and better treatment of breast cancer. In this paper a computer method is proposed to support radiologists in detecting MCCs in digital mammography. First, in order to facilitate and improve the detection step, the mammogram images are enhanced with wavelet transformation and morphology operations. Then, for segmentation of suspicious MCCs, two methods are investigated: adaptive thresholding and watershed segmentation. Finally, the MCC areas detected by the different algorithms are compared to find out which segmentation method is more appropriate for extracting MCCs in mammograms.
Neural network method for galaxy classification: the luminosity function of E/S0 in clusters
Molinari, Emilio; Smareglia, Riccardo
1998-02-01
We present a method based on the non-linear behaviour of neural networks for the identification of the early-type population in the cores of galaxy clusters. A Kohonen Self-Organising Map applied to a three-colour photometric catalogue of objects enabled us to select the elliptical galaxies in each passband. We then measured the luminosity function of the selected E/S0 galaxies. These luminosity functions show peculiarities which disfavour the hypothesis of universality often claimed for rich clusters and which can be related to the past dynamical history of each cluster as a whole. Based on observations made at the European Southern Observatory (ESO), La Silla, Chile.
A Method of Clustering Persons' Profiles for Counseling
Keat, Donald B., II; Hackman, Roy B.
1972-01-01
Individuals were grouped into person clusters on the basis of the similarity of their inventory profiles. In any particular profile cluster, homogeneous groups (by curriculum areas) of individuals tend to group into attraction patterns (presence in profile cluster) and avoidance patterns (absence from profile cluster). (Author)
Cluster Development of Zhengzhou Urban Agriculture Based on Diamond Model
Institute of Scientific and Technical Information of China (English)
2012-01-01
Based on the basic theory of the Diamond Model, this paper analyzes the competitive power of Zhengzhou urban agriculture in terms of production factors, demand conditions, related and supporting industries, business strategies and structure, and horizontal competition. In line with these situations, it argues that cluster development is an effective approach to lifting the competitive power of Zhengzhou urban agriculture. Finally, it presents the following countermeasures and suggestions: optimize the spatial distribution for cluster development of urban agriculture; cultivate leading enterprises and optimize the organizational form of urban agriculture; and energetically develop low-carbon agriculture to create a favorable ecological environment for cluster development of urban agriculture.
WORMHOLE ATTACK MITIGATION IN MANET: A CLUSTER BASED AVOIDANCE TECHNIQUE
Directory of Open Access Journals (Sweden)
Subhashis Banerjee
2014-01-01
Full Text Available A Mobile Ad-Hoc Network (MANET) is a self-configuring, infrastructure-less network of mobile devices connected by wireless links. Loopholes like the wireless medium, the lack of a fixed infrastructure, dynamic topology, rapid deployment practices, and the hostile environments in which they may be deployed make MANETs vulnerable to a wide range of security attacks, and the wormhole attack is one of them. During this attack a malicious node captures packets at one location in the network and tunnels them to another colluding malicious node at a distant point, which replays them locally. This paper presents a cluster-based wormhole attack avoidance technique. The concept of hierarchical clustering with a novel hierarchical 32-bit node addressing scheme is used to avoid the attacking path during the route discovery phase of the DSR protocol, which is considered the underlying routing protocol. Pinpointing the location of the wormhole nodes in the case of an exposed attack is also achieved with this method.
AN EFFICIENT INITIALIZATION METHOD FOR K-MEANS CLUSTERING OF HYPERSPECTRAL DATA
Directory of Open Access Journals (Sweden)
A. Alizade Naeini
2014-10-01
Full Text Available K-means is the most frequently used partitional clustering algorithm in the remote sensing community. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers, and the problem deteriorates for high-dimensional data such as hyperspectral remotely sensed imagery. To tackle this problem, in this paper the spectral signatures of the endmembers in the image scene are extracted and used as the initial positions of the cluster centers. For this purpose, in the first step a Neyman-Pearson detection theory based eigen-thresholding method (the HFC method) is employed to estimate the number of endmembers in the image. Afterwards, the spectral signatures of the endmembers are obtained using the Minimum Volume Enclosing Simplex (MVES) algorithm. Eventually, these spectral signatures are used to initialize the k-means clustering algorithm. The proposed method is implemented on a hyperspectral dataset acquired by the ROSIS sensor, with 103 spectral bands, over the Pavia University campus, Italy. For comparative evaluation, two other commonly used initialization methods (the Bradley & Fayyad (BF) and Random methods) are implemented and compared. The confusion matrix, overall accuracy and Kappa coefficient are employed to assess the methods' performance. The evaluations demonstrate that the proposed solution outperforms the other initialization methods and can be applied for unsupervised classification of hyperspectral imagery for landcover mapping.
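The final step, seeding k-means with endmember signatures instead of random centers, can be sketched as follows. This toy uses random archetypes as stand-ins for the HFC/MVES-extracted endmembers (both assumptions; neither step is reproduced here) and SciPy's k-means, which accepts an explicit initial-centroid matrix.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(3)
n_bands, n_endmembers = 103, 4
endmembers = rng.random((n_endmembers, n_bands))   # stand-in signatures

# toy pixels: noisy mixtures, each dominated by one endmember
abund = rng.dirichlet(np.full(n_endmembers, 0.1), 500)
pixels = abund @ endmembers + rng.normal(0, 0.01, (500, n_bands))

# seed k-means with the endmember signatures rather than random centers
centroids, labels = kmeans2(pixels, endmembers.copy(), minit='matrix')
```

Because the initial centers already sit near the true cluster archetypes, the gradient-descent iterations start in a good basin, which is the whole point of the proposed initialization.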
Volatility clustering in agent based market models
Giardina, Irene; Bouchaud, Jean-Philippe
2003-06-01
We define and study a market model where agents have different strategies among which they can choose, according to their relative profitability, with the possibility of not participating in the market. The price is updated according to the excess demand, and the wealth of the agents is properly accounted for. Only two parameters play a significant role: one describes the impact of trading on the price, and the other describes the propensity of agents to be trend-following or contrarian. We observe three different regimes, depending on the value of these two parameters: an oscillating phase with bubbles and crashes, an intermittent phase and a stable 'rational' market phase. The statistics of price changes in the intermittent phase resembles that of real price changes, with small linear correlations, fat tails and long-range volatility clustering. We discuss how the time dependence of these two parameters spontaneously drives the system into the intermittent region.
Taxonomically Clustering Organisms Based on the Profiles of Gene Sequences Using PCA
Directory of Open Access Journals (Sweden)
E. Ramaraj
2006-01-01
Full Text Available The biological implications of bioinformatics can already be seen in various implementations. Biological taxonomy may seem like a simple science in which biologists merely observe similarities among organisms and construct classifications according to those similarities [1], but it is not so simple. By applying data mining techniques to a gene sequence database we can cluster the data to find interesting similarities in the gene expression data. One application of such clustering is taxonomically grouping organisms based on their gene sequence expressions. In this study we outline a method for taxonomical clustering of species based on their genetic profiles, using Principal Component Analysis and Self-Organizing Neural Networks. We implemented the idea in Matlab and clustered gene sequences taken from the PAUP version of the ML5/ML6 database; the taxa used are some of the basidiomycetous fungi from that database. To study scalability issues, another large gene sequence database was also used. The proposed method clustered the species correctly in almost all cases, and the obtained results were significant and promising.
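The pipeline's shape, project high-dimensional profiles with PCA and then cluster the projections, can be sketched on synthetic data. In this sketch k-means stands in for the Self-Organizing Map stage (an assumed substitution; the paper used Matlab's SOM tools), and the "sequence profiles" are random toy vectors.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(4)
# toy "sequence profiles": 30 species x 50 features, drawn from two taxa
profiles = np.vstack([rng.normal(0, 1, (15, 50)),
                      rng.normal(3, 1, (15, 50))])

# PCA via SVD on the centred data; keep the first 2 principal components
X = profiles - profiles.mean(axis=0)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt[:2].T          # (30, 2) low-dimensional representation

# cluster species in the reduced space (k-means as a SOM stand-in)
_, labels = kmeans2(scores, 2, seed=5, minit='++')
```

The dimensionality reduction is what makes the second stage tractable: the clustering operates on 2 coordinates per species instead of 50 raw features.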
A Cluster-Based Fuzzy Fusion Algorithm for Event Detection in Heterogeneous Wireless Sensor Networks
Directory of Open Access Journals (Sweden)
ZiQi Hao
2015-01-01
Full Text Available As limited energy is one of the tough challenges in wireless sensor networks (WSN), energy saving is important for increasing the lifecycle of the network. Data fusion combines information from several sources to provide a unified scenario, which can significantly save sensor energy and enhance the accuracy of sensed data. In this paper, we propose a cluster-based data fusion algorithm for event detection. We use the k-means algorithm to form the nodes into clusters, which significantly reduces the energy consumption of intracluster communication. The distances between cluster heads and the event, as well as the energy of the clusters, are fuzzified, and fuzzy logic is used to select the clusters that will participate in data uploading and fusion. Fuzzy logic is also used by cluster heads for local decisions, and the local decision results are then sent to the base station. Decision-level fusion for the final event decision is performed by the base station according to the uploaded local decisions and the fusion support degree of the clusters, calculated by the fuzzy logic method. The effectiveness of this algorithm is demonstrated by simulation results.
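The fuzzified cluster-selection step can be sketched as below: each cluster is scored by fuzzy memberships for "near the event" and "high residual energy", combined with the minimum t-norm (fuzzy AND). The triangular membership shapes, the ranges and the 0.5 selection threshold are assumptions for illustration, not the paper's rule base:

```python
def tri(x, a, b, c):
    """Triangular membership function on [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fusion_support(dist_to_event, residual_energy, d_max=100.0, e_max=1.0):
    """Fuzzy support of a cluster for event fusion: 'near' AND 'energetic'."""
    near = tri(dist_to_event, -d_max, 0.0, d_max)            # closer -> higher
    energetic = tri(residual_energy, 0.0, e_max, 2 * e_max)  # fuller -> higher
    return min(near, energetic)                              # minimum t-norm

# hypothetical clusters: (distance of head to event, residual energy)
clusters = {"c1": (20.0, 0.9), "c2": (80.0, 0.4), "c3": (35.0, 0.7)}
support = {cid: fusion_support(d, e) for cid, (d, e) in clusters.items()}
selected = [cid for cid, s in support.items() if s >= 0.5]
print(selected)
```

Here the close, well-charged clusters c1 and c3 are selected for uploading and fusion, while the distant, depleted c2 is left out.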
Distance based (DBCP) Cluster Protocol for Heterogeneous Wireless Sensor Network
Kumar, Surender; Prateek, Manish; Bhushan, Bharat
2014-01-01
Clustering is an important concept for reducing energy consumption and prolonging the life of a wireless sensor network. In a heterogeneous wireless sensor network, some of the nodes are equipped with more energy than the others. Many routing algorithms have been proposed for heterogeneous wireless sensor networks. The Stable Election Protocol (SEP) is one of the important protocols in this category. In this research paper a novel energy-efficient distance-based cluster protocol (DBCP) is proposed for...
Energy efficient cluster-based routing in wireless sensor networks
Zeghilet, Houda; Badache, Nadjib; Maimour, Moufida
2009-01-01
Because of the lack of a global naming scheme, routing protocols in sensor networks usually use flooding to select paths and deliver data. This process, although simple and effective, is very costly in terms of energy consumption, an important design issue in sensor network routing protocols. Cluster-based routing is one solution to save energy. In this paper, we propose a combination of an improved clustering algorithm and directed diffusion, a well-known data-centric routing paradigm in sen...
Rank Based Clustering For Document Retrieval From Biomedical Databases
Directory of Open Access Journals (Sweden)
Jayanthi Manicassamy
2009-09-01
Full Text Available Nowadays, search engines are widely used for extracting information from various resources throughout the world, and a majority of searches lie in the field of biomedicine, retrieving related documents from various biomedical databases. Current search engines lack document clustering and do not represent the relativeness level of the documents extracted from the databases. In order to overcome these pitfalls, a text-based search engine has been developed for retrieving documents from the Medline and PubMed biomedical databases. The search engine incorporates a page-ranking-based clustering concept which automatically represents relativeness on a clustering basis. In addition, a graph tree is constructed to represent the level of relatedness of the documents that are networked together. This added functionality for a biomedical document search engine was found to provide better results in reviewing related documents based on relativeness.
Institute of Scientific and Technical Information of China (English)
2015-01-01
This paper focuses on the synthetic aperture radar (SAR) imaging of space-sparse targets such as ships on the sea, and proposes a method for target separation and imaging of sparse scenes based on clustering of range profile peaks. Firstly, a wavelet de-noising algorithm is used to preprocess the original echo, and the range profiles at different viewing positions are obtained by range compression and range migration correction. Peaks of the range profiles are detected by a fast peak detection algorithm based on a second-order difference operator. Targets with sparse energy intervals can be imaged through azimuth compression after clustering of the peaks in the range dimension. Moreover, targets without coupling in the range energy interval and azimuth synthetic aperture time can be imaged through azimuth compression after clustering of the peaks in both the range and azimuth dimensions. Lastly, the effectiveness of the proposed method is validated by simulations. Experimental results demonstrate that space-sparse targets such as ships can be imaged separately and completely with little computation in azimuth compression, and the resulting images are more beneficial for target recognition.
Advances in Generalized Valence Bond-Coupled Cluster Methods for Electronic Structure Theory
Lawler, Keith Vanoy
2009-01-01
The electron-electron correlation term in the electronic energy of a molecule is the most difficult term to compute, yet it is of both qualitative and quantitative importance for a diverse range of chemical applications of computational quantum chemistry. Generalized Valence Bond-Coupled Cluster (GVB-CC) methods are computationally efficient, size-consistent wavefunction based methods to capture the most important static (valence) contributions to the correlation energy. Despite these advanta...
Li, Liangxing; Yang, Jianrui; Huang, Hongzhang; Xu, Liyuan; Gao, Chongkai; Li, Ning
2016-07-01
We evaluated 26 microemulsion liquid chromatography (MELC) systems for their potential as high-throughput screening platforms capable of modeling the partitioning behaviors of drug compounds in an n-octanol-water system, and for predicting the lipophilicity of those compounds (i.e. logP values). The MELC systems were compared by cluster analysis and a linear solvation energy relationship (LSER)-based method, and the optimal system was identified by comparing their Euclidean distances with the LSER coefficients. The most effective MELC system had a mobile phase consisting of 6.0% (w/w) Brij35 (a detergent), 6.6% (w/w) butanol, 0.8% (w/w) cyclohexane, 86.6% (w/w) buffer solution and 8 mM cetyltrimethyl ammonium bromide. The reliability of the established platform was confirmed by the agreement between the experimental data and the predicted values. The logP values of the ingredients of danshen root (Salvia miltiorrhiza Radix et Rhizoma) were then predicted. Copyright © 2015 John Wiley & Sons, Ltd. PMID:26490541
Authentication Based on Multilayer Clustering in Ad Hoc Networks
Directory of Open Access Journals (Sweden)
Suh Heyi-Sook
2005-01-01
Full Text Available In this paper, we describe a secure cluster-routing protocol based on a multilayer scheme in ad hoc networks. This work provides a scalable, threshold authentication scheme in ad hoc networks. We present detailed security threats against ad hoc routing protocols, specifically examining cluster-based routing. Our proposed protocol, called "authentication based on multilayer clustering for ad hoc networks" (AMCAN), designs an end-to-end authentication protocol that relies on mutual trust between nodes in other clusters. The AMCAN strategy takes advantage of a multilayer architecture that is designed for an authentication protocol in a cluster head (CH) using a new concept of a control cluster head (CCH) scheme. We propose an authentication protocol that uses certificates containing an asymmetric key and a multilayer architecture so that the CCH is achieved using the threshold scheme, thereby reducing the computational overhead and successfully defeating all identified attacks. We also use a more extensive area, such as a CCH, using an identification protocol to build a highly secure, highly available authentication service, which forms the core of our security framework.
Combined Density-based and Constraint-based Algorithm for Clustering
Institute of Scientific and Technical Information of China (English)
CHEN Tung-shou; CHEN Rong-chang; LIN Chih-chiang; CHIU Yung-hsing
2006-01-01
We propose a new clustering algorithm that assists the researchers to quickly and accurately analyze data. We call this algorithm Combined Density-based and Constraint-based Algorithm (CDC). CDC consists of two phases. In the first phase, CDC employs the idea of density-based clustering algorithm to split the original data into a number of fragmented clusters. At the same time, CDC cuts off the noises and outliers. In the second phase, CDC employs the concept of K-means clustering algorithm to select a greater cluster to be the center. Then, the greater cluster merges some smaller clusters which satisfy some constraint rules. Due to the merged clusters around the center cluster, the clustering results show high accuracy. Moreover, CDC reduces the calculations and speeds up the clustering process. In this paper, the accuracy of CDC is evaluated and compared with those of K-means, hierarchical clustering, and the genetic clustering algorithm (GCA) proposed in 2004. Experimental results show that CDC has better performance.
Two clusters of child molesters based on impulsiveness
Directory of Open Access Journals (Sweden)
Danilo A. Baltieri
2015-06-01
Full Text Available Objective: High impulsiveness is a general problem that affects most criminal offenders and is associated with greater recidivism risk. A cluster analysis of impulsiveness measured by the Barratt Impulsiveness Scale - Version 11 (BIS-11) was performed on a sample of hands-on child molesters. Methods: The sample consisted of 208 child molesters enrolled in two different sectional studies carried out in São Paulo, Brazil. Using three factors from the BIS-11, a k-means cluster analysis was performed using the average silhouette width to determine cluster number. Direct logistic regression was performed to analyze the association of criminological and clinical features with the resulting clusters. Results: Two clusters were delineated. The cluster characterized by higher impulsiveness showed higher scores on the Sexual Screening for Pedophilic Interests (SSPI), Static-99, and Sexual Addiction Screening Test. Conclusions: Given that child molesters are an extremely heterogeneous population, the “number of victims” item of the SSPI should call attention to those offenders with the highest motor, attentional, and non-planning impulsiveness. Our findings could have implications in terms of differences in therapeutic management for these two groups, with the most impulsive cluster benefitting from psychosocial strategies combined with pharmacological interventions.
An Adaptive Spectral Clustering Algorithm Based on the Importance of Shared Nearest Neighbors
Directory of Open Access Journals (Sweden)
Xiaoqi He
2015-05-01
Full Text Available The construction of a similarity matrix is one significant step for the spectral clustering algorithm, and the Gaussian kernel function is one of the most common measures for constructing the similarity matrix. However, with a fixed scaling parameter, the similarity between two data points is not adaptive and is inappropriate for multi-scale datasets. In this paper, by quantifying the importance of each vertex of the similarity graph, the Gaussian kernel function is scaled, and an adaptive Gaussian kernel similarity measure is proposed. An adaptive spectral clustering algorithm based on the importance of shared nearest neighbors is then obtained. The idea is that the greater the importance of the shared neighbors between two vertices, the more likely it is that these two vertices belong to the same cluster; the importance value of the shared neighbors is obtained with an iterative method that considers both the local structural information and the distance similarity information, so as to improve the algorithm's performance. Experimental results on different datasets show that our spectral clustering algorithm outperforms other spectral clustering algorithms, such as self-tuning spectral clustering and adaptive spectral clustering based on shared nearest neighbors, in clustering accuracy on most datasets.
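The flavour of the adaptive similarity can be sketched as follows: each point gets a local scale from its k-th nearest-neighbour distance (as in self-tuning spectral clustering), and the similarity is boosted by the shared-nearest-neighbour count. The paper's iterative importance weighting is replaced here by this one-shot form, which is an assumption of the sketch:

```python
import numpy as np

def snn_similarity(X, k=3):
    """Gaussian similarity with per-point scales, boosted by the number of
    shared nearest neighbours (a one-shot sketch of the adaptive measure)."""
    n = len(X)
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    order = np.argsort(D, axis=1)            # order[:, 0] is the point itself
    sigma = D[np.arange(n), order[:, k]]     # local scale: k-th NN distance
    knn = [set(order[i, 1:k + 1]) for i in range(n)]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = len(knn[i] & knn[j])   # shared-neighbour "importance"
                S[i, j] = (1 + shared) * np.exp(-D[i, j] ** 2
                                                / (sigma[i] * sigma[j]))
    return S

X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10.5]])
S = snn_similarity(X, k=2)
print(S.shape, S[0, 1] > S[0, 3])
```

Spectral clustering would then proceed as usual: form the graph Laplacian of `S`, take its leading eigenvectors and run k-means on the embedded points.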
Multi-hop routing-based optimization of the number of cluster-heads in wireless sensor networks.
Nam, Choon Sung; Han, Young Shin; Shin, Dong Ryeol
2011-01-01
Wireless sensor networks require energy-efficient data transmission because the sensor nodes have limited power. A cluster-based routing method is more energy-efficient than a flat routing method as it can only send specific data for user requirements and aggregate similar data by dividing a network into a local cluster. However, previous clustering algorithms have some problems in that the transmission radius of sensor nodes is not realistic and multi-hop based communication is not used both inside and outside local clusters. As energy consumption based on clustering is dependent on the number of clusters, we need to know how many clusters are best. Thus, we propose an optimal number of cluster-heads based on multi-hop routing in wireless sensor networks. We observe that a local cluster made by a cluster-head influences the energy consumption of sensor nodes. We determined an equation for the number of packets to send and relay, and calculated the energy consumption of sensor networks using it. Through the process of calculating the energy consumption, we can obtain the optimal number of cluster-heads in wireless sensor networks. PMID:22163771
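The idea of choosing the cluster-head count by minimising modelled energy can be sketched with a standard first-order radio model. This is a LEACH-style, single-hop simplification with illustrative constants; the paper's multi-hop packet counting is not reproduced here:

```python
import math

# First-order radio-model constants (illustrative values)
E_ELEC = 50e-9   # J/bit, TX/RX electronics
E_FS   = 10e-12  # J/bit/m^2, free-space amplifier
E_DA   = 5e-9    # J/bit/signal, data aggregation at a cluster-head

def round_energy(k, n=100, side=100.0, d_bs=75.0, bits=4000):
    """Approximate energy of one round with k cluster-heads for n nodes on a
    side x side field (single-hop, LEACH-style sketch)."""
    d2_to_ch = side ** 2 / (2 * math.pi * k)         # E[member-to-CH distance^2]
    e_ch = bits * (E_ELEC * (n / k) + E_DA * (n / k)  # receive + aggregate
                   + E_ELEC + E_FS * d_bs ** 2)       # forward to base station
    e_member = bits * (E_ELEC + E_FS * d2_to_ch)      # member transmit to its CH
    return k * e_ch + n * e_member

best_k = min(range(1, 21), key=round_energy)
print(best_k)
```

The modelled energy has the characteristic trade-off: too few heads means long member-to-head distances, too many means wasteful long-haul links to the base station, so a moderate k minimises the total.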
International Nuclear Information System (INIS)
We present a method that uses conventional optical microscopes to determine the number of nanoparticles in a cluster, which is typically not possible using traditional image-based optical methods due to the diffraction limit. The method, called through-focus scanning optical microscopy (TSOM), uses a series of optical images taken at varying focus levels to achieve this. The optical images cannot directly resolve the individual nanoparticles, but contain information related to the number of particles. The TSOM method makes use of this information to determine the number of nanoparticles in a cluster. Initial good agreement between the simulations and the measurements is also presented. The TSOM method can be applied to fluorescent and non-fluorescent as well as metallic and non-metallic nano-scale materials, including soft materials, making it attractive for tag-less, high-speed, optical analysis of nanoparticles down to 45 nm diameter
DSN Beowulf Cluster-Based VLBI Correlator
Rogstad, Stephen P.; Jongeling, Andre P.; Finley, Susan G.; White, Leslie A.; Lanyi, Gabor E.; Clark, John E.; Goodhart, Charles E.
2009-01-01
The NASA Deep Space Network (DSN) requires a broadband VLBI (very long baseline interferometry) correlator to process data routinely taken as part of the VLBI source Catalogue Maintenance and Enhancement task (CAT M&E) and the Time and Earth Motion Precision Observations task (TEMPO). The data provided by these measurements are a crucial ingredient in the formation of precision deep-space navigation models. In addition, a VLBI correlator is needed to provide support for other VLBI related activities for both internal and external customers. The JPL VLBI Correlator (JVC) was designed, developed, and delivered to the DSN as a successor to the legacy Block II Correlator. The JVC is a full-capability VLBI correlator that uses software processes running on multiple computers to cross-correlate two-antenna broadband noise data. Components of this new system (see Figure 1) consist of Linux PCs integrated into a Beowulf Cluster, an existing Mark5 data storage system, a RAID array, an existing software correlator package (SoftC) originally developed for Delta DOR Navigation processing, and various custom-developed software processes and scripts. Parallel processing on the JVC is achieved by assigning slave nodes of the Beowulf cluster to process separate scans in parallel until all scans have been processed. Due to the single stream sequential playback of the Mark5 data, some ramp-up time is required before all nodes can have access to required scan data. Core functions of each processing step are accomplished using optimized C programs. The coordination and execution of these programs across the cluster is accomplished using Perl scripts, PostgreSQL commands, and a handful of miscellaneous system utilities. Mark5 data modules are loaded on Mark5 Data systems playback units, one per station. Data processing is started when the operator scans the Mark5 systems and runs a script that reads various configuration files and then creates an experiment-dependent status database
ANONYMIZATION BASED ON NESTED CLUSTERING FOR PRIVACY PRESERVATION IN DATA MINING
Directory of Open Access Journals (Sweden)
V.Rajalakshmi
2013-07-01
Full Text Available Privacy Preservation in data mining protects the data from revealing unauthorized extraction of information. Data Anonymization techniques implement this by modifying the data, so that the original values cannot be acquired easily. Perturbation techniques are variedly used, which will greatly affect the quality of data, since there is a trade-off between privacy preservation and information loss which will subsequently affect the result of data mining. The method that is proposed in this paper is based on nested clustering of data and perturbation on each cluster. The size of clusters is kept optimal to reduce the information loss. The paper explains the methodology, implementation and results of nested clustering. Various metrics are also provided to explicate that this method overcomes the disadvantages of other perturbation methods.
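The cluster-then-perturb idea can be sketched with microaggregation: cluster the values, then publish each value as its cluster mean. The nested recursion into sub-clusters and the optimal-size tuning described in the paper are omitted from this sketch:

```python
import random

def kmeans_1d(values, k, iters=25, seed=0):
    """Tiny 1-D k-means for the clustering stage."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[j].append(v)
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return groups

def anonymize(values, k=3):
    """Replace every value by its cluster mean (microaggregation-style
    perturbation standing in for the paper's nested scheme)."""
    out = []
    for g in kmeans_1d(values, k):
        if g:
            m = sum(g) / len(g)
            out.extend([m] * len(g))
    return out

salaries = [30, 32, 31, 70, 72, 100, 104, 98]   # hypothetical sensitive values
masked = anonymize(salaries, k=3)
print([round(v, 1) for v in masked])
```

Note that the output is grouped by cluster rather than kept in the original order; aggregate statistics such as the total are preserved exactly, which is the utility side of the privacy/information-loss trade-off the abstract mentions.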
The distance of the Fornax Cluster based on Globular Cluster Luminosity Functions
Kohle, Sven; Kissler-Patig, Markus; Hilker, Michael; Richtler, Tom; Infante, L.; Quintana, H.
1996-01-01
We present Globular Cluster Luminosity Functions for four ellipticals and one S0-Galaxy in the Fornax cluster of galaxies, derived from CCD photometry in V and I. The averaged turnover magnitudes are $V_{TO} = 23.80 \\pm 0.06$ and $I_{TO} =22.39 \\pm 0.05$, respectively. We derive a relative distance modulus $(m-M)_{Fornax} - (m-M)_{M87} = 0.08 \\pm 0.09$ mag using the turnover of M87 based on HST data.
Improved Clustered Routing Algorithm Based on Distance and Energy in Wireless Sensor Networks
Wang, Dejun; Meng, Bo; Shaomin JIN
2013-01-01
Since the energy supply of a sensor node is limited, energy optimization should be considered the key objective when studying wireless sensor networks (WSN). Facing these challenges, clustering is one of the methods used to manage network energy consumption efficiently, and it plays an important role in prolonging network lifetime and reducing energy consumption. An improved clustered routing algorithm based on distance and energy is proposed, which efficiently improves the rate of data agg...
Detecting and extracting clusters in atom probe data: A simple, automated method using Voronoi cells
Energy Technology Data Exchange (ETDEWEB)
Felfer, P., E-mail: peter.felfer@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Ceguerra, A.V., E-mail: anna.ceguerra@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Ringer, S.P., E-mail: simon.ringer@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia); Cairney, J.M., E-mail: julie.cairney@sydney.edu.au [Australian Centre for Microscopy and Microanalysis, The University of Sydney, NSW 2006 (Australia); School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006 (Australia)
2015-03-15
The analysis of the formation of clusters in solid solutions is one of the most common uses of atom probe tomography. Here, we present a method where we use the Voronoi tessellation of the solute atoms and its geometric dual, the Delaunay triangulation to test for spatial/chemical randomness of the solid solution as well as extracting the clusters themselves. We show how the parameters necessary for cluster extraction can be determined automatically, i.e. without user interaction, making it an ideal tool for the screening of datasets and the pre-filtering of structures for other spatial analysis techniques. Since the Voronoi volumes are closely related to atomic concentrations, the parameters resulting from this analysis can also be used for other concentration based methods such as iso-surfaces. - Highlights: • Cluster analysis of atom probe data can be significantly simplified by using the Voronoi cell volumes of the atomic distribution. • Concentration fields are defined on a single atomic basis using Voronoi cells. • All parameters for the analysis are determined by optimizing the separation probability of bulk atoms vs clustered atoms.
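A 2-D toy version of the Voronoi-cell criterion, using SciPy (assumed available), computes each point's cell area and flags points whose cells are small. The fixed area cut-off below replaces the automatic bulk-vs-cluster separation optimisation described in the paper, and the point data are synthetic:

```python
import numpy as np
from scipy.spatial import ConvexHull, Voronoi

def voronoi_areas(points):
    """Area of each point's Voronoi cell (np.inf for unbounded border cells)."""
    vor = Voronoi(points)
    areas = np.full(len(points), np.inf)
    for i, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if region and -1 not in region:      # keep bounded cells only
            # for a 2-D hull, ConvexHull.volume is the enclosed area
            areas[i] = ConvexHull(vor.vertices[region]).volume
    return areas

rng = np.random.default_rng(0)
matrix = rng.uniform(0, 10, (60, 2))    # dilute solute background
cluster = rng.normal(5, 0.1, (20, 2))   # one dense particle cluster
pts = np.vstack([matrix, cluster])
areas = voronoi_areas(pts)
# clustered atoms sit in small Voronoi cells; a fixed cut-off stands in
# for the automatic parameter selection of the paper
dense = np.where(areas < 0.1)[0]
print(len(dense))
```

Because the Voronoi cell volume is the inverse of a local density, the same `areas` array could also feed concentration-based analyses such as iso-surfaces, as the highlights note.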
Institute of Scientific and Technical Information of China (English)
杨金廷; 高敬
2013-01-01
SWOT analysis is widely used in strategy research; it analyzes the established internal conditions of the research subject itself. As carriers of the regional economy, industrial clusters enjoy clear development advantages by building strong, sustained competitiveness. Based on a SWOT analysis, we examine the external opportunities and threats and the internal strengths and weaknesses facing the bicycle industry cluster of Pingxiang county, and discuss the development strategies corresponding to the SO, WO, ST and WT combinations. Under the SO combination, the cluster should pursue market development, branding and product development strategies. Under the WO combination, it should exploit an intellectual property strategy, promoting development through innovation, raising the added value of products, and meeting consumers' demand for high quality as global science and technology progress. Under the ST combination, it should adopt a collaborative strategy through industry associations, sharing resources through industry-wide cooperation and uniting enterprises to build a successful model for the Pingxiang bicycle industry. Under the WT combination, it should upgrade industrial agglomeration, change the current situation of small workshops and lax management in Pingxiang county, and optimize the allocation of industrial resources. In short, fully exploiting the cluster effect provides a useful reference for the development of industrial clusters.
A knowledge-based clustering algorithm driven by Gene Ontology.
Cheng, Jill; Cline, Melissa; Martin, John; Finkelstein, David; Awad, Tarif; Kulp, David; Siani-Rose, Michael A
2004-08-01
We have developed an algorithm for inferring the degree of similarity between genes by using the graph-based structure of Gene Ontology (GO). We applied this knowledge-based similarity metric to a clique-finding algorithm for detecting sets of related genes with biological classifications. We also combined it with an expression-based distance metric to produce a co-cluster analysis, which accentuates genes with both similar expression profiles and similar biological characteristics and identifies gene clusters that are more stable and biologically meaningful. These algorithms are demonstrated in the analysis of MPRO cell differentiation time series experiments. PMID:15468759
Timing-Driven Nonuniform Depopulation-Based Clustering
Directory of Open Access Journals (Sweden)
Hanyu Liu
2010-01-01
hence improve routability by spreading the logic over the architecture. However, all depopulation-based clustering algorithms to date increase critical path delay. In this paper, we present a timing-driven nonuniform depopulation-based clustering technique, T-NDPack, that targets critical path delay and channel width constraints simultaneously. T-NDPack adjusts the CLB capacity based on the criticality of the Basic Logic Element (BLE). Results show that T-NDPack reduces minimum channel width by 11.07% while increasing the number of CLBs by 13.28% compared to T-VPack. More importantly, T-NDPack decreases critical path delay by 2.89%.
A Novel Clustering Algorithm Based on Quantum Games
Li, Qiang; Jiang, Jing-ping
2008-01-01
Enormous successes have been achieved by quantum algorithms during the last decade. In this paper, we combine quantum games with the problem of data clustering, and develop clustering algorithms based on them, in which data points in a dataset are considered as players who can make decisions and play quantum strategies in quantum games. After playing quantum games, each player's expected payoff is calculated, and then he uses a link-removing-and-rewiring (LRR) function to change his neighbors and adjust the strength of the links connecting to them so as to maximize his payoff. Further, the algorithms are discussed and analyzed for two cases of strategies, two payoff matrixes and two LRR functions. The experimental results demonstrate that data points in datasets are clustered reasonably and efficiently, and that the clustering algorithms have fast rates of convergence. Moreover, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.
INSTRUMENT CLUSTER CONFIGURATION USING GUI BASED ON VB.NET
Directory of Open Access Journals (Sweden)
ISABELLA RANI K, RAJALAKSHMI P, RANJANI JULIIET A, RAJESH KUMAR G, YUVARAJ K
2013-06-01
Full Text Available The purpose of this project is to design and develop a GUI (graphical user interface) based driver information system. Instrument clusters are the traditional readout meters available in the dashboards of cars and motorbikes. Among conventional meters, analog meters lack storage capability and accuracy, whereas digital meters have display problems in dim-light areas. In order to overcome these disadvantages, instrument clusters can be controlled and made user-friendly with the help of a GUI. A graphical user interface allows drivers to interact with instrument clusters using images rather than text commands and command-line arguments. The goal is to enhance the efficiency and ease of use of the underlying logical design. These actions are usually performed through direct manipulation of the graphical elements. This can be implemented in instrument clusters with the help of .NET software, with MS Access as the back-end tool.
A Data-origin Authentication Protocol Based on ONOS Cluster
Directory of Open Access Journals (Sweden)
Qin Hua
2016-01-01
Full Text Available This paper aims to propose a data-origin authentication protocol based on an ONOS cluster. ONOS is an SDN controller which can work in a distributed environment. However, the security of an ONOS cluster is seldom considered, and the communication in an ONOS cluster may suffer from many security threats. In this paper, we use a two-tier self-renewable hash chain for identity authentication and data-origin authentication. We analyse the security and overhead of our proposal and make a comparison with current security measures. The results show that, with the help of our proposal, communication in an ONOS cluster can be protected from identity forging, replay attacks, data tampering, MITM attacks and repudiation, while the computational overhead decreases noticeably.
Richness-based masses of rich and famous galaxy clusters
Andreon, S
2016-01-01
We present a catalog of galaxy cluster masses derived by exploiting the tight correlation between mass and richness, i.e., a properly computed number of bright cluster galaxies. The richness definition adopted in this work is properly calibrated, shows a small scatter with mass, and has a known evolution, which means that we can estimate accurate ($0.16$ dex) masses more precisely than by adopting any other richness estimates or X-ray or SZ-based proxies based on survey data. We measured a few hundred galaxy clusters at $0.05
Cluster Based Hybrid Niche Mimetic and Genetic Algorithm for Text Document Categorization
Directory of Open Access Journals (Sweden)
A. K. Santra
2011-09-01
Full Text Available An efficient cluster-based hybrid niche mimetic and genetic algorithm for text document categorization, aimed at improving the retrieval rate of relevant documents, is addressed. The proposal minimizes the processing needed to structure the documents through better feature selection using the hybrid algorithm. In addition, the restructuring of feature words to associated documents is reduced, which in turn increases the document clustering rate. The performance of the proposed work is measured in terms of cluster object accuracy, term weight, term frequency and inverse document frequency. Experimental results demonstrate that it achieves very good performance on both feature selection and text document categorization, compared to other classifier methods.
Clustered iterative stochastic ensemble method for multi-modal calibration of subsurface flow models
Elsheikh, Ahmed H.
2013-05-01
A novel multi-modal parameter estimation algorithm is introduced. Parameter estimation is an ill-posed inverse problem that might admit many different solutions. This is attributed to the limited amount of measured data used to constrain the inverse problem. The proposed multi-modal model calibration algorithm uses an iterative stochastic ensemble method (ISEM) for parameter estimation. ISEM employs an ensemble of directional derivatives within a Gauss-Newton iteration for nonlinear parameter estimation. ISEM is augmented with a clustering step based on k-means algorithm to form sub-ensembles. These sub-ensembles are used to explore different parts of the search space. Clusters are updated at regular intervals of the algorithm to allow merging of close clusters approaching the same local minima. Numerical testing demonstrates the potential of the proposed algorithm in dealing with multi-modal nonlinear parameter estimation for subsurface flow models. © 2013 Elsevier B.V.
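A toy version of the multi-modal idea in this abstract: run an independent descent for each ensemble member on a two-minimum misfit, then cluster the converged members and merge near-coincident groups. The finite-difference descent is a stand-in for ISEM's ensemble-based Gauss-Newton update, and the misfit function is hypothetical:

```python
import random

def f(x):
    """Toy multi-modal misfit with two local minima, near x = -2 and x = 3."""
    return min((x + 2) ** 2, (x - 3) ** 2)

def descend(x, lr=0.1, steps=200, h=1e-4):
    """Finite-difference descent, standing in for the ensemble-based
    Gauss-Newton update used by ISEM (an assumption of this sketch)."""
    for _ in range(steps):
        g = (f(x + h) - f(x - h)) / (2 * h)
        x -= lr * g
    return x

rng = random.Random(7)
members = sorted(descend(rng.uniform(-6, 6)) for _ in range(30))
# cluster the converged members; merge groups whose members are close,
# mirroring the paper's merging of clusters approaching the same minimum
modes = []
for x in members:
    if modes and abs(x - modes[-1][-1]) < 0.5:
        modes[-1].append(x)
    else:
        modes.append([x])
centroids = [sum(m) / len(m) for m in modes]
print([round(c, 2) for c in centroids])
```

The ensemble splits naturally into two sub-ensembles, one per basin of attraction, which is the behaviour the k-means sub-ensemble machinery of the paper is designed to exploit on subsurface flow models.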
SOFT CLUSTERING BASED EXPOSITION TO MULTIPLE DICTIONARY BAG OF WORDS
Sujatha, K. S.; B. Vinod
2012-01-01
Object classification is a highly important area of computer vision with many applications, including robotics, image search, face recognition, aiding visually impaired people, censoring images and many more. A popular feature-based method of classification is the Bag of Words approach, in which a codebook of visual words is created using various clustering methods. To increase performance, the Multiple Dictionaries BoW (MDBoW) method uses more visual words from dif...
An Algorithm of Speaker Clustering Based on Model Distance
Directory of Open Access Journals (Sweden)
Wei Li
2014-03-01
Full Text Available An algorithm based on Model Distance (MD) for spectral speaker clustering is proposed to address the shortcoming of general spectral clustering algorithms in describing the distribution of the signal source. First, a Universal Background Model (UBM) is created from a large quantity of independent speakers; then, a Gaussian Mixture Model (GMM) is trained from the UBM for every speech segment; finally, the probability distances between the GMMs of the speech segments are used to build an affinity matrix, and speaker spectral clustering is performed on the affinity matrix. Experimental results on news and conference data sets show an average improvement of 6.38% in F-measure compared with the algorithm based on feature-vector distance. In addition, the proposed algorithm is 11.72 times faster.
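The pipeline above (per-segment models, pairwise model distances, affinity matrix, spectral clustering) can be sketched in miniature. This is not the paper's implementation: one-dimensional Gaussians stand in for the UBM-adapted GMMs, a symmetric KL divergence stands in for the probability distance, and a single Fiedler-vector split replaces full spectral clustering.

```python
import numpy as np

def sym_kl_gauss(m1, v1, m2, v2):
    """Symmetric KL divergence between two 1-D Gaussians (mean, variance)."""
    kl = lambda ma, va, mb, vb: 0.5 * (np.log(vb / va) + (va + (ma - mb) ** 2) / vb - 1.0)
    return kl(m1, v1, m2, v2) + kl(m2, v2, m1, v1)

def spectral_bipartition(models, gamma=1.0):
    """Split segment models into two speakers via the Fiedler vector of the
    normalized graph Laplacian built from a distance-based affinity matrix."""
    n = len(models)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                A[i, j] = np.exp(-gamma * sym_kl_gauss(*models[i], *models[j]))
    deg = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(n) - Dinv @ A @ Dinv        # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]                   # eigenvector of 2nd-smallest eigenvalue
    return (fiedler > 0).astype(int)

# Two hypothetical "speakers": segment models clustered around distinct means
segs = [(0.0, 1.0), (0.2, 1.1), (-0.1, 0.9), (5.0, 1.0), (5.3, 1.2), (4.8, 0.95)]
labels = spectral_bipartition(segs)
```

Segments from the same hypothetical speaker end up with the same label, regardless of which side of the split gets label 0.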
Energy Technology Data Exchange (ETDEWEB)
Harmon, S; Wendelberger, B [University of Wisconsin-Madison, Madison, WI (United States); Jeraj, R [University of Wisconsin-Madison, Madison, WI (United States); University of Ljubljana (Slovenia)
2014-06-01
Purpose: Radiogenomics aims to establish relationships between patient genotypes and imaging phenotypes. An open question remains on how best to integrate information from these distinct datasets. This work investigates whether similarities in genetic features across patients correspond to similarities in PET-imaging features, assessed with various clustering algorithms. Methods: [18F]FDG PET data was obtained for 26 NSCLC patients from a public database (TCIA). Tumors were contoured using an in-house segmentation algorithm combining gradient and region-growing techniques; the resulting ROIs were used to extract 54 PET-based features. Corresponding genetic microarray data containing 48,778 elements were also obtained for each tumor. Given the mismatch in feature sizes, two dimension reduction techniques were applied to the genetic data: principal component analysis (PCA) and selective filtering of 25 NSCLC-associated genes-of-interest (GOI). Gene datasets (full, PCA, and GOI) and PET feature datasets were independently clustered using K-means and hierarchical clustering with a variable number of clusters (K). The Jaccard Index (JI) was used to score similarity of cluster assignments across different datasets. Results: Patient clusters from imaging data showed poor similarity to clusters from gene datasets, regardless of clustering algorithm or number of clusters (JImean = 0.3429±0.1623). Notably, we found clustering algorithms had different sensitivities to data reduction techniques. Using hierarchical clustering, the PCA dataset showed perfect cluster agreement with the full-gene set (JI = 1) for all values of K, and the agreement between the GOI set and the full-gene set decreased as the number of clusters increased (JI = 0.9231 and 0.5769 for K = 2 and 5, respectively). K-means clustering assignments were highly sensitive to data reduction and showed poor stability for different values of K (JIrange: 0.2301–1). Conclusion: Using commonly-used clustering algorithms, we found poor
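The Jaccard Index used above to score agreement between cluster assignments can be computed over pairs of items, so that cluster label permutations do not matter. The abstract does not specify its exact formulation; one common pairwise form is sketched here:

```python
from itertools import combinations

def jaccard_index(labels_a, labels_b):
    """Pairwise Jaccard index between two clusterings of the same items:
    |pairs co-clustered in both| / |pairs co-clustered in either|."""
    assert len(labels_a) == len(labels_b)
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        in_a = labels_a[i] == labels_a[j]   # pair shares a cluster in A
        in_b = labels_b[i] == labels_b[j]   # pair shares a cluster in B
        if in_a and in_b:
            both += 1
        if in_a or in_b:
            either += 1
    return both / either if either else 1.0

# Identical partitions agree perfectly even with swapped label names
print(jaccard_index([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0
```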
Study of Clustering Algorithm based on Fuzzy C-Means and Immunological Partheno Genetic
Hongfen Jiang; Junfeng Gu; Yijun Liu; Feiyue Ye; Haixu Xi; Mingfang Zhu
2013-01-01
Clustering algorithms are very important for data mining. The fuzzy c-means clustering algorithm is one of the earliest objective-function-based clustering algorithms and has received much attention. This paper analyzes the shortcomings of the fuzzy C-means (FCM) algorithm and of genetic clustering algorithms, and proposes a hybrid clustering algorithm based on immune partheno-genetic operators and fuzzy C-means. This algorithm uses Immune Partheno-Genetic fuzzy clustering to guide the number and the choice of the clustering center...
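The standard FCM updates that such hybrids build on alternate between centroid and membership estimates. A minimal sketch on 1-D data follows; the data and parameters are illustrative, and the immune partheno-genetic guidance described above is omitted:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Plain FCM on 1-D data: alternate centroid and membership updates.
    u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)),
    v_i  = sum_k u_ik^m x_k / sum_k u_ik^m."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                       # memberships sum to 1 per point
    for _ in range(iters):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1)        # fuzzily weighted centroids
        D = np.abs(X[None, :] - V[:, None]) + 1e-12
        W = D ** (-2.0 / (m - 1.0))
        U = W / W.sum(axis=0)                # inverse-distance memberships
    return V, U

X = np.array([0.0, 0.1, 0.2, 9.8, 9.9, 10.0])
V, U = fuzzy_c_means(X)
```

On this well-separated toy data the two centroids settle near the group means (about 0.1 and 9.9).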
Variable Selection in Model-based Clustering: A General Variable Role Modeling
Maugis, Cathy; Celeux, Gilles; Martin-Magniette, Marie-Laure
2008-01-01
The currently available variable selection procedures in model-based clustering assume that the irrelevant clustering variables are all independent or are all linked with the relevant clustering variables. We propose a more versatile variable selection model which describes three possible roles for each variable: The relevant clustering variables, the irrelevant clustering variables dependent on a part of the relevant clustering variables and the irrelevant clustering variables totally indepe...
A rough set based rational clustering framework for determining correlated genes.
Jeyaswamidoss, Jeba Emilyn; Thangaraj, Kesavan; Ramar, Kadarkarai; Chitra, Muthusamy
2016-06-01
Cluster analysis plays a foremost role in identifying groups of genes that show similar behavior under a set of experimental conditions. Several clustering algorithms have been proposed for identifying gene behaviors and to understand their significance. The principal aim of this work is to develop an intelligent rough clustering technique, which will efficiently remove the irrelevant dimensions in a high-dimensional space and obtain appropriate meaningful clusters. This paper proposes a novel biclustering technique that is based on rough set theory. The proposed algorithm uses correlation coefficient as a similarity measure to simultaneously cluster both the rows and columns of a gene expression data matrix and mean squared residue to generate the initial biclusters. Furthermore, the biclusters are refined to form the lower and upper boundaries by determining the membership of the genes in the clusters using mean squared residue. The algorithm is illustrated with yeast gene expression data and the experiment proves the effectiveness of the method. The main advantage is that it overcomes the problem of selection of initial clusters and also the restriction of one object belonging to only one cluster by allowing overlapping of biclusters. PMID:27352972
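The mean squared residue used above to seed and refine biclusters has a compact definition; a small sketch follows, with an illustrative additive matrix rather than the yeast data:

```python
import numpy as np

def mean_squared_residue(A):
    """MSR of a bicluster A: mean of (a_ij - rowmean_i - colmean_j + allmean)^2.
    Zero for a perfectly additive (coherent) bicluster."""
    row = A.mean(axis=1, keepdims=True)
    col = A.mean(axis=0, keepdims=True)
    return float(((A - row - col + A.mean()) ** 2).mean())

# An additive pattern (row effect + column effect) has MSR exactly 0
additive = np.add.outer([0.0, 1.0, 2.0], [10.0, 20.0, 30.0])
print(mean_squared_residue(additive))   # -> 0.0
```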
Clustering based on eigenspace transformation - CBEST for efficient classification
Chen, Yanlei; Gong, Peng
2013-09-01
Large remote sensing datasets, which either cover large areas or have high spatial resolution, often make information mining a burden for scientific studies. Here, we present an approach that conducts clustering after gray-level vector reduction, so that the speed of clustering can be considerably improved. The approach applies an eigenspace transformation to the dataset, then compresses the data in the eigenspace and stores them in coded matrices and vectors. The clustering process takes advantage of the reduced size of the compressed data and thus has lower computational complexity. We name this approach Clustering Based on Eigenspace Transformation (CBEST). In our experiment with a subscene of Landsat Thematic Mapper (TM) imagery, CBEST was found to improve speed considerably over conventional K-means as the volume of data to be clustered increases. We assessed information loss and several other factors. In addition, we evaluated the effectiveness of CBEST in mapping land cover/use with the same image, acquired over Guangzhou City, South China, and an AVIRIS hyperspectral image over Cappocanoe County, Indiana. Using reference data, we assessed the accuracies of both CBEST and conventional K-means and found that CBEST was not negatively affected by information loss during compression in practice. We discuss potential applications of the fast clustering algorithm in dealing with large datasets in remote sensing studies.
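The gray-level vector reduction at the heart of CBEST, projecting pixels into eigenspace and coding the projections so that clustering only needs to visit unique coded vectors with counts, can be sketched as follows. The step size and the random data are illustrative, not the paper's settings:

```python
import numpy as np

def eigenspace_compress(X, k=2, step=1.0):
    """Project data onto its top-k covariance eigenvectors, then quantize
    the projections so that many pixels share one coded vector."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = vecs[:, np.argsort(vals)[::-1][:k]]       # top-k eigenvectors
    codes = np.round((Xc @ top) / step).astype(int)  # gray-level vector reduction
    uniq, inverse, counts = np.unique(codes, axis=0,
                                      return_inverse=True, return_counts=True)
    return uniq, inverse.reshape(-1), counts

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))                       # stand-in for pixel vectors
uniq, inverse, counts = eigenspace_compress(X)
```

Clustering then runs only on `uniq`, weighting each coded vector by `counts`, and per-pixel labels are recovered through `inverse`, which is where the speedup over clustering all pixels comes from.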
Clustering-based analysis of semantic concept models for video shots
Koskela, Markus; Smeaton, Alan F.
2006-01-01
In this paper we present a clustering-based method for representing semantic concepts on multimodal low-level feature spaces and study the evaluation of the goodness of such models with entropy-based methods. As different semantic concepts in video are most accurately represented with different features and modalities, we utilize the relative model-wise confidence values of the feature extraction techniques in weighting them automatically. The method also provides a natural way of measuring t...
Kafieh, Rahele; Mehridehnavi, Alireza
2013-01-01
In this study, we considered some competitive learning methods, including hard competitive learning and soft competitive learning with and without fixed network dimensionality, for reliability analysis in microarrays. For a more extensive view, and keeping in mind that competitive learning methods aim at error minimization or entropy maximization (different kinds of function optimization), we also investigated the abilities of mixture decomposition schemes. This study therefore covers algorithms based on function optimization, with particular emphasis on different competitive learning methods. The goal is to find the most powerful method according to a pre-specified criterion determined with numerical methods and matrix similarity measures. Furthermore, before applying a clustering algorithm, we should provide an indication of the intrinsic ability of the dataset to form clusters; we therefore propose the Hopkins statistic as a method for measuring this cluster tendency. The results show the remarkable ability of the Rayleigh mixture model compared with the other methods in the reliability analysis task. PMID:24083134
Close Clustering Based Automated Color Image Annotation
Garg, Ankit; Asawa, Krishna
2010-01-01
Most image-search approaches today are based on the text based tags associated with the images which are mostly human generated and are subject to various kinds of errors. The results of a query to the image database thus can often be misleading and may not satisfy the requirements of the user. In this work we propose our approach to automate this tagging process of images, where image results generated can be fine filtered based on a probabilistic tagging mechanism. We implement a tool which helps to automate the tagging process by maintaining a training database, wherein the system is trained to identify certain set of input images, the results generated from which are used to create a probabilistic tagging mechanism. Given a certain set of segments in an image it calculates the probability of presence of particular keywords. This probability table is further used to generate the candidate tags for input images.
H. Wu; Chu, TP; Tong, SY; Ng, CY
1998-01-01
We have compared multiple-scattering results of angle-resolved photoelectron diffraction spectra between the exact slab method and the separable propagator perturbation cluster method. In the slab method, the source wave and multiple scattering within strongly scattering layers are expanded in spherical waves while the scattering among different layers is expressed in plane waves. The transformation between spherical waves and plane waves is done exactly. The plane waves are then matched acro...
Efficient Clustering for Irregular Geometries Based on Identification of Concavities
Directory of Open Access Journals (Sweden)
Velázquez-Villegas Fernando
2014-04-01
Full Text Available The two-dimensional clustering problem is highly relevant to applications involving the efficient use of raw material, such as cutting stock, packing, etc. It is a very complex problem in which multiple bodies must be accommodated so that they occupy as little space as possible, and its complexity increases with the complexity of the bodies; the number of possible arrangements between bodies is huge. The No Fit Polygon (NFP) makes it possible to determine all the relative positions in which two patterns (regular or irregular) are in contact without overlapping, so that the best position can be selected. However, NFP generation requires a lot of calculation; besides, selecting the best cluster is not a simple task because, between two irregular patterns in contact, hollows (unusable areas) and external concavities (usable areas) can be produced. This work presents a quick and simple method to reduce the calculations associated with NFP generation and to minimize unusable areas in a cluster. The method consists of generating partial NFPs, just on the concave regions of the patterns, and selecting the best cluster using a total weighted efficiency, i.e. a weighted combination of enclosure efficiency (ratio of occupied area to convex hull area) and hollow efficiency (ratio of occupied area to cluster area). The proposed method produces results similar to those obtained by other methods; however, the shape of the clusters obtained makes it possible to accommodate more parts in similar spaces, which is desirable when optimizing the use of material. We present two examples to show the performance of the proposal.
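The total weighted efficiency described above combines two area ratios. A small sketch follows, with an illustrative L-shaped cluster and an equal weighting; the paper does not fix these particular values:

```python
def polygon_area(pts):
    """Shoelace formula for the area of a simple polygon."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2.0

def weighted_efficiency(occupied, hull, cluster, w=0.5):
    """Total weighted efficiency:
    w * (occupied / hull area) + (1 - w) * (occupied / cluster area)."""
    return w * (occupied / hull) + (1.0 - w) * (occupied / cluster)

# Hypothetical L-shaped cluster fully tiled by its parts (no hollows)
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
hull = [(0, 0), (2, 0), (2, 1), (1, 2), (0, 2)]   # its convex hull
occupied = polygon_area(l_shape)                   # -> 3.0
eff = weighted_efficiency(occupied, polygon_area(hull), polygon_area(l_shape))
```

The concavity of the L-shape lowers enclosure efficiency (3.0/3.5) while hollow efficiency stays at 1, so `eff` is about 0.929.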
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data
Directory of Open Access Journals (Sweden)
K. Mumtaz
2010-06-01
Full Text Available Mining knowledge from large amounts of spatial data is known as spatial data mining. It has become a highly demanding field because huge amounts of spatial data have been collected in various applications, ranging from geo-spatial data to bio-medical knowledge, and the amount of spatial data being collected is increasing exponentially, far exceeding human ability to analyze it. Recently, clustering has been recognized as a primary data mining method for knowledge discovery in spatial databases. The development of clustering algorithms has received a lot of attention in the last few years and new clustering algorithms have been proposed. DBSCAN is a pioneering density-based clustering algorithm. It can find clusters of different shapes and sizes in large amounts of data containing noise and outliers. This paper shows the results of analyzing the density-based clustering characteristics of three clustering algorithms, namely DBSCAN, k-means and SOM, using synthetic two-dimensional spatial data sets.
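DBSCAN's core idea, growing clusters from core points by density-based connectivity and marking sparse points as noise, fits in a short sketch. This uses a naive O(n^2) neighbor search and illustrative parameters, not a production implementation:

```python
def dbscan(points, eps=1.0, min_pts=3):
    """Minimal DBSCAN: labels >= 0 are cluster ids, -1 marks noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]
    labels = [None] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                 # provisionally noise
            continue
        labels[i] = cid                    # i is a core point: start a cluster
        queue = list(nbrs)
        while queue:                       # expand by density-reachability
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid            # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nj = neighbors(j)
            if len(nj) >= min_pts:         # only core points keep expanding
                queue.extend(nj)
        cid += 1
    return labels

pts = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5), (50, 50)]
print(dbscan(pts))   # -> [0, 0, 0, 1, 1, 1, -1]
```

The two dense groups become clusters 0 and 1 while the isolated point is reported as noise, which is exactly the behavior the comparison above examines.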
A Comparative Analysis of Density Based Clustering Techniques for Outlier Mining
Directory of Open Access Journals (Sweden)
R. Prabahari
2014-11-01
Full Text Available Density-based clustering algorithms such as Density Based Spatial Clustering of Applications with Noise (DBSCAN), Ordering Points To Identify the Clustering Structure (OPTICS) and DENsity-based CLUstering (DENCLUE) are designed to discover clusters of arbitrary shape. DBSCAN grows clusters according to a density-based connectivity analysis. OPTICS, an extension of DBSCAN, produces a cluster ordering obtained by varying its parameters over a range of values. DENCLUE clusters objects based on a set of density distribution functions. The algorithms are compared in terms of essential parameters such as complexity, cluster shape, input parameters, noise handling, cluster quality and run time. The analysis is useful in determining which density-based clustering algorithm is suitable under different criteria.
Institute of Scientific and Technical Information of China (English)
贾冀婷
2015-01-01
Improving the automation of test case generation in software testing is very important for guaranteeing software quality and reducing development cost. In this paper, we propose an automatic test case generation method based on particle swarm optimization, the artificial bee colony algorithm and the K-means clustering algorithm, and carry out simulation experiments. The results show that, for automatic test case generation, the improved algorithm is more efficient and converges more strongly than other approaches such as particle swarm optimization and genetic algorithms.
Hu, Xihao; Shi, Christina Huan; Yip, Kevin Y.
2016-01-01
Motivation: The three-dimensional structure of genomes makes it possible for genomic regions not adjacent in the primary sequence to be spatially proximal. These DNA contacts have been found to be related to various molecular activities. Previous methods for analyzing DNA contact maps obtained from Hi-C experiments have largely focused on studying individual interactions, forming spatial clusters composed of contiguous blocks of genomic locations, or classifying these clusters into general categories based on some global properties of the contact maps. Results: Here, we describe a novel computational method that can flexibly identify small clusters of spatially proximal genomic regions based on their local contact patterns. Using simulated data that highly resemble Hi-C data obtained from real genome structures, we demonstrate that our method identifies spatial clusters that are more compact than methods previously used for clustering genomic regions based on DNA contact maps. The clusters identified by our method enable us to confirm functionally related genomic regions previously reported to be spatially proximal in different species. We further show that each genomic region can be assigned a numeric affinity value that indicates its degree of participation in each local cluster, and these affinity values correlate quantitatively with DNase I hypersensitivity, gene expression, super enhancer activities and replication timing in a cell type specific manner. We also show that these cluster affinity values can precisely define boundaries of reported topologically associating domains, and further define local sub-domains within each domain. Availability and implementation: The source code of BNMF and tutorials on how to use the software to extract local clusters from contact maps are available at http://yiplab.cse.cuhk.edu.hk/bnmf/. Contact: kevinyip@cse.cuhk.edu.hk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307607
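A local-cluster affinity decomposition of a contact map can be sketched with symmetric NMF, factorizing A into W W^T so that W[i, c] plays the role of region i's affinity for cluster c. This is only an illustrative stand-in for the BNMF method the paper describes, using damped multiplicative updates on a toy block matrix:

```python
import numpy as np

def sym_nmf(A, k=2, iters=500, seed=0):
    """Symmetric NMF A ~= W W^T via damped multiplicative updates
    (beta = 0.5); W[i, c] acts as region i's affinity to cluster c."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    for _ in range(iters):
        num = A @ W
        den = W @ (W.T @ W) + 1e-12
        W *= 0.5 + 0.5 * num / den      # damped update keeps W nonnegative
    return W

# Toy contact map: two blocks of mutually contacting regions
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
W = sym_nmf(A, k=2)
labels = W.argmax(axis=1)               # dominant-affinity cluster per region
```

Each region's row of `W` gives graded affinity values rather than a hard assignment, mirroring how the paper's affinity values admit quantitative comparison with other genomic signals.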
Scalable Parallel Density-based Clustering and Applications
Patwary, Mostofa Ali
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have received significant attention from the scientific community due to their unique capability of discovering arbitrarily shaped clusters and eliminating noise data. These algorithms have several applications requiring high performance computing, including finding halos and subhalos (clusters) in massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms is extremely challenging, as they exhibit an inherently sequential data access order and unbalanced workloads, resulting in low parallel efficiency. To break the sequential data access and to achieve high parallelism, we develop new parallel algorithms, for both DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups of up to 27.5 on 40 cores on a shared memory architecture and up to 5,765 using 8,192 cores on a distributed memory architecture. In our experiments, we found that while achieving this scalability, our algorithms produce clustering results with quality comparable to the classical algorithms.
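The reduction of DBSCAN to connected components that the parallel algorithm exploits rests on merging density-reachable core points, which a disjoint-set (union-find) structure handles well because unions can be applied in any order. A minimal sketch with an illustrative edge list:

```python
class DisjointSet:
    """Union-find with path halving; merging edges between density-reachable
    core points this way mirrors how parallel DBSCAN reduces clustering to
    computing connected components."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

# Hypothetical edges between density-reachable core points
edges = [(0, 1), (1, 2), (3, 4)]
ds = DisjointSet(6)
for a, b in edges:
    ds.union(a, b)
roots = [ds.find(i) for i in range(6)]   # one root per resulting cluster
```

Points 0-2 and 3-4 collapse into two clusters while point 5 stays alone; because each union is a local operation, the edge list can be partitioned across workers.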
A Resampling Based Clustering Algorithm for Replicated Gene Expression Data.
Li, Han; Li, Chun; Hu, Jie; Fan, Xiaodan
2015-01-01
In gene expression data analysis, clustering is a fruitful exploratory technique to reveal the underlying molecular mechanism by identifying groups of co-expressed genes. To reduce the noise, usually multiple experimental replicates are performed. An integrative analysis of the full replicate data, instead of reducing the data to the mean profile, carries the promise of yielding more precise and robust clusters. In this paper, we propose a novel resampling based clustering algorithm for genes with replicated expression measurements. Assuming those replicates are exchangeable, we formulate the problem in the bootstrap framework, and aim to infer the consensus clustering based on the bootstrap samples of replicates. In our approach, we adopt the mixed effect model to accommodate the heterogeneous variances and implement a quasi-MCMC algorithm to conduct statistical inference. Experiments demonstrate that by taking advantage of the full replicate data, our algorithm produces more reliable clusters and has robust performance in diverse scenarios, especially when the data is subject to multiple sources of variance. PMID:26671802
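The bootstrap-over-replicates idea can be sketched independently of the mixed effect model and quasi-MCMC machinery: resample each gene's replicates, cluster the resulting mean profiles, and average co-membership into a consensus matrix. The toy data and the threshold "clusterer" below are illustrative only:

```python
import random

def consensus_matrix(replicates, cluster_fn, n_boot=50, seed=7):
    """Consensus co-clustering from bootstrap resamples of replicates.
    replicates[g] lists the replicate measurements for gene g; each bootstrap
    redraws replicates with replacement, clusters the mean profiles, and
    co-membership counts are averaged over bootstraps."""
    rng = random.Random(seed)
    n = len(replicates)
    C = [[0.0] * n for _ in range(n)]
    for _ in range(n_boot):
        means = [sum(rng.choice(reps) for _ in reps) / len(reps)
                 for reps in replicates]
        labels = cluster_fn(means)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += 1.0 / n_boot
    return C

# Toy 1-D "expression" values; a threshold split stands in for real clustering
genes = [[0.9, 1.1, 1.0], [1.2, 0.8, 1.0], [5.0, 5.2, 4.8], [4.9, 5.1, 5.0]]
C = consensus_matrix(genes, cluster_fn=lambda xs: [int(x > 3.0) for x in xs])
```

Genes whose replicate variation never crosses the cluster boundary co-occur in every bootstrap (consensus near 1), which is the sense in which the full replicate data yields more reliable clusters than mean profiles alone.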
Hybrid Weighted-based Clustering Routing Protocol for Railway Communications
Directory of Open Access Journals (Sweden)
Jianli Xie
2013-12-01
Full Text Available In the paper, a hybrid clustering routing strategy is proposed for railway emergency ad hoc networks, for use when GSM-R base stations are destroyed or some terminals (or nodes) are far from signal coverage. In this case, the cluster-head (CH) election procedure is invoked on demand, taking into consideration the degree difference from the ideal degree, relative clustering stability, the sum of distances between the node and its one-hop neighbors, consumed power, node type and node mobility. For cluster formation, the weights of the CH election parameters are allocated rationally by rough set theory. The hybrid weighted-based clustering routing (HWBCR) strategy is designed for the railway emergency communication scene and aims at a good trade-off between computation cost and performance. A simulation platform is constructed to evaluate the performance of the strategy in terms of average end-to-end delay, packet loss ratio, routing overhead and average throughput. The results, compared against the railway communication QoS requirements, reveal that the strategy is suitable for transmitting dispatching voice and data between train and ground when the train speed is below 220 km/h.
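The weighted CH election combines several normalized cost factors per node, with the node minimizing the combined score winning. A minimal sketch; the factor values and weights below are illustrative, not the rough-set-derived weights of the paper:

```python
def elect_cluster_head(nodes, weights=(0.3, 0.25, 0.25, 0.2)):
    """Weighted cluster-head election: each node carries normalized cost
    factors (e.g. degree difference, neighbor-distance sum, consumed power,
    mobility); the node with the lowest combined score is elected."""
    def score(node):
        return sum(w * f for w, f in zip(weights, node["factors"]))
    return min(nodes, key=score)["id"]

# Hypothetical nodes with already-normalized cost factors in [0, 1]
nodes = [
    {"id": "A", "factors": (0.2, 0.4, 0.1, 0.3)},
    {"id": "B", "factors": (0.1, 0.2, 0.2, 0.1)},   # lowest combined cost
    {"id": "C", "factors": (0.5, 0.6, 0.7, 0.9)},
]
print(elect_cluster_head(nodes))   # -> B
```

In the paper's scheme the weights themselves come from rough set theory rather than being fixed by hand as here.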
3D Partition-Based Clustering for Supply Chain Data Management
Suhaibah, A.; Uznir, U.; Anton, F.; Mioc, D.; Rahman, A. A.
2015-10-01
Supply Chain Management (SCM) is the management of the flow of products and goods from their point of origin to the point of consumption. During the process of SCM, the information and datasets gathered for this application are massive and complex, owing to its several processes such as procurement, product development and commercialization, physical distribution, outsourcing and partnerships. For a practical application, SCM datasets need to be managed and maintained to provide better service to their three main user categories: distributors, customers and suppliers. To manage these datasets, a data constellation structure is used to accommodate the data in the spatial database. However, this situation creates a few problems in the geospatial database; for example, the performance of the database deteriorates, especially during query operations. We strongly believe that a more practical hierarchical tree structure is required for efficient SCM processing. Besides that, a three-dimensional approach is required for the management of SCM datasets, since they involve multi-level locations such as shop lots and residential apartments. The 3D R-Tree has been increasingly used for 3D geospatial database management due to its simplicity and extendibility. However, it suffers from serious overlaps between nodes. In this paper, we propose a partition-based clustering approach for the construction of a hierarchical tree structure. Several datasets are tested using the proposed method, and the percentage of overlapping nodes and the volume coverage are computed and compared with the original 3D R-Tree and other practical approaches. The experiments demonstrated in this paper substantiate that the hierarchical structure produced by the proposed partition-based clustering is capable of preserving minimal overlap and coverage. The query performance was tested using 300,000 points of an SCM dataset and the results are presented in this paper. This paper also discusses the outlook of the structure for future reference.
Risk Probability Estimating Based on Clustering
DEFF Research Database (Denmark)
Chen, Yong; Jensen, Christian D.; Gray, Elizabeth;
2003-01-01
Ubiquitous computing environments are highly dynamic, with new unforeseen circumstances and constantly changing environments, which introduces new risks that cannot be assessed through traditional means of risk analysis. Mobile entities in a ubiquitous computing environment require the ability to perform an autonomous assessment of the risk incurred by a specific interaction with another entity in a given context. This assessment will allow a mobile entity to decide whether sufficient evidence exists to mitigate the risk and allow the interaction to proceed. Such evidence might include records of... Existing approaches from the insurance industry do not directly apply to ubiquitous computing environments. Instead, we propose a dynamic mechanism for risk assessment, which is based on pattern matching, classification and prediction procedures. This mechanism uses an estimator of risk probability, which is based on the...
Mining Representative Subset Based on Fuzzy Clustering
Institute of Scientific and Technical Information of China (English)
ZHOU Hongfang; FENG Boqin; L(U) Lintao
2007-01-01
Two new concepts, fuzzy mutuality and average fuzzy entropy, are presented. Based on these concepts, a new algorithm, RSMA (representative subset mining algorithm), is proposed, which can abstract a representative subset from massive data. To accelerate the production of the representative subset, an improved algorithm, ARSMA (accelerated representative subset mining algorithm), is advanced, which adopts a combination of forward and backward strategies; in this way, the performance of the algorithm is improved. Finally, we run experiments on real datasets and evaluate the representative subset. The experiments show that the ARSMA algorithm outperforms the RandomPick algorithm in both effectiveness and efficiency.
The Heterogeneous P-Median Problem for Categorization Based Clustering
Blanchard, Simon J.; Aloise, Daniel; DeSarbo, Wayne S.
2012-01-01
The p-median offers an alternative to centroid-based clustering algorithms for identifying unobserved categories. However, existing p-median formulations typically require data aggregation into a single proximity matrix, resulting in masked respondent heterogeneity. A proposed three-way formulation of the p-median problem explicitly considers…
Frailty phenotypes in the elderly based on cluster analysis
DEFF Research Database (Denmark)
Dato, Serena; Montesanto, Alberto; Lagani, Vincenzo;
2012-01-01
genetic background on the frailty status is still questioned. We investigated the applicability of a cluster analysis approach based on specific geriatric parameters, previously set up and validated in a southern Italian population, to two large longitudinal Danish samples. In both cohorts, we identified...
Cluster based parallel database management system for data intensive computing
Institute of Scientific and Technical Information of China (English)
Jianzhong LI; Wei ZHANG
2009-01-01
This paper describes a computer-cluster based parallel database management system (DBMS), InfiniteDB, developed by the authors. InfiniteDB aims at efficiently supporting data intensive computing in response to the rapid growth in database size and the need for high performance analyzing of massive databases. It can be efficiently executed in computing systems composed of thousands of computers, such as cloud computing systems. It supports the parallelisms of intra-query, inter-query, intra-operation, inter-operation and pipelining. It provides effective strategies for managing massive databases, including multiple data declustering methods, declustering-aware algorithms for relational operations and other database operations, and an adaptive query optimization method. It also provides the functions of parallel data warehousing and data mining, the coordinator-wrapper mechanism to support the integration of heterogeneous information resources on the Internet, and fault-tolerant and resilient infrastructures. It has been used in many applications and has proved quite effective for data intensive computing.
a Three-Step Spatial-Temporal Clustering Method for Human Activity Pattern Analysis
Huang, W.; Li, S.; Xu, S.
2016-06-01
How people move in cities and what they do in various locations at different times form human activity patterns. Human activity patterns play a key role in urban planning, traffic forecasting, public health and safety, emergency response, friend recommendation, and so on. Therefore, scholars from different fields, such as social science, geography, transportation, physics and computer science, have made great efforts in modelling and analysing human activity patterns or human mobility patterns. One of the essential tasks in such studies is to find the locations or places where individuals stay to perform some kind of activity before further activity pattern analysis. In the era of Big Data, the emergence of social media along with wearable devices enables human activity data to be collected more easily and efficiently. Furthermore, the dimension of the accessible human activity data has been extended from two (space) or three (space-time) to four (space, time and semantics). More specifically, not only the location and time where people stay are collected, but also what people "say" in a location at a given time can be obtained. The characteristics of these datasets shed new light on the analysis of human mobility, and new methodologies should accordingly be developed to handle them. Traditional methods such as neural networks, statistics and clustering have been applied to study human activity patterns using geosocial media data. Among them, clustering methods have been widely used to analyse spatiotemporal patterns. However, to the best of our knowledge, few clustering algorithms have been specifically developed for handling datasets that contain spatial, temporal and semantic aspects all together. In this work, we propose a three-step human activity clustering method based on space, time and semantics to fill this gap. One-year Twitter data, posted in Toronto, Canada, is used to test the clustering-based method. The results show that the
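A deliberately simplified version of such a three-step space-time-semantics grouping, far cruder than the proposed method, can still illustrate the idea of keying activities on all three aspects. The grid size, time bins and posts below are all illustrative:

```python
from collections import defaultdict

def three_step_clusters(posts, cell=1.0, hours=6):
    """Group geosocial posts by spatial grid cell, then time-of-day bin,
    then keyword: a toy space-time-semantics pipeline."""
    clusters = defaultdict(list)
    for lon, lat, hour, keyword in posts:
        key = (int(lon // cell), int(lat // cell),   # step 1: spatial cell
               hour // hours,                        # step 2: time-of-day bin
               keyword)                              # step 3: semantic label
        clusters[key].append((lon, lat, hour, keyword))
    return dict(clusters)

posts = [
    (0.2, 0.3, 8, "coffee"), (0.4, 0.1, 9, "coffee"),  # same cell, bin, topic
    (0.3, 0.2, 20, "dinner"),                          # same cell, evening
    (5.5, 5.5, 8, "coffee"),                           # far-away morning coffee
]
groups = three_step_clusters(posts)
```

The two morning coffee posts in the same cell collapse into one activity group, while the evening post and the distant post stay separate, despite two of them sharing the same keyword.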
Directory of Open Access Journals (Sweden)
William E Stutz
Full Text Available Genes of the vertebrate major histocompatibility complex (MHC) are of great interest to biologists because of their important role in immunity and disease, and their extremely high levels of genetic diversity. Next generation sequencing (NGS) technologies are quickly becoming the method of choice for high-throughput genotyping of multi-locus templates like MHC in non-model organisms. Previous approaches to genotyping MHC genes using NGS technologies suffer from two problems: (1) a "gray zone" where low-frequency alleles and high-frequency artifacts can be difficult to disentangle, and (2) a similar-sequence problem, where very similar alleles can be difficult to distinguish as two distinct alleles. Here we present a new method for genotyping MHC loci--Stepwise Threshold Clustering (STC)--that addresses these problems by taking full advantage of the increase in sequence data provided by NGS technologies. Unlike previous approaches for genotyping MHC with NGS data that attempt to classify individual sequences as alleles or artifacts, STC uses a quasi-Dirichlet clustering algorithm to cluster similar sequences at increasing levels of sequence similarity. By applying frequency- and similarity-based criteria to clusters rather than individual sequences, STC is able to successfully identify clusters of sequences that correspond to individual or similar alleles present in the genomes of individual samples. Furthermore, STC does not require duplicate runs of all samples, increasing the number of samples that can be genotyped in a given project. We show how the STC method works using a single sample library. We then apply STC to 295 threespine stickleback (Gasterosteus aculeatus) samples from four populations and show that neighboring populations differ significantly in MHC allele pools. We show that STC is a reliable, accurate, efficient, and flexible method for genotyping MHC that will be of use to biologists interested in a variety of downstream applications.
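The core idea of re-clustering sequences at increasing similarity thresholds can be sketched in a much-simplified form. The fragment below uses plain positional identity on equal-length toy sequences and connected components of a similarity graph in place of the paper's quasi-Dirichlet clustering; the thresholds and sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def components(seqs, t):
    """Clusters = connected components of the graph that links two
    sequences whenever their identity is at least t (union-find)."""
    parent = list(range(len(seqs)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity(seqs[i], seqs[j]) >= t:
                parent[find(i)] = find(j)
    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())

seqs = ["ACGTACGT", "ACGTACGA", "TTTTACGT", "GGGGGGGG"]
# Stepwise: re-cluster at successively stricter identity thresholds.
by_threshold = {t: components(seqs, t) for t in (0.5, 0.75, 0.95)}
```

As the threshold tightens, loose clusters break apart into progressively finer allele-like groups, which is the behaviour STC exploits.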
Environment-based selection effects of Planck clusters
Kosyra, R.; Gruen, D.; Seitz, S.; Mana, A.; Rozo, E.; Rykoff, E.; Sanchez, A.; Bender, R.
2015-09-01
We investigate whether the large-scale structure environment of galaxy clusters imprints a selection bias on Sunyaev-Zel'dovich (SZ) catalogues. Such a selection effect might be caused by line-of-sight (LoS) structures that add to the SZ signal or contain point sources that disturb the signal extraction in the SZ survey. We use the Planck PSZ1 union catalogue in the Sloan Digital Sky Survey (SDSS) region as our sample of SZ-selected clusters. We calculate the angular two-point correlation function (2pcf) for physically correlated, foreground and background structure in the RedMaPPer SDSS DR8 catalogue with respect to each cluster. We compare our results with an optically selected comparison cluster sample and with theoretical predictions. In contrast to the hypothesis of no environment-based selection, we find a mean 2pcf for background structures of -0.049 on scales of ≲40 arcmin, significantly non-zero at ˜4σ, which means that Planck clusters are more likely to be detected in regions of low background density. We hypothesize that this effect arises either from the background estimation in the SZ survey or from radio sources in the background. We estimate the defect in SZ signal caused by this effect to be negligibly small, of the order of ~10^-4 of the signal of a typical Planck detection. Analogously, there are no implications for X-ray mass measurements. However, the environmental dependence has important consequences for weak-lensing follow-up of Planck galaxy clusters: we predict that projection effects account for half of the mass contained within a 15 arcmin radius of Planck galaxy clusters. We did not detect a background underdensity of CMASS LRGs, which also leaves a spatially varying redshift dependence of the Planck SZ selection function as a possible cause for our findings.
Directory of Open Access Journals (Sweden)
P. S. Hiremath
2014-11-01
Full Text Available In mobile ad-hoc networks (MANETs), the movement of the nodes may quickly change the network topology, increasing the overhead of topology-maintenance messages. The nodes communicate with each other by exchanging hello packets and constructing a neighbor list at each node. MANETs are vulnerable to attacks such as black hole, gray hole, worm hole and Sybil attacks. A black hole attack makes a serious impact on routing, packet delivery ratio, throughput, and end-to-end delay of packets. In this paper, the performance of clustering-based and threshold-based algorithms for the detection and prevention of cooperative black hole attacks in MANETs is examined. In this study every node is monitored by its own cluster head (CH), while a server (SV) monitors the entire network by a channel-overhearing method. The server computes a trust value based on the sent and received packet counts of the receiver node. The scheme is implemented using the AODV routing protocol in NS2 simulations. The results are obtained by comparing the performance of the clustering-based and threshold-based methods while varying the concentration of black hole nodes, and are analyzed in terms of throughput and packet delivery ratio. The results demonstrate that the threshold-based method outperforms the clustering-based method in terms of throughput, packet delivery ratio and end-to-end delay.
Clustering Based Feature Learning on Variable Stars
Mackenzie, Cristóbal; Protopapas, Pavlos
2016-01-01
The success of automatic classification of variable stars strongly depends on the lightcurve representation. Usually, lightcurves are represented as a vector of many statistical descriptors designed by astronomers, called features. These descriptors commonly demand significant computational power to calculate, require substantial research effort to develop, and do not guarantee good performance on the final classification task. Today, lightcurve representation is not entirely automatic; algorithms that extract lightcurve features are designed by humans and must be manually tuned for every survey. The vast amounts of data that will be generated in future surveys like LSST mean astronomers must develop analysis pipelines that are both scalable and automated. Recently, substantial efforts have been made in the machine learning community to develop methods that replace expert-designed, manually tuned features with features learned automatically from data. In this work we present what is, to our ...
Comparative Studies of Various Clustering Techniques and Its Characteristics
Directory of Open Access Journals (Sweden)
M.Sathya Deepa
2014-05-01
Full Text Available Discovering knowledge from massive databases is the main objective of data mining. Clustering is a key technique in data mining. A cluster is made up of a number of similar objects grouped together, and clustering is a form of unsupervised learning. There are many methods to form clusters; four important ones are partitional clustering, hierarchical clustering, density-based clustering and grid-based clustering. In this paper, we discuss these four methods in detail.
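As a concrete instance of the partitional family, here is a minimal k-means (Lloyd's algorithm) sketch on invented 2-D points; the data and k are illustrative only.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's algorithm: assign each point to its nearest
    centroid, then move each centroid to the mean of its members."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2 +
                                  (p[1] - centroids[c][1]) ** 2)
            groups[i].append(p)
        for i, g in enumerate(groups):
            if g:                    # keep the old centroid if a group empties
                centroids[i] = (sum(p[0] for p in g) / len(g),
                                sum(p[1] for p in g) / len(g))
    return centroids, groups

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, groups = kmeans(points, k=2)
```

On these two well-separated blobs the algorithm recovers the natural 3-and-3 partition regardless of which two points seed the centroids.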
Density Based Clustering Algorithm using Sparse Memory Mapped File
J. Hencil Peter; A. Antonysamy
2010-01-01
The DBSCAN [1] algorithm is popular in the data mining field for its ability to mine noiseless, arbitrarily shaped clusters in an elegant way. As the original DBSCAN algorithm uses distance measures to compute the distance between objects, it consumes much processing time, and its computational complexity is O(N^2). In this paper we propose a new algorithm for mining density-based clusters using a Sparse Memory Mapped File (Sparse MMF) [3]. All the given objec...
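The O(N^2) cost mentioned above comes from each object's neighborhood query scanning every other object. A naive sketch that counts the distance computations makes this explicit; the points and eps are invented for illustration.

```python
def region_query_all(points, eps):
    """Naive DBSCAN-style neighborhood search: every point is compared
    against every other point, i.e. N*(N-1) distance computations."""
    comparisons = 0
    neighbors = []
    for p in points:
        near = []
        for q in points:
            if p is q:
                continue
            comparisons += 1
            if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= eps * eps:
                near.append(q)
        neighbors.append(near)
    return neighbors, comparisons

pts = [(i, 0) for i in range(100)]        # 100 points on a line
neighbors, n_cmp = region_query_all(pts, eps=1.5)
# n_cmp == 100 * 99 == 9900: the count grows quadratically with N
```

Spatial indexes or, as in this paper, sparse memory-mapped structures aim to cut exactly this quadratic scan cost.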
Climatology of Mexico: a Description Based on Clustering Analysis
Pineda-Martinez, L. F.; Carbajal, N.
2007-05-01
Climate regions of Mexico are delimited using hierarchical clustering analysis (HCA). We assign the variables, precipitation and temperature, to groups or clusters based on similar statistical characteristics. Since meteorological stations in Mexico exhibit a heterogeneous geographic distribution, we used principal component analysis (PCA) to obtain a standardized reduced matrix to which HCA can conveniently be applied. We consider monthly means of maximum and minimum temperature and monthly accumulated precipitation from a meteorological dataset of the National Water Commission of Mexico. This allows defining groups of stations that delimit regions of similar climate, and describing the regional effect of events such as the Mexican monsoon and ENSO.
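The HCA step can be sketched with a minimal single-linkage agglomerative routine. The station profiles below are invented stand-ins for the standardized PCA scores, and Euclidean distance is an assumption.

```python
def single_linkage(items, dist, k):
    """Agglomerative clustering: start from singletons and repeatedly
    merge the two closest clusters (single linkage) until k remain."""
    clusters = [[it] for it in items]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))   # j > i, so index i stays valid
    return clusters

# Invented station profiles: (mean max T, mean min T, precipitation).
stations = [(30, 18, 20), (31, 19, 25), (22, 8, 90), (21, 9, 95)]
euclid = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
regions = single_linkage(stations, euclid, k=2)
```

With these toy profiles the two warm/dry stations merge into one "region" and the two cool/wet stations into another.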
Personalized Concept-Based Clustering Of Search Engine Queries
Directory of Open Access Journals (Sweden)
Rohit Chouhan
2016-06-01
Full Text Available Web search currently faces several problems: search queries are often short and ambiguous, and do not capture exactly what the user wants. To alleviate this, some search engines suggest terms that are semantically related to the submitted queries, so that users may select from the suggestions the ones that reflect their information needs. In this paper, we introduce a hybrid approach that takes the user's conceptual preferences into account in order to provide personalized query recommendations. We achieve this goal with two new strategies. First, we develop online techniques that extract concepts from the web snippets of the search results returned for a query and use the concepts to identify queries related to that user query. Second, we propose a new two-phase personalized agglomerative clustering algorithm that is able to generate personalized query clusters. The proposed approach achieves better precision and recall than existing query clustering methods.
SEARCH PROFILES BASED ON USER TO CLUSTER SIMILARITY
Directory of Open Access Journals (Sweden)
Ilija Subasic
2007-12-01
Full Text Available Privacy of web users' query search logs has, since the AOL dataset release last year, been treated as one of the central issues concerning privacy on the Internet. Therefore, the question of privacy preservation has also raised a lot of attention in the different communities surrounding search engines. This paper examines the usage of clustering methods for providing low-level contextual search while retaining high privacy and utility. By using only the user's cluster membership, the search query terms need no longer be retained, reducing privacy concerns both for users and companies. The paper presents a lightweight framework for combining query words, user similarities and clustering in order to provide a meaningful way of mining user searches while protecting their privacy. This differs from previous attempts at privacy preservation, which tried to anonymize the queries instead of the users.
Rishi, Varun; Perera, Ajith; Bartlett, Rodney J.
2016-03-01
Obtaining the correct potential energy curves for the dissociation of multiple bonds is a challenging problem for ab initio methods which are affected by the choice of a spin-restricted reference function. Coupled cluster (CC) methods such as CCSD (coupled cluster singles and doubles model) and CCSD(T) (CCSD + perturbative triples) correctly predict the geometry and properties at equilibrium but the process of bond dissociation, particularly when more than one bond is simultaneously broken, is much more complicated. New modifications of CC theory suggest that the deleterious role of the reference function can be diminished, provided a particular subset of terms is retained in the CC equations. The Distinguishable Cluster (DC) approach of Kats and Manby [J. Chem. Phys. 139, 021102 (2013)], seemingly overcomes the deficiencies for some bond-dissociation problems and might be of use in quasi-degenerate situations in general. DC along with other approximate coupled cluster methods such as ACCD (approximate coupled cluster doubles), ACP-D45, ACP-D14, 2CC, and pCCSD(α, β) (all defined in text) falls under a category of methods that are basically obtained by the deletion of some quadratic terms in the double excitation amplitude equation for CCD/CCSD (coupled cluster doubles model/coupled cluster singles and doubles model). Here these approximate methods, particularly those based on the DC approach, are studied in detail for the nitrogen molecule bond-breaking. The N2 problem is further addressed with conventional single reference methods but based on spatial symmetry-broken restricted Hartree-Fock (HF) solutions to assess the use of these references for correlated calculations in the situation where CC methods using fully symmetry adapted SCF solutions fail. The distinguishable cluster method is generalized: 1) to different orbitals for different spins (unrestricted HF based DCD and DCSD), 2) by adding triples correction perturbatively (DCSD(T)) and iteratively (DCSDT
Problem decomposition by mutual information and force-based clustering
Otero, Richard Edward
The scale of engineering problems has sharply increased over the last twenty years. Larger coupled systems, increasing complexity, and limited resources create a need for methods that automatically decompose problems into manageable sub-problems by discovering and leveraging problem structure. The ability to learn the coupling (inter-dependence) structure and reorganize the original problem could lead to large reductions in the time to analyze complex problems. Such decomposition methods could also provide engineering insight into the fundamental physics driving problem solution. This work advances the current state of the art in engineering decomposition through the application of techniques originally developed within computer science and information theory. The work describes the current state of automatic problem decomposition in engineering and utilizes several promising ideas to advance the state of the practice. Mutual information is a novel metric for data dependence and works on both continuous and discrete data. Mutual information can measure both the linear and non-linear dependence between variables, without the limitations of linear dependence measured through covariance. Mutual information is also able to handle data that does not have derivative information, unlike other metrics that require it. The value of mutual information to engineering design work is demonstrated on a planetary entry problem. This study utilizes a novel tool developed in this work for planetary entry system synthesis. A graphical method, force-based clustering, is used to discover related sub-graph structure as a function of problem structure and links ranked by their mutual information. This method does not require the stochastic use of neural networks and could be used with any link-ranking method currently utilized in the field. Application of this method is demonstrated on a large, coupled low-thrust trajectory problem. Mutual information also serves as the basis for an
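For discrete samples, mutual information can be estimated directly from the empirical joint distribution, I(X;Y) = Σ p(x,y) log2[p(x,y)/(p(x)p(y))]. A minimal sketch on invented data:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from the empirical joint distribution
    of two equal-length discrete samples."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

xs = [0, 0, 1, 1] * 25
mi_dep = mutual_information(xs, xs)            # fully dependent: 1.0 bit
mi_ind = mutual_information(xs, [0, 1] * 50)   # independent: 0.0 bits
```

Unlike covariance, the same estimator also flags non-linear dependence (e.g. y = x XOR noise patterns), which is the property exploited for coupling discovery.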
A fast and accurate method for computing the Sunyaev-Zeldovich signal of hot galaxy clusters
Chluba, Jens; Sazonov, Sergey; Nelson, Kaylea
2012-01-01
New generation ground and space-based CMB experiments have ushered in discoveries of massive galaxy clusters via the Sunyaev-Zeldovich (SZ) effect, providing a new window for studying cluster astrophysics and cosmology. Many of the newly discovered, SZ-selected clusters contain hot intracluster plasma (kTe > 10 keV) and exhibit disturbed morphology, indicative of frequent mergers with large peculiar velocity (v > 1000 km s^{-1}). It is well-known that for the interpretation of the SZ signal from hot, moving galaxy clusters, relativistic corrections must be taken into account, and in this work, we present a fast and accurate method for computing these effects. Our approach is based on an alternative derivation of the Boltzmann collision term which provides new physical insight into the sources of different kinematic corrections in the scattering problem. By explicitly imposing Lorentz-invariance of the scattering optical depth, we also show that the kinematic corrections to the SZ intensity signal found in thi...
VANET Clustering Based Routing Protocol Suitable for Deserts
Mohammed Nasr, Mohammed Mohsen; Abdelgader, Abdeldime Mohamed Salih; Wang, Zhi-Gong; Shen, Lian-Feng
2016-01-01
In recent years, applications of vehicular ad hoc networks (VANETs) have emerged in security, safety, rescue, exploration, military and communication-redundancy systems for non-populated areas, besides their ordinary use in urban environments as an essential part of intelligent transportation systems (ITS). This paper proposes a novel algorithm for the process of organizing a cluster structure and cluster head election (CHE) suitable for VANETs. Moreover, it presents a robust clustering-based routing protocol, which is appropriate for deserts and can achieve high communication efficiency, ensuring reliable information delivery and optimal exploitation of the equipment on each vehicle. A comprehensive simulation is conducted to evaluate the performance of the proposed CHE and routing algorithms. PMID:27058539
Personality based clusters as predictors of aviator attitudes and performance
Gregorich, Steve; Helmreich, Robert L.; Wilhelm, John A.; Chidester, Thomas
1989-01-01
The feasibility of identifying personality-based population clusters was investigated, along with the relationships of these subpopulations to relevant attitude and performance measures. The results of instrumental and expressive personality tests, using the Personal Characteristics Inventory (PCI) test battery and the Cockpit Management Attitudes Questionnaire, suggest that theoretically meaningful subpopulations exist among aviators, and that these groupings are useful in understanding how personality factors act as moderator variables in determining aviator attitudes and performance. Of the three clusters, most easily described in terms of their relative elevations on the PCI subscales (the 'right stuff', the 'wrong stuff', and the 'no stuff'), members of the 'right stuff' cluster tended to have more desirable patterns of responses along relevant attitudinal dimensions.
GENERALISED MODEL BASED CONFIDENCE INTERVALS IN TWO STAGE CLUSTER SAMPLING
Directory of Open Access Journals (Sweden)
Christopher Ouma Onyango
2010-09-01
Full Text Available Chambers and Dorfman (2002) constructed bootstrap confidence intervals in model-based estimation for finite population totals, assuming that auxiliary values are available throughout a target population and that the auxiliary values are independent. They also assumed that the cluster sizes are known throughout the target population. We now extend this to two-stage sampling, in which the cluster sizes are known only for the sampled clusters, and we therefore predict the unobserved part of the population total. Jan and Elinor (2008) have done similar work, but unlike them, we use a general model in which the auxiliary values are not necessarily independent. We demonstrate that the asymptotic properties of our proposed estimator and its coverage rates are better than those constructed under the model-assisted local polynomial regression model.
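The bootstrap-confidence-interval idea for a clustered design can be sketched with a simple cluster bootstrap: resample whole clusters with replacement and expand by the known number of population clusters. This is an illustration under equal-probability sampling with invented data, not the authors' model-based estimator.

```python
import random

def bootstrap_total_ci(sampled_clusters, big_n, reps=2000, alpha=0.05, seed=1):
    """Resample whole clusters with replacement, expand each replicate's
    sample total by big_n / n, and take empirical percentiles as the CI."""
    rng = random.Random(seed)
    n = len(sampled_clusters)
    totals = []
    for _ in range(reps):
        resample = [rng.choice(sampled_clusters) for _ in range(n)]
        totals.append(sum(sum(c) for c in resample) * big_n / n)
    totals.sort()
    return totals[int(reps * alpha / 2)], totals[int(reps * (1 - alpha / 2)) - 1]

# Hypothetical: 5 sampled clusters out of 50 in the population.
clusters = [[4, 5], [6, 7, 8], [5, 5], [7, 6], [4, 6, 5]]
low, high = bootstrap_total_ci(clusters, big_n=50)
# The expansion point estimate is sum(all values) * 50 / 5 = 68 * 10 = 680.
```

Resampling clusters rather than individual units preserves the within-cluster dependence that a naive element bootstrap would destroy.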
Information bottleneck based incremental fuzzy clustering for large biomedical data.
Liu, Yongli; Wan, Xing
2016-08-01
Incremental fuzzy clustering combines the advantages of fuzzy clustering and incremental clustering, and is therefore important in classifying large collections of biomedical literature. Conventional algorithms, suffering from data sparsity and high dimensionality, often fail to produce reasonable results and may even assign all the objects to a single cluster. In this paper, we propose two incremental algorithms based on the information bottleneck: Single-Pass fuzzy c-means (spFCM-IB) and Online fuzzy c-means (oFCM-IB). These two algorithms modify the conventional algorithms by considering different weights for each centroid and object and by scoring mutual information loss to measure the distance between centroids and objects. spFCM-IB and oFCM-IB are used to group a collection of biomedical text abstracts from the Medline database. Experimental results show that the clustering performance of our approaches is better than that of such prominent counterparts as spFCM, spHFCM, oFCM and oHFCM in terms of accuracy. PMID:27260783
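The plain (non-incremental, non-weighted) fuzzy c-means baseline that the IB variants modify can be sketched on 1-D toy data; initial centres are fixed to the data extremes here for determinism, and the data are invented.

```python
def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Plain fuzzy c-means on 1-D data: membership of a point in centre
    i is u_i = 1 / sum_j (d_i / d_j)^(1/(m-1)) with squared distances d;
    centres are then u^m-weighted means of all points."""
    k = len(centers)
    u = []
    for _ in range(iters):
        u = []
        for p in points:
            d = [max((p - c) ** 2, 1e-12) for c in centers]   # avoid /0
            u.append([1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1.0))
                                for j in range(k)) for i in range(k)])
        centers = [
            sum(up[i] ** m * p for up, p in zip(u, points)) /
            sum(up[i] ** m for up in u)
            for i in range(k)
        ]
    return centers, u

data = [0.0, 0.1, 0.9, 1.0]
centers, u = fuzzy_c_means(data, centers=[data[0], data[-1]])
```

Each point's memberships sum to 1, so every object contributes a little to every centroid; the IB variants replace the squared distance with a mutual-information-loss score and add per-centroid weights.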
Environment-based selection effects of Planck clusters
Kosyra, Ralf; Seitz, Stella; Mana, Annalisa; Rozo, Eduardo; Rykoff, Eli; Sanchez, Ariel; Bender, Ralf
2015-01-01
We investigate whether the large-scale structure environment of galaxy clusters imprints a selection bias on Sunyaev-Zel'dovich (SZ) catalogs. Such a selection effect might be caused by line of sight (LoS) structures that add to the SZ signal or contain point sources that disturb the signal extraction in the SZ survey. We use the Planck PSZ1 union catalog (Planck Collaboration et al. 2013a) in the SDSS region as our sample of SZ-selected clusters. We calculate the angular two-point correlation function (2pcf) for physically correlated, foreground and background structure in the RedMaPPer SDSS DR8 catalog with respect to each cluster. We compare our results with an optically selected comparison cluster sample and with theoretical predictions. In contrast to the hypothesis of no environment-based selection, we find a mean 2pcf for background structures of -0.049 on scales of $\lesssim 40'$, significantly non-zero at $\sim 4 \sigma$, which means that Planck clusters are more likely to be detected in regions of...
Clustering Protein Sequences Using Affinity Propagation Based on an Improved Similarity Measure
Directory of Open Access Journals (Sweden)
Fan Yang
2010-01-01
Full Text Available The sizes of protein databases are growing rapidly nowadays; thus it becomes increasingly important to cluster protein sequences based only on sequence information. In this paper we improve the similarity measure proposed by Kelil et al., then cluster sequences using the Affinity Propagation (AP) algorithm and provide a method to decide the input preference of the AP algorithm. We tested our method extensively and compared its performance with four other methods on several datasets from the COG, G protein, CAZy and SCOP databases. We consistently observed that the number of clusters we obtained for a given set of proteins approximates the correct number of clusters in that set. Moreover, in our experiments, the quality of the clusters when quantified by F-measure was better than that of the other algorithms (on average, it is 15% better than that of BlastClust, 56% better than that of TribeMCL, 23% better than that of CLUSS, and 42% better than that of spectral clustering).
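The F-measure used to quantify cluster quality can be computed by matching each reference class with its best-scoring cluster, weighted by class size. This is a common formulation, assumed here since the abstract does not spell out its exact variant; the labels are invented.

```python
def f_measure(clusters, classes):
    """Clustering F-measure: each gold class is matched with the
    cluster giving the best F1 of (precision, recall); the best scores
    are averaged, weighted by class size."""
    n = sum(len(cls) for cls in classes)
    total = 0.0
    for cls in classes:
        best = 0.0
        for clu in clusters:
            tp = len(set(cls) & set(clu))
            if tp:
                p, r = tp / len(clu), tp / len(cls)
                best = max(best, 2 * p * r / (p + r))
        total += len(cls) / n * best
    return total

classes = [["a", "b", "c"], ["d", "e"]]
perfect = [["a", "b", "c"], ["d", "e"]]
merged  = [["a", "b", "c", "d", "e"]]        # everything in one cluster
score_perfect = f_measure(perfect, classes)  # 1.0
score_merged  = f_measure(merged, classes)   # 19/28, roughly 0.679
```

Over-merged clusterings are penalized through precision, over-split ones through recall, which is why the score is a reasonable single number for comparing clustering algorithms.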
Directory of Open Access Journals (Sweden)
Cooper James B
2010-03-01
Full Text Available Abstract Background Clustering the information content of large high-dimensional gene expression datasets has widespread application in "omics" biology. Unfortunately, the underlying structure of these natural datasets is often fuzzy, and the computational identification of data clusters generally requires knowledge about cluster number and geometry. Results We integrated strategies from machine learning, cartography, and graph theory into a new informatics method for automatically clustering self-organizing map ensembles of high-dimensional data. Our new method, called AutoSOME, readily identifies discrete and fuzzy data clusters without prior knowledge of cluster number or structure in diverse datasets including whole genome microarray data. Visualization of AutoSOME output using network diagrams and differential heat maps reveals unexpected variation among well-characterized cancer cell lines. Co-expression analysis of data from human embryonic and induced pluripotent stem cells using AutoSOME identifies >3400 up-regulated genes associated with pluripotency, and indicates that a recently identified protein-protein interaction network characterizing pluripotency was underestimated by a factor of four. Conclusions By effectively extracting important information from high-dimensional microarray data without prior knowledge or the need for data filtration, AutoSOME can yield systems-level insights from whole genome microarray expression studies. Due to its generality, this new method should also have practical utility for a variety of data-intensive applications, including the results of deep sequencing experiments. AutoSOME is available for download at http://jimcooperlab.mcdb.ucsb.edu/autosome.
Threshold selection for classification of MR brain images by clustering method
International Nuclear Information System (INIS)
Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool to separate objects from the background and, further, in classification applications. This paper gives a detailed investigation of the selection of thresholds. Our method does not use the well-known methods for binarization. Instead, we perform a simple threshold optimization which, in turn, allows the best classification of the analyzed images into healthy subjects and patients with multiple sclerosis. The dissimilarity (or the distance between classes) has been established using a clustering method based on dendrograms. We tested our method using two classes of images, consisting of 20 T2-weighted and 20 proton density (PD)-weighted scans from two healthy subjects and from two patients with multiple sclerosis. For each image and for each threshold, the number of white pixels (the area of white objects in the binary image) has been determined. These pixel counts represent the objects in the clustering operation. The following optimum threshold values are obtained: T = 80 for PD images and T = 30 for T2w images. Each threshold clearly separates the clusters belonging to the studied groups, healthy subjects and patients with multiple sclerosis.
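The per-threshold white-pixel count that feeds the clustering step can be sketched directly; the 4x4 grey-level matrix below is an invented stand-in for an MR slice.

```python
def white_pixel_counts(image, thresholds):
    """For each threshold T, count the pixels with grey level >= T,
    i.e. the 'white' object area of the binarized image."""
    flat = [v for row in image for v in row]
    return {t: sum(v >= t for v in flat) for t in thresholds}

# Invented 4x4 grey-level matrix standing in for an MR slice.
image = [
    [10, 20, 40, 90],
    [15, 35, 85, 95],
    [25, 80, 90, 100],
    [70, 85, 95, 120],
]
counts = white_pixel_counts(image, thresholds=(30, 80))
# counts == {30: 12, 80: 9}: raising T shrinks the white area
```

One such count per image and threshold produces the feature values that are then compared across groups via the dendrogram-based clustering.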
Threshold selection for classification of MR brain images by clustering method
Moldovanu, Simona; Obreja, Cristian; Moraru, Luminita
2015-12-01
Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool to separate objects from the background and, further, in classification applications. This paper gives a detailed investigation of the selection of thresholds. Our method does not use the well-known methods for binarization. Instead, we perform a simple threshold optimization which, in turn, allows the best classification of the analyzed images into healthy subjects and patients with multiple sclerosis. The dissimilarity (or the distance between classes) has been established using a clustering method based on dendrograms. We tested our method using two classes of images, consisting of 20 T2-weighted and 20 proton density (PD)-weighted scans from two healthy subjects and from two patients with multiple sclerosis. For each image and for each threshold, the number of white pixels (the area of white objects in the binary image) has been determined. These pixel counts represent the objects in the clustering operation. The following optimum threshold values are obtained: T = 80 for PD images and T = 30 for T2w images. Each threshold clearly separates the clusters belonging to the studied groups, healthy subjects and patients with multiple sclerosis.
Threshold selection for classification of MR brain images by clustering method
Energy Technology Data Exchange (ETDEWEB)
Moldovanu, Simona [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania); Dumitru Moţoc High School, 15 Milcov St., 800509, Galaţi (Romania); Obreja, Cristian; Moraru, Luminita, E-mail: luminita.moraru@ugal.ro [Faculty of Sciences and Environment, Department of Chemistry, Physics and Environment, Dunărea de Jos University of Galaţi, 47 Domnească St., 800008, Romania, Phone: +40 236 460 780 (Romania)
2015-12-07
Given a grey-intensity image, our method detects the optimal threshold for a suitable binarization of MR brain images. In MR brain image processing, the grey levels of pixels belonging to the object are not substantially different from the grey levels belonging to the background. Threshold optimization is an effective tool to separate objects from the background and, further, in classification applications. This paper gives a detailed investigation of the selection of thresholds. Our method does not use the well-known methods for binarization. Instead, we perform a simple threshold optimization which, in turn, allows the best classification of the analyzed images into healthy subjects and patients with multiple sclerosis. The dissimilarity (or the distance between classes) has been established using a clustering method based on dendrograms. We tested our method using two classes of images, consisting of 20 T2-weighted and 20 proton density (PD)-weighted scans from two healthy subjects and from two patients with multiple sclerosis. For each image and for each threshold, the number of white pixels (the area of white objects in the binary image) has been determined. These pixel counts represent the objects in the clustering operation. The following optimum threshold values are obtained: T = 80 for PD images and T = 30 for T2w images. Each threshold clearly separates the clusters belonging to the studied groups, healthy subjects and patients with multiple sclerosis.
AES based secure low energy adaptive clustering hierarchy for WSNs
Kishore, K. R.; Sarma, N. V. S. N.
2013-01-01
Wireless sensor networks (WSNs) provide a low-cost solution in diversified application areas. Wireless sensor nodes are inexpensive tiny devices with limited storage, computational capability and power. They are being deployed on a large scale in both military and civilian applications. Security of the data is one of the key concerns where large numbers of nodes are deployed. Here, an energy-efficient secure routing protocol, secure-LEACH (Low Energy Adaptive Clustering Hierarchy), for WSNs based on the Advanced Encryption Standard (AES) is proposed. This cryptosystem is session-based, and a new session key is assigned for each new session. The network (WSN) is divided into a number of groups or clusters, and a cluster head (CH) is selected among the member nodes of each cluster. The measured data from the nodes are aggregated by the respective CHs, and each CH then relays this data to another CH towards the gateway node in the WSN, which in turn sends it to the base station (BS). In order to maintain the confidentiality of data while it is being transmitted, it is necessary to encrypt the data before sending at every hop, from a node to the CH and from the CH to another CH or to the gateway node.
International Nuclear Information System (INIS)
Using the solvothermal method, we present the comparative preparation of ([Co3Na(dmaep)3(ehbd)(N3)3]·DMF)n (1) and [Co2Na2(hmbd)4(N3)2(DMF)2] (2), where Hehbd is 3-ethoxy-2-hydroxy-benzaldehyde, Hhmbd is 3-methoxy-2-hydroxy-benzaldehyde, and Hdmaep is 2-dimethylaminomethyl-6-ethoxy-phenol, which was synthesized by an in-situ reaction. Complexes 1 and 2 were characterized by elemental analysis, IR spectroscopy, and X-ray single-crystal diffraction. Complex 1 is a novel heterometallic cluster-based 1-D chain and 2 is a heterometallic tetranuclear cluster. The (Co3IINa) and (Co2IINa2) cores display dominant ferromagnetic interactions arising from the nature of the binding modes through μ1,1,1-N3− (end-on, EO). - Graphical abstract: Two novel cobalt complexes have been prepared. Compound 1 consists of tetranuclear (Co3IINa) units, which further form a 1-D chain. Compound 2 is a heterometallic tetranuclear cluster. The two complexes display dominant ferromagnetic interactions. - Highlights: • Two new heterometallic complexes have been synthesized by the solvothermal method. • The stereospecific blockade of the ligands in the synthesis system seems to be the most important synthetic parameter. • The magnetism studies show that 1 and 2 exhibit ferromagnetic interactions. • Complex 1 shows slowing down, but not blocking, of the magnetization
Recognition of Spontaneous Combustion in Coal Mines Based on Genetic Clustering
Institute of Scientific and Technical Information of China (English)
Anonymous
2006-01-01
Spontaneous combustion is one of the greatest disasters in coal mines. Early recognition is important because spontaneous combustion may be a potential inducement for other coal-mine accidents. However, early recognition is difficult because of the complexity of different coal mines. Fuzzy clustering has been proposed to incorporate the uncertainty of spontaneous combustion in coal mines, and it can give a clear degree of classification of combustion. Because FCM clustering tends to become trapped in local minima, a new approach of fuzzy c-means clustering based on a genetic algorithm is proposed. A genetic algorithm is capable of locating optimal or near-optimal solutions to difficult problems and can be applied in many fields without first obtaining detailed knowledge about the problem. It is helpful in improving the effectiveness of fuzzy clustering in detecting spontaneous combustion. The effectiveness of the method is demonstrated by means of an experiment.
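The FCM pieces that such a genetic algorithm searches over can be sketched as follows: the fuzzy membership update and the objective J = Σ_i Σ_k u_ik^m d(x_k, v_i)^2 that the GA minimizes over candidate cluster centres. This is a minimal one-dimensional sketch with invented data; the GA itself (selection, crossover, mutation over centre sets) and the mine-sensor features from the paper are not shown.

```python
def memberships(data, centres, m=2.0):
    """Fuzzy memberships u[i][k] of point k in cluster i,
    u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))."""
    u = []
    for vi in centres:
        row = []
        for x in data:
            d_ik = abs(x - vi) or 1e-12           # guard against zero distance
            denom = sum((d_ik / (abs(x - vj) or 1e-12)) ** (2.0 / (m - 1.0))
                        for vj in centres)
            row.append(1.0 / denom)
        u.append(row)
    return u

def objective(data, centres, m=2.0):
    """FCM objective J(U, V) that a GA would minimise over candidate centres."""
    u = memberships(data, centres, m)
    return sum(u[i][k] ** m * (data[k] - centres[i]) ** 2
               for i in range(len(centres)) for k in range(len(data)))

data = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]      # two obvious groups
print(objective(data, [1.0, 8.0]))          # good centres: low J
print(objective(data, [4.0, 5.0]))          # poor centres: much higher J
```

A GA individual would encode one candidate centre set, with fitness inversely related to J, so the population can escape the local minima that trap plain alternating FCM updates.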
Cluster-based reduced-order modelling of a mixing layer
Kaiser, Eurika; Cordier, Laurent; Spohn, Andreas; Segond, Marc; Abel, Markus; Daviller, Guillaume; Niven, Robert K.
2013-01-01
We propose a novel cluster-based reduced-order modelling (CROM) strategy for unsteady flows. CROM builds on the pioneering works of Gunzburger's group in cluster analysis (Burkardt et al. 2006) and Eckhardt's group in transition matrix models (Schneider et al. 2007) and constitutes a potential alternative to POD models. This strategy processes a time-resolved sequence of flow snapshots in two steps. First, the snapshot data is clustered into a small number of representative states, called centroids, in the state space. These centroids partition the state space into complementary non-overlapping regions (centroidal Voronoi cells). Departing from the standard algorithm, the probabilities of the clusters are determined, and the states are sorted by transition-matrix considerations. Secondly, the transitions between the states are dynamically modelled via a Markov process. Physical mechanisms are then distilled by a refined analysis of the Markov process, e.g. with the finite-time Lyapunov exponent and entropic methods...
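The second CROM step — modelling transitions between clustered states as a Markov process — can be sketched as below. This is a hedged illustration under the assumption that each snapshot has already been assigned to one of k centroids; the three-state label sequence is synthetic, not flow data.

```python
def transition_matrix(labels, k):
    """Row-stochastic k x k matrix P, with P[i][j] the estimated
    probability of moving from cluster i to cluster j, counted over
    consecutive snapshot pairs in `labels`."""
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    P = []
    for row in counts:
        total = sum(row)
        P.append([c / total if total else 0.0 for c in row])
    return P

# Synthetic sequence of cluster assignments for a 3-centroid partition.
labels = [0, 0, 1, 2, 0, 1, 2, 0, 1, 1, 2, 0]
P = transition_matrix(labels, 3)
for row in P:
    print([round(p, 2) for p in row])
```

Iterating this matrix on an initial cluster-probability vector then propagates the coarse-grained dynamics, which is the object analysed with Lyapunov and entropic tools in the paper.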
Approximate K-Nearest Neighbour Based Spatial Clustering Using K-D Tree
Directory of Open Access Journals (Sweden)
Mohammed Otair
2013-03-01
Full Text Available Different spatial objects that vary in their characteristics, such as in molecular biology and geography, are represented in spatial areas. Methods to organize, manage, and maintain those objects in a structured manner are required. Data mining provides different techniques to meet these requirements. There are many major tasks in data mining, but the most widely used task is clustering. Objects within the same cluster share common features that give each cluster its characteristics. In this paper, an implementation of an approximate kNN-based spatial clustering algorithm using the k-d tree is proposed. The major contribution of this research is the use of the k-d tree data structure for spatial clustering and the comparison of its performance to the brute-force approach. The results of the work performed in this paper revealed better performance using the k-d tree, compared to the traditional brute-force approach.
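The core contrast the paper draws — k-d tree search versus brute force — can be sketched with a minimal pure-Python 2-D k-d tree. This is an illustrative sketch, not the paper's implementation; the point set and query are invented.

```python
def build_kdtree(points, depth=0):
    """Recursively build a k-d tree over 2-D points, splitting on
    alternating axes at the median."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid],
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest(node, target, depth=0, best=None):
    """Nearest neighbour of `target`, descending the near half-plane
    first and pruning the far one when it cannot hold a closer point."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], target) < sq_dist(best, target):
        best = node["point"]
    axis = depth % 2
    diff = target[axis] - node["point"][axis]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, depth + 1, best)
    if diff ** 2 < sq_dist(best, target):   # far side may still be closer
        best = nearest(far, target, depth + 1, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
query = (9, 2)
print(nearest(tree, query))                          # k-d tree answer
print(min(points, key=lambda p: sq_dist(p, query)))  # brute-force answer
```

Both searches return the same neighbour; the difference is that the brute-force scan touches every point, while the tree prunes whole half-planes, which is the source of the speed-up reported in the paper.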
Neural network based cluster reconstruction in the ATLAS silicon Pixel Detector
International Nuclear Information System (INIS)
The hit signals read out from pixels on planar semiconductor sensors are grouped into clusters to reconstruct the location where a charged particle passed through. The spatial resolution of the pixel detector can be improved significantly using the information from the cluster of adjacent pixels. Such analogue cluster-creation techniques have been used by the ATLAS experiment for many years, giving excellent performance. However, in dense environments, such as those inside high-energy jets, it is likely that the charge deposited by two or more close-by tracks merges into one single cluster. A clusterization algorithm based on neural network methods has been developed for the ATLAS Pixel Detector. It can identify the shared clusters, split them if necessary, and estimate the positions of all particles traversing the cluster. The algorithm significantly reduces ambiguities in the assignment of pixel detector measurements to tracks within jets and improves the positional accuracy with respect to standard interpolation techniques by using the 2-dimensional charge-distribution information. The reconstruction using the neural network strongly reduces the number of hits shared by more than one track and improves the resolution of the impact parameter by about 15%.
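The first step described above — grouping adjacent hit pixels into clusters and computing an analogue (charge-weighted) position — can be sketched as follows. This is an illustrative reconstruction with an invented toy hit map; the neural-network splitting of merged clusters is not reproduced here.

```python
def find_clusters(hits):
    """Group hit pixels {(col, row): charge} into 8-connected clusters
    by flood fill over adjacent pixels."""
    remaining = set(hits)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        cluster = []
        while stack:
            c, r = stack.pop()
            cluster.append((c, r))
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    nb = (c + dc, r + dr)
                    if nb in remaining:
                        remaining.remove(nb)
                        stack.append(nb)
        clusters.append(cluster)
    return clusters

def analogue_centroid(cluster, hits):
    """Charge-weighted centroid of one pixel cluster: the analogue
    position estimate that improves on the geometric pixel centre."""
    q = sum(hits[p] for p in cluster)
    return (sum(c * hits[(c, r)] for c, r in cluster) / q,
            sum(r * hits[(c, r)] for c, r in cluster) / q)

# Toy hit map: two well-separated pixel clusters with deposited charge.
hits = {(1, 1): 2.0, (2, 1): 6.0, (2, 2): 2.0, (8, 5): 1.0, (9, 5): 3.0}
clusters = find_clusters(hits)
print(len(clusters))   # number of reconstructed clusters
```

In the merged-cluster case the paper addresses, a single such flood-filled cluster contains charge from several tracks, and the neural network takes over to split it and assign one position per traversing particle.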
Implementation and experimental analysis of consensus clustering
Perc, Domen
2011-01-01
Consensus clustering is a machine learning technique for class discovery and clustering validation. The method uses various clustering algorithms in conjunction with different resampling techniques for data clustering. It is based on multiple runs of a clustering and sampling algorithm. Data gathered in these runs is used for clustering and for a visual representation of the clustering, which helps us understand the clustering results. In this thesis we compare consensus clustering with ...
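At the heart of consensus clustering is the co-association (consensus) matrix aggregated over the multiple runs: entry M[i][j] is the fraction of runs in which objects i and j landed in the same cluster. The sketch below is a hedged illustration; the label sets stand in for real resampled clustering runs, and `None` marks an object left out by the resampling.

```python
def consensus_matrix(runs, n):
    """Co-association matrix over n objects from several clustering runs.
    Each run maps object index -> cluster label, or None if the object
    was not drawn in that resample."""
    same = [[0.0] * n for _ in range(n)]
    both = [[0] * n for _ in range(n)]     # times i and j were both sampled
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] is None or labels[j] is None:
                    continue
                both[i][j] += 1
                if labels[i] == labels[j]:
                    same[i][j] += 1.0
    return [[same[i][j] / both[i][j] if both[i][j] else 0.0
             for j in range(n)] for i in range(n)]

# Three runs over 4 objects; the third resample leaves object 3 out.
runs = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, None],
]
M = consensus_matrix(runs, 4)
print(M[0][1])   # objects 0 and 1 were always co-clustered
print(M[2][3])   # objects 2 and 3 co-clustered in half their shared runs
```

Rendering M as a heat map after reordering objects by cluster gives the visual representation of clustering stability that the abstract refers to: stable clusters appear as sharp blocks of values near 1.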
Efficiency of a Multi-Reference Coupled Cluster method
Giner, Emmanuel; Scemama, Anthony; Malrieu, Jean Paul
2015-01-01
The multi-reference Coupled Cluster method first proposed by Meller et al. (J. Chem. Phys., 1996) has been implemented and tested. Guess values of the amplitudes of the single and double excitations (the ${\\hat T}$ operator) on top of the references are extracted from the coefficients of the Multi-Reference Singles and Doubles Configuration Interaction (MRSDCI) matrix. The multiple-parentage problem is solved by scaling these amplitudes on the interaction between the references and the Singles and Doubles. One then proceeds to a dressing of the MRSDCI matrix under the effect of the Triples and Quadruples, the coefficients of which are estimated from the action of ${\\hat T}^2$. This dressing follows the logic of the intermediate effective Hamiltonian formalism. The dressed MRSDCI matrix is diagonalized and the process is iterated to convergence. The method is tested on a series of benchmark systems with Complete Active Spaces (CAS) involving 2 or 4 active electrons, up to bond breaking. The...